Business Intelligence Tools


Board of Studies

Prof. H. N. Verma
Vice-Chancellor, Jaipur National University, Jaipur

Prof. M. K. Ghadoliya
Director, School of Distance Education and Learning, Jaipur National University, Jaipur

Dr. Rajendra Takale
Prof. and Head Academics, SBPIM, Pune

___________________________________________________________________________________________

Subject Expert Panel

Dr. Ramchandra G. Pawar
Director, SIBACA, Lonavala

Ashwini Pandit
Subject Matter Expert, Pune

___________________________________________________________________________________________

Content Review Panel

Gaurav Modi
Subject Matter Expert

Shubhada Pawar
Subject Matter Expert

___________________________________________________________________________________________

Copyright ©

This book contains the course content for Business Intelligence Tools.

First Edition 2013

Printed by Universal Training Solutions Private Limited

Address: 5th Floor, I-Space, Bavdhan, Pune 411021

All rights reserved. No portion of this book may be reproduced, distributed, transmitted, broadcast, or stored in a retrieval system, in any form or by any means, electronic or mechanical, including photocopying and recording.

___________________________________________________________________________________________


Index

I. Content .......... II
II. List of Figures .......... VI
III. List of Tables .......... VII
IV. Abbreviations .......... VIII
V. Case Study .......... 122
VI. Bibliography .......... 126
VII. Self Assessment Answers .......... 129

Book at a Glance



Contents

Chapter I .......... 1
Business Intelligence .......... 1
Aim .......... 1
Objectives .......... 1
Learning outcome .......... 1
1.1 The Birth of BI .......... 2
1.2 What is Business Intelligence? .......... 2
1.3 History of BI .......... 2
1.4 Customer Relationship Management .......... 3
1.5 What is a Data Warehouse? .......... 4
 1.5.1 The Invention of the Data Warehouse .......... 5
 1.5.2 Extraction, Transformation and Loading .......... 5
1.6 What are Queries and Reports? .......... 5
1.7 What is OLAP? .......... 6
 1.7.1 An OLAP Example .......... 6
1.8 FASMI .......... 8
1.9 OLAP Applications .......... 9
1.10 What is Data Mining? .......... 9
 1.10.1 The Data Mining Process .......... 10
 1.10.2 Data Mining Techniques .......... 11
 1.10.3 Web Mining: The Internet-variant of Mining .......... 12
1.11 Business Intelligence vs. Decision Support Systems .......... 13
1.12 Current Status .......... 13
1.13 Application Areas .......... 14
1.14 Competitive Intelligence .......... 15
Summary .......... 16
References .......... 17
Recommended Reading .......... 17
Self Assessment .......... 18

Chapter II .......... 20
Components of Business Intelligence Tools .......... 20
Aim .......... 20
Objectives .......... 20
Learning outcome .......... 20
2.1 Introduction .......... 21
2.2 Business Driving Forces .......... 21
2.3 How to Identify BI Candidates? .......... 22
 2.3.1 Senior Executives of a Corporation .......... 22
 2.3.2 IT Vice Presidents, Directors, and Managers .......... 22
 2.3.3 CFOs, Financial Vice Presidents, and Controllers .......... 23
 2.3.4 Sales VPs, Product Managers, and Customer Service Directors .......... 23
 2.3.5 Operations and Production Management .......... 24
2.4 Main BI Terms .......... 25
 2.4.1 Operational Databases .......... 25
 2.4.2 OLTP .......... 25
 2.4.3 Data Warehouse .......... 25
 2.4.4 Data Mart .......... 26
 2.4.5 External Data Source .......... 26
 2.4.6 OLAP .......... 26
 2.4.7 OLAP Server .......... 27
 2.4.8 Metadata .......... 27



 2.4.9 Drill-Down .......... 27
 2.4.10 Operational Versus Informational Databases .......... 28
2.5 Different BI Implementations .......... 29
 2.5.1 Summary Table .......... 30
 2.5.2 OLTP Data at Separate Server .......... 31
 2.5.3 Single Data Mart .......... 32
2.6 Data Warehouse Components .......... 34
2.7 Data Sources .......... 35
 2.7.1 Extraction/Propagation .......... 36
 2.7.2 Transformation/Cleansing .......... 36
 2.7.3 Data Refining .......... 36
 2.7.4 Physical Database Model .......... 37
 2.7.5 Logical Database Model .......... 38
 2.7.6 Metadata Information .......... 39
 2.7.7 Operational Data Source (ODS) .......... 40
 2.7.8 Data Mart .......... 41
 2.7.9 Presentation and Analysis Tools .......... 41
Summary .......... 43
References .......... 44
Recommended Reading .......... 44
Self Assessment .......... 45

Chapter III .......... 47
Open Source Tools for Business Intelligence .......... 47
Aim .......... 47
Objectives .......... 47
Learning outcome .......... 47
3.1 Introduction .......... 48
3.2 Criteria for all Tool Categories .......... 48
3.3 Criteria for Extract-Transform-Load Tools .......... 48
3.4 Criteria for Database Management Systems .......... 48
3.5 Criteria for On-Line Analytical Processing Servers .......... 49
3.6 Criteria for On-Line Analytical Processing Clients .......... 49
 3.6.1 Extract-Transform-Load Tools .......... 49
 3.6.2 Apatar .......... 49
 3.6.3 Clover.ETL .......... 49
 3.6.4 ETL Integrator .......... 50
 3.6.5 KETL .......... 50
 3.6.6 Kettle / Pentaho Data Integration .......... 50
 3.6.7 Octopus .......... 51
 3.6.8 Palo ETL Server .......... 51
 3.6.9 Pequel .......... 51
 3.6.10 Scriptella .......... 52
 3.6.11 Talend Open Studio / JasperETL .......... 52
3.7 Database Management Systems .......... 52
 3.7.1 Firebird .......... 53
 3.7.2 Ingres Database .......... 53
 3.7.3 LucidDB .......... 53
 3.7.4 MonetDB .......... 54
 3.7.5 MySQL .......... 54
 3.7.6 PostgreSQL .......... 55
 3.7.7 On-Line Analytical Processing Servers .......... 55
 3.7.8 Mondrian / Pentaho Analysis Services .......... 55
 3.7.9 Palo .......... 56



3.8 On-Line Analytical Processing Clients .......... 56
 3.8.1 FreeAnalysis .......... 56
 3.8.2 JPalo Client and JPalo Web Client .......... 56
 3.8.3 JMagallanes Olap & Reports .......... 57
 3.8.4 JPivot .......... 57
 3.8.5 JRubik .......... 57
 3.8.6 OpenI .......... 57
 3.8.7 Rex .......... 58
 3.8.8 Integrated Business Intelligence Suites .......... 58
 3.8.9 Pentaho Open BI Suite .......... 58
 3.8.10 SpagoBI .......... 58
3.9 Conclusion .......... 58
3.10 Adopting Proprietary/Standards-Based Business Intelligence .......... 59
3.11 Using Open Source Business Intelligence Tools .......... 59
 3.11.1 Downsides of Adopting Proprietary/Standards-Based Business Intelligence (BI) .......... 59
 3.11.2 Advantages of Using Open Source BI Tools .......... 60
 3.11.3 Adoption of Open Source BI Tools .......... 60
Summary .......... 61
References .......... 62
Recommended Reading .......... 62
Self Assessment .......... 63

Chapter IV .......... 65
Business Analytics .......... 65
Aim .......... 65
Objectives .......... 65
Learning outcome .......... 65
4.1 Introduction to Business Analytics .......... 66
4.2 Where Should we Leverage Business Analytics? .......... 66
4.3 What's the Payoff? .......... 67
4.4 Business Analytics and Customer Relationships .......... 67
 4.4.1 Valuation .......... 68
 4.4.2 Customisation .......... 68
 4.4.3 Pricing .......... 68
 4.4.4 Retention .......... 68
 4.4.5 Fraud Detection .......... 68
4.5 What Information and Technology do we Need? .......... 69
4.6 What Kinds of People do we Need? .......... 69
4.7 What Roles Must Senior Executives Play? .......... 69
4.8 More Applications of Analytics Key Process: Supply Chain .......... 70
4.9 Key Asset: People .......... 70
4.10 Watershed Event: Merger or Acquisition .......... 70
4.11 Applications, Trends, and Strategies .......... 70
4.12 Business Intelligence: Another Step in the ERP Evolution .......... 71
4.13 Business Intelligence Benefits and Challenges .......... 73
4.14 Some Aspects about Business Intelligence Implementation .......... 73
4.15 Managing the Implementation of Business Intelligence Systems .......... 74
4.16 Background of Implementation of Business Intelligence Systems .......... 75
4.17 Introduction and Research Motivation .......... 75
4.18 Research Objective .......... 76
4.19 Research Methodology .......... 76
4.20 CSFs Finding and Discussion .......... 79
4.21 Development of a Critical Success Factors Framework .......... 81
4.22 Committed Management Support and Sponsorship .......... 82



4.23 Business User-Oriented Change Management .......... 82
4.24 Clear Business Vision and Well-Established Case .......... 82
4.25 Business-Driven Methodology and Project Management .......... 83
4.26 Business-Centric Championship and Balanced Project Team Composition .......... 83
4.27 Strategic and Extensible Technical Framework .......... 84
4.28 Sustainable Data Quality and Governance Framework .......... 84
4.29 Concluding Remarks and Future Research .......... 85
Summary .......... 87
References .......... 88
Recommended Reading .......... 88
Self Assessment .......... 89

Chapter V .......... 91
Data Warehousing and Data Mart .......... 91
Aim .......... 91
Objectives .......... 91
Learning outcome .......... 91
5.1 Introduction .......... 92
5.2 The Corporate Information Factory .......... 92
5.3 Getting Data In .......... 94
 5.3.1 Operational Systems .......... 94
 5.3.2 Integration and Transformation .......... 95
 5.3.3 Data Warehouse .......... 96
 5.3.4 Operational Data Store .......... 97
 5.3.5 Data Management .......... 97
5.4 Getting Information Out .......... 97
5.5 Data Delivery .......... 98
 5.5.1 Data Mart .......... 98
 5.5.2 Decision Support Interface (DSI) .......... 99
 5.5.3 Transaction Interface (TrI) .......... 99
 5.5.4 Meta Data .......... 99
 5.5.5 What is a Data Mart? .......... 100
5.6 How is Data Mart Different from a Data Warehouse? .......... 100
5.7 Dependent and Independent Data Marts .......... 101
5.8 What are the Steps in Implementing a Data Mart? .......... 101
 5.8.1 Designing .......... 101
 5.8.2 Constructing .......... 102
 5.8.3 Populating .......... 102
 5.8.4 Accessing .......... 102
 5.8.5 Managing .......... 102
5.9 Patterns of Data Mart Development .......... 102
5.10 Development Models without Explicit User Feedback .......... 104
 5.10.1 The Top-Down Model .......... 104
 5.10.2 The Bottom-Up Model .......... 105
 5.10.3 Parallel Development .......... 106
5.11 Development Models with Feedback .......... 108
 5.11.1 Top-Down with Feedback .......... 108
 5.11.2 The Bottom-Up Model with Feedback .......... 109
 5.11.3 The Parallel Model with Feedback .......... 111
5.12 The Dynamics of Data Mart Development .......... 112
Summary .......... 113
References .......... 114
Recommended Reading .......... 114
Self Assessment .......... 115



Chapter VI .......... 117
An Introduction to OLAP .......... 117
Aim .......... 117
Objectives .......... 117
Learning outcome .......... 117
6.1 Introduction .......... 118
6.2 Why do we Need OLAP? .......... 118
 6.2.1 Increasing Data Storage .......... 118
 6.2.2 Data versus Information .......... 118
 6.2.3 Data Layout .......... 119
6.3 OLAP Fundamentals .......... 119
6.4 What is a Cube? .......... 120
6.5 Multidimensionality .......... 121
 6.5.1 Four Dimensions and Beyond .......... 123
 6.5.2 "Slicing & Dicing" .......... 125
 6.5.3 Nested Dimensions .......... 125
6.6 Hierarchies & Groupings .......... 125
 6.6.1 "Drill-down", "Drill-up" & "Drill-across" .......... 126
6.7 Consolidated Data .......... 126
 6.7.1 Pre-consolidated versus On-Demand .......... 127
 6.7.2 Sparse Data .......... 128
6.8 Storing the Data .......... 129
 6.8.1 ROLAP .......... 129
 6.8.2 MOLAP .......... 129
 6.8.3 HOLAP .......... 129
6.9 OLAP as a Component of Business Intelligence .......... 129
6.10 Enterprise Performance Management .......... 130
6.11 Data Warehousing .......... 130
6.12 Business Reporting .......... 130
6.13 Predictive Analytics and Data Mining .......... 130
6.14 OLAP .......... 130
 6.14.1 Why OLAP? .......... 131
6.15 Business-Focused Multidimensional Data .......... 131
6.16 Business-Focused Calculations .......... 132
6.17 Trustworthy Data and Calculations .......... 133
6.18 Speed-of-Thought Analysis .......... 133
6.19 Flexible, Self-Service Reporting .......... 134
6.20 OLAP System Components .......... 134
 6.20.1 Server .......... 134
 6.20.2 Multidimensional Storage .......... 134
 6.20.3 Calculation Engine .......... 134
 6.20.4 Front-End Analysis and Reporting Tools .......... 134
6.21 OLAP Types .......... 134
 6.21.1 Multidimensional OLAP .......... 135
 6.21.2 Relational OLAP .......... 135
 6.21.3 Hybrid OLAP .......... 136
6.22 OLAP Products .......... 137
 6.22.1 OLAP with a Data Warehouse .......... 137
6.23 Typical OLAP Applications .......... 137
6.24 Similarities between Essbase and Oracle OLAP .......... 138
6.25 Differences between Essbase and Oracle OLAP .......... 138
6.26 Essbase: Separate-Server OLAP .......... 138
6.27 Oracle OLAP: Database-Centric OLAP .......... 138
Summary .......... 139



References .......... 139
Recommended Reading .......... 140
Self Assessment .......... 141

Chapter VII .......... 143
Decision Support Systems .......... 143
Aim .......... 143
Learning outcome .......... 143
7.1 Introduction .......... 144
7.2 Evolution of Decision Support Systems .......... 145
7.3 Definition of Decision Support Systems .......... 146
7.4 Architecture of Decision Support Systems .......... 146
7.5 Decision Support System Sub-Specialities .......... 148
7.6 Data/Model Management .......... 149
7.7 User Interface Sub-Systems .......... 150
7.8 Knowledge-Based Decision Support Systems .......... 150
7.9 Group DSS/Group Support Systems/Electronic Meeting Systems .......... 150
7.10 Organisational Decision Support Systems .......... 151
7.11 Decision Support System Design .......... 151
7.12 Decision Support System Implementation .......... 151
7.13 Types of Decisions .......... 152
7.14 Human Judgment and Decision Making .......... 152
7.15 Modelling Decisions .......... 153
7.16 Decision Support Systems .......... 153
 7.16.1 Database Management System (DBMS) .......... 153
 7.16.2 Model-Base Management System (MBMS) .......... 153
 7.16.3 Dialog Generation and Management System (DGMS) .......... 154
7.17 Normative and Descriptive Approaches .......... 154
7.18 Decision-Analytic Decision Support Systems .......... 155
7.19 Systems with Static Domain Models .......... 156
 7.19.1 Systems with Customised Decision Models .......... 156
7.20 Systems Capable of Learning a Model from Data .......... 157
7.21 Equation-Based and Mixed Systems .......... 157
7.22 User Interfaces to Decision Support Systems .......... 157
 7.22.1 Support for Model Construction and Model Analysis .......... 158
 7.22.2 Support for Reasoning about the Problem Structure in Addition to Numerical Calculations .......... 158
 7.22.3 Support for Both Choice and Optimisation of Decision Variables .......... 158
7.23 Graphical Interface .......... 158
Summary .......... 159
References .......... 160
Recommended Reading .......... 160
Self Assessment .......... 161

Chapter VIII .......... 163
Types of Business Intelligence Tools .......... 163
Aim .......... 163
Objectives .......... 163
Learning outcome .......... 163
8.1 Introduction .......... 164
8.2 The Key General Categories of Business Intelligence Tools .......... 166
 8.2.1 Spreadsheets .......... 167
 8.2.2 Reporting and Querying Software .......... 168
 8.2.3 OLAP .......... 168
 8.2.4 Increasing Data Storage .......... 169



 8.2.5 Dashboard .......... 169
 8.2.6 Data Mining .......... 170
 8.2.7 The Foundations of Data Mining .......... 170
 8.2.8 Data Warehousing .......... 171
 8.2.9 Decision Engineering .......... 171
 8.2.10 Process Mining .......... 172
 8.2.11 Business Performance Management .......... 173
 8.2.12 The BPM Imperative .......... 174
 8.2.13 Local Information System .......... 174
8.3 Eight Strategies for Delivering Business Intelligence on the Web .......... 174
 8.3.1 Pick the Best Delivery Vehicle for your Audience and your Data .......... 175
 8.3.2 Integrate the Presentation Layer .......... 175
 8.3.3 Integrate the Security Layer .......... 175
 8.3.4 Customise the Presentation for Target Devices and User Roles .......... 175
 8.3.5 Target Reports to Users .......... 176
 8.3.6 Use a Combined Push/Pull Model .......... 176
 8.3.7 Keep Information Timely .......... 176
 8.3.8 Take Advantage of Enterprise Application Integration (EAI) .......... 176
8.6 Use Analytics for Strategic Business Information Implementation .......... 176
8.7 Life in the Fast Lane .......... 177
8.8 Modelling for Company-Wide Coordination .......... 177
8.9 Talk to the Boss .......... 178
8.10 Making the Operational Case for Data Warehousing .......... 179
8.11 Return on Investment? .......... 179
8.12 Many-for-One, Among Others .......... 180
8.13 Making Our Case .......... 180
8.14 Who Gets the Bill? .......... 180
Summary .......... 181
References .......... 182
Recommended Reading .......... 182
Self Assessment .......... 183



List of Figures

Fig. 1.1 The pyramid of BI
Fig. 1.2 Trends and influences in data warehousing, 1975-2000
Fig. 1.3 A 3-dimensional OLAP cube
Fig. 1.4 The OLAP cube looked at from 3 different dimensions
Fig. 1.5 The 3 dimensions combined in the OLAP cube
Fig. 1.6 The data mining process
Fig. 1.7 Knowledge value versus user expertise
Fig. 2.1 External data sources
Fig. 2.2 Drill-down
Fig. 2.3 Operational versus informational databases
Fig. 2.4 Business Intelligence implementations
Fig. 2.5 Summary tables on OLTP
Fig. 2.6 Poor man’s data warehouse
Fig. 2.7 2-tiered data mart
Fig. 2.8 3-tiered data mart
Fig. 2.9 Data warehouse components
Fig. 2.10 Data refining
Fig. 2.11 Physical database models
Fig. 2.12 Logical data model
Fig. 2.13 Metadata
Fig. 2.14 Operational data store (ODS)
Fig. 2.15 Data mart
Fig. 2.16 Presentation and analysis tools
Fig. 4.1 Business intelligence tools
Fig. 4.2 Business intelligence tools and technologies
Fig. 4.3 A critical success factors framework for the implementation of business intelligence systems
Fig. 5.1 The corporate information factory architecture
Fig. 5.2 Getting data in vs. getting information out
Fig. 5.3 Integration and transformation process
Fig. 5.4 Data management
Fig. 5.5 Data delivery
Fig. 5.6 Top-down flow from data warehouses to data marts
Fig. 5.7 The bottom-up flow from data marts to the data warehouse
Fig. 5.8 Data mart creation guided by a data model of the data warehouse
Fig. 5.9 The top-down model with end user feedback
Fig. 5.10 The bottom-up flow from data marts to the data warehouse with feedback
Fig. 5.11 Data mart creation guided by a data model of the data warehouse with feedback and an eventual data warehouse
Fig. 6.1 Two-dimensional cube
Fig. 6.2 The two-dimensional cube reoriented
Fig. 6.3 The three-dimensional cube
Fig. 6.4 Two-dimensional view of a four-dimensional structure
Fig. 6.5 Measures dimension “nested” inside the Store dimension
Fig. 6.6 The bulbs hierarchy within the products dimension
Fig. 6.7 A two-dimensional cube with consolidated data
Fig. 6.8 A sparse two-dimensional cube
Fig. 6.9 Sample dimensions with members
Fig. 6.10 MOLAP advantages and challenges
Fig. 6.11 ROLAP advantages and challenges
Fig. 6.12 HOLAP advantages and challenges
Fig. 7.1 Components of decision support system
Fig. 7.2 Theory applications and contributing disciplines of decision support system
Fig. 7.3 The architecture of a DSS
Fig. 7.4 Example of a Bayesian network modelling teaching expenditures in university operation
Fig. 8.1 Reasons for increased profitability
Fig. 8.2 The strategy map for increased profitability
Fig. 8.3 Improved strategies for increased sales
Fig. 8.5 Decision engineering framework


List of Tables

Table 1.2 OLAP application areas
Table 1.3 The steps of the data mining process
Table 1.4 BI vs. DSS definition
Table 4.1 Delphi participants and their BI systems experience in EAMOs
Table 4.2 Ratings of critical success factors by Delphi participants
Table 6.1 A relational table containing sales records


Abbreviations

AI - Actionable Intelligence
ANN - Artificial Neural Networks
ANSI - American National Standards Institute
API - Application Program Interface
BAM - Business Activity Monitoring
BI - Business Intelligence
BN - Bayesian Networks
BPI - Business Process Improvement
BPM - Business Performance Management
BPM - Business Process Management
CBIS - Computer-Based Information Systems
CEP - Complex Event Processing
CI - Competitive Intelligence
CIF - Corporate Information Factory
CIO - Chief Information Officer
CIS - Computational Intelligence Society
CPI - Continuous Process Improvement
CPL - Common Public License
CPM - Corporate Performance Management
CRM - Customer Relationship Management
CSCW - Computer-Supported Cooperative Work
CSFs - Critical Success Factors
CSS - Collaboration Support Systems
CSV - Comma Separated Values
CVS - Concurrent Versions System
DBI - Database Interface
DBMS - Database Management Systems
DGMS - Dialog Generation and Management System
DM - Data Mining
DMTC - Data Mining Technical Committee
DOS - Disk Operating System
DSI - Decision Support Interface
DSS - Decision Support System
DW - Data Warehouse
EAI - Enterprise Application Integration
EAMOs - Engineering Asset Management Organizations
EDP - Electronic Data Processing
EIS - Executive Information System
EMS - Electronic Meeting Systems
EPM - Enterprise Performance Management
ERP - Enterprise Resource Planning
ESP - Environment for Strategic Planning
ESS - Expert Support Systems


ETL - Extraction, Transformation and Loading
FAQs - Frequently Asked Questions
FASMI - Fast Analysis of Shared Multidimensional Information
FTP - File Transfer Protocol
GB - Gigabyte
GDSS - Group Decision Support Systems
GPL - General Public License
GSS - Group Support Systems
GUI - Graphical User Interface
HOLAP - Hybrid On-Line Analytical Processing
HTML - Hypertext Markup Language
ICT - Information and Communication Technology
IEEE - Institute of Electrical and Electronics Engineers
IFPS - Interactive Financial Planning Systems
IMS - Information Management System
IRCs - Internet Relay Chats
ISS - Intelligent Support Systems
IT - Information Technology
JAD - Joint Application Development
JDBC - Java Database Connectivity
JSP - Java Server Pages
KBDSS - Knowledge-Based Decision Support Systems
KPIs - Key Performance Indicators
LDAP - Lightweight Directory Access Protocol
LGPL - Library General Public License
MBMS - Model-Based Management Systems
MCDM - Multiple Criteria Decision Making
MDX - Multidimensional Expressions
MES - Manufacturing Execution Systems
MIS - Management Information System
MOLAP - Multidimensional On-Line Analytical Processing
MRP - Manufacturing Resource Planning
MS - Management Science
MSO - Microsoft Office
NRT - Near Real-Time
ODBC - Open Database Connectivity
ODS - Operational Data Store
OLAP - On-Line Analytical Processing
OLTP - On-Line Transaction Processing
OR - Operations Research
PDF - Portable Document Format
Q&R - Querying and Reporting
RDBMS - Relational Database Management System
ROI - Return on Investment
ROLAP - Relational On-Line Analytical Processing
SEM - Structural Equation Model


SFA - Sales Force Automation
SOX - Sarbanes-Oxley Act
SQL - Structured Query Language
TPS - Transaction Processing Systems
TQM - Total Quality Management
XML - Extensible Markup Language
XOLAP - Extended On-Line Analytical Processing


Chapter I

Business Intelligence

Aim

The aim of this chapter is to:

• introduce the history of business intelligence
• elucidate levels of analytical applications and corresponding tools
• explain application areas of business intelligence

Objectives

The objectives of this chapter are to:

• explain customer relationship management
• explicate fast analysis of shared multidimensional information
• elucidate multiple dimensions of OLAP

Learning outcome

At the end of this chapter, you will be able to:

• understand data warehouse and its invention
• explain application areas of business intelligence
• explicate data mining process


1.1 The Birth of BI
In searching for the year in which Business Intelligence was first introduced, we find that Naeem Hashmi (2000) says BI is a term introduced by Howard Dresner of Gartner Group in 1989, whilst Hans Dekker (2002) claims that Howard Dresner invented the term in 1992! It is clear they speak of the same term BI and the same Howard Dresner, but the supposed “birth-years” of BI are somewhat puzzling! Who is right?

As we know, most companies include a “Contact Us” page on their website. Fortunately, Gartner Group is one of them, and Gartner's own account settles the matter: the term “Business Intelligence” was coined by Gartner in 1989. Howard Dresner had a hand in the creation of the term, but did not join Gartner until year-end 1992, when he drove it into the mainstream.

1.2 What is Business Intelligence?
Having cleared up the ambiguity surrounding BI's “birth-year”, we can proceed with the question: what is Business Intelligence?

Many authors speak of BI as being an “umbrella term”, with various components “hanging under” this umbrella. Another way to look at Business Intelligence is as the following pyramid:

[Figure: a pyramid with four levels, from top to bottom: Data mining; OLAP; Queries & reports; Data Warehouse. Frequency and number of users increase towards the base; complexity and business potential increase towards the top.]

Fig. 1.1 The pyramid of BI (Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)

What this simple picture tells us is that BI consists of various levels of analytical applications and corresponding tools that are carried out on top of a Data Warehouse. The lower we go in this hierarchy, the more frequently the tool is used, the more users it has, and the more the extracted information is based on facts and figures. The higher we go in the hierarchy, the more complex the analyses taking place and the more business potential lies in the resulting information and knowledge.

In researching what is written about the elements hanging under the umbrella, or contained in the pyramid of Business Intelligence, the conclusion can be drawn that the above ordering is widely adopted. That is why we follow this ordering for our subject layout.

1.3 History of BI
Up to this point, we have agreed on Business Intelligence as being an umbrella that covers a whole range of concepts. It is clear that BI has somehow evolved from other concepts. Therefore, when exploring the history of Business Intelligence, it seems wise to take a look at what preceded Business Intelligence.


The problem with topics such as Business Intelligence, Decision Support Systems and many other acronyms with the ‘S’ standing for ‘System’ is that they are all part of a terribly volatile field. Much has been written about Information and Support Systems; authors have filled tomes describing the existing systems: how do they work, how should they be built, what are the requirements, and so forth. Unfortunately, little to nothing is written on the history and development of these systems. What we would have to do is take all these writings, lay them out next to each other and compare. Consider the following overview given in the figure below.

[Figure: a timeline from 1975 to 1998 of the systems and technologies that influenced data warehousing, including transaction systems, extract files, reporting systems, customer information files (CIFs), financial reporting systems, demographic data providers, marketing databases, relational databases, personal computing, spreadsheet software, multidimensional databases, Executive Information Systems (EIS), Decision Support Systems (DSS), ad hoc query tools, data warehouses, OLAP, data mining, analytic applications, customer relationship management, the World Wide Web, enterprise information portals, web analytics, interaction personalisation and closed-loop CRM.]

Fig. 1.2 Trends and influences in data warehousing, 1975-2000 (Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)

The information that is most volatile is what we read on the Internet. Whereas up to about ten years ago authors wrote their findings down in books and journals, nowadays the easier, faster, cheaper and more accessible way of publishing is on the World Wide Web. The problem with this medium, however, is that a web page has to be maintained and updated regularly to keep it and its topics alive. When this does not happen, pages get lost or wiped away, or simply contain information that is out of date.

The Database Magazine (also known as DB/M) proved a valuable source of information. DB/M is pinpointed as a magazine we must not miss if we are interested in BI. Although it has been published since 1990, it is only since 1997 that BI has received the attention of the authors of DB/M.

1.4 Customer Relationship Management
Where companies used to be focused on delivering the right products to their customers, they are now focused on delivering their products to the right customers. The same goes for Business Intelligence applications. They used to be more of a ‘back-office’ tool, concentrated on reporting to the higher management of an organisation. But with the shift from product to customer, we welcome Customer Relationship Management (commonly abbreviated as CRM).


Within this framework of CRM, BI is no longer only used by management levels; BI-tools and techniques are developed for all organisational levels.

At various points in the report we will see how Business Intelligence can influence CRM. To give a brief example up front, BI can be used to identify what is called ‘customer profitability’: which customer profiles are responsible for the highest profit? Based on the answer to this question, a company can choose to change their strategy and, for instance, make special offers to certain customer groups.

1.5 What is a Data Warehouse?
According to Simon & Shaffer (2001), there is no official definition of a data warehouse, that is, a standard definition supported by a standards committee such as the American National Standards Institute (ANSI).

Lewis (2001) writes that the most authoritative names in data warehousing define a data warehouse as: a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is specific to some moment of time. The data warehouse contains atomic data and lightly summarised data.

Basically, a Data Warehouse consists of one or more copies of transaction and/or non-transaction data. These data have been transformed in such a way that they are contained in a structure that is suitable for querying, reporting and other data analysis. One of the key features of a Data Warehouse is that it deals with very large volumes of data, in the range of terabytes. But this is not all. There are cases in which a Data Warehouse has to serve hundreds to thousands of users, process millions of daily records and carry out thousands to hundreds of thousands of daily queries and reports! Data-rich industries, consumer goods, retail, financial services and transport, have been the most typical users, for the obvious reason that they have large quantities of good-quality internal and external data available.

In the opinion of most authors and companies, a data warehouse forms a base, on top of which tools like querying and reporting can be used to analyse business results. In particular, multi-dimensional warehouses allow more advanced techniques like OLAP and data mining to identify trends and make predictions. One definition that does not match this description is the following by Laudon & Laudon (2000):

“A data warehouse is a database, with reporting and query tools that stores current and historical data extracted from various operational systems and consolidated for management reporting and analysis”.

The definition these authors give is incomplete. They fail to include the aspect of multi-dimensionality and, with it, the fields of OLAP and data mining. They also include reporting and query tools in the concept of the data warehouse, instead of placing them on top of the data warehouse. Finally, they ascribe the use of data warehousing to the management level, whereas most providers focus on business users at all levels when developing these kinds of tools.

Of course a Data Warehouse does not come into existence out of nothing. A short description of this is given in the section Extraction, Transformation and Loading. In the last section of this chapter we will go into more detail on the two authoritative names in data warehousing, Inmon and Kimball. When dealing with Data Warehousing, one could also come across the term “Data Mart”. Basically a Data Mart is a part of a Data Warehouse, specifically concentrated on a part of the business, like a single department. For instance, all the data needed by the Sales Department are copied out of the Data Warehouse into a Data Mart that will suit just the Sales Department.

Summarising, the Data Warehouse has the following features:
• It forms the basis for analytical applications.
• It experiences enterprise-wide usage.
• It is a replication of the data existing in the operational databases.
• The business data are cleaned, re-arranged, aggregated and combined.
• The warehouse is regularly updated with new data.
• It contains the “single truth” of the business.


1.5.1 The Invention of the Data Warehouse
According to Brant (1999), many claims circulate about who actually invented the data warehouse. The right answer to this question, he says, is IBM. To find the roots of the data warehouse we need to go back to 1988, when Barry Devlin and Paul Murphy published their article “Architecture for a business and information system”. This article led to IBM developing an “information warehouse” strategy. These roots were quickly buried underneath the data warehouse rage that was created by Bill Inmon’s book Building the Data Warehouse.

1.5.2 Extraction, Transformation and Loading
A data warehouse is the beginning of the business analysis. The most important process in creating a data warehouse is ETL, which stands for Extraction, Transformation and Loading. In the first step, Extraction, data from one or more data sources (databases or file systems) is extracted and copied into what is called the warehouse. As in the example of UnoVu, this data source is often a Transaction Processing System. After the extraction, the data has to undergo the Transformation step. These transformations can range from simple data conversions, summarising and unifying data codes, to complex data-scrubbing techniques.

Especially when the data comes from many different sources, it has to be brought together so that all of the information from each source is brought into the transformation model cleanly. This is a crucial step in the chain from data sources to data warehouse, since it is here that the data quality is taken care of. After the transformation, the “cleansed” data is finally moved from flat files into a data warehouse. This last step is called Loading.
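
To make the three steps concrete, here is a minimal sketch in Python (not the book's own example): it assumes a hypothetical sales.csv export from a transaction system and uses a local SQLite file to stand in for the warehouse.

```python
# Minimal ETL sketch. Assumptions: a hypothetical "sales.csv" export from a
# transaction system, and a local SQLite database standing in for the warehouse.
import csv
import sqlite3

def extract(path):
    # Extraction: read the raw records from the source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transformation: simple conversions and unification of data codes.
    return [{
        "outlet": row["outlet"].strip().title(),   # unify outlet spelling
        "style": row["style"].strip().lower(),     # unify style codes
        "month": row["month"].strip(),
        "quantity": int(row["quantity"]),          # type conversion
    } for row in rows]

def load(rows, db_path="warehouse.db"):
    # Loading: move the cleansed data into the warehouse table.
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS sales
                   (outlet TEXT, style TEXT, month TEXT, quantity INTEGER)""")
    con.executemany(
        "INSERT INTO sales VALUES (:outlet, :style, :month, :quantity)", rows)
    con.commit()
    con.close()

load(transform(extract("sales.csv")))
```

In a real installation the load step would typically append to staging tables in scheduled batches; the point here is only the order of the three steps.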

1.6 What are Queries and Reports?
The definitions of querying and reporting are given by Alter (1999):

Query (language): special-purpose computer language used to provide immediate, online answers to user questions.

Report (generator): Program that makes it comparatively easy for users or programmers to generate reports by describing specific report components and features.

Comparatively little is written on querying and reporting (hereafter called Q&R); that is, compared to techniques like OLAP and data mining. This is probably due to the fact that queries and reports are the most basic forms of analysis on a data warehouse. They already existed back in the 1970s, in the form of hardcopy reports. As Lewis (2001) puts it, interactivity was limited to the visual and perhaps extended to writing notes or highlighting on the reports. Today users have highly interactive, online, analytic processing and visualisation tools available, where selected data can be formatted, graphed, drilled, sliced, diced, mined, annotated, enhanced, exported and distributed. Queries and reports fulfil the purpose of telling management and users “what has happened”, for example how high the sales were in the past month, or how this month's sales compare to last month's.

Nearly everywhere, querying and reporting are lumped together in one tool. This is quite understandable. There are two types of reporting. The first is standard reporting. Examples of these are point-in-time reports on sales figures or other key business figures that appear each day, week, month, etc. The second type of reporting is when a report is the output of an ad hoc query. Using a query tool, a user can ask questions about patterns or details in the data. Logically, the answer will be in some form of a report. Even though this type of reporting can also be standardised when necessary, the unique thing about queries is that they are built so that the user can ask extra questions about information that does not appear directly from the data. If we take this querying to a higher-dimensional level and shorter response times, we arrive at OLAP-tools.
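
To illustrate, an ad hoc query against the toy SQLite warehouse sketched in the ETL section might look as follows; the sales table and its columns are the same assumed names, not part of the original text. Each returned row becomes a line in the resulting report.

```python
import sqlite3

con = sqlite3.connect("warehouse.db")  # the assumed warehouse from the ETL sketch
# Ad hoc question: "How many sneakers did each outlet sell in April?"
for outlet, total in con.execute(
        """SELECT outlet, SUM(quantity)
           FROM sales
           WHERE style = 'sneaker' AND month = 'April'
           GROUP BY outlet"""):
    print(outlet, total)
con.close()
```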

The results in the reports form an important input element for the Customer Relationship Management. For instance, reports on sales and marketing analyses may result in readjusting the marketing strategies or promotions. Financial reports may indicate that the company is running risks in certain product areas. Analysing customer profitability can lead to changes in the way certain customers are approached when buying their products. And there are many more examples where these came from.


1.7 What is OLAP?
A useful definition of On-Line Analytical Processing is the following: On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the users.

OLAP is a technology that allows users to carry out complex data analyses with the help of quick and interactive access to different viewpoints of the information in data warehouses. These different viewpoints are an important characteristic of OLAP, also called multidimensionality. Multidimensional means viewing the data in three or more dimensions. For a database of a Sales Department, these dimensions could be Product, Time, Store and Customer Age. Analysing data in multiple dimensions is particularly helpful in discovering relationships that cannot be directly deduced from the data itself.

Managers must be able to analyze data across any dimension, at any level of aggregation, with equal functionality and ease. OLAP software should support these views of data in a natural and responsive fashion, insulating users of the information from complex query syntax. The fact is that the multidimensionality of OLAP reflects the multidimensionality of an organisation. The average business model cannot be represented in a two-dimensional spreadsheet, but needs many more dimensions. Equally, managers and analysts want to be able to look at data from these different dimensions. That is why all these dimensions should be contained in the OLAP database.

Next to this aspect of multidimensionality, Forsman reviews two other key features of OLAP: “calculation-intensive capabilities” and “time intelligence”. The first refers to the ability to perform complex calculations in order to create information from very large and complex amounts of data. The second feature is the dimension “time”. Time is an integral component of almost any analytical application. In an OLAP system, comparisons of different time periods must be easily defined, as well as concepts such as balances over time, totals, averages, etcetera.

Turban & Aronson (2001, p.147) employ a much broader definition of OLAP: the term online analytical processing (OLAP) refers to a variety of activities usually performed by end users in online systems. There is no agreement on what activities are considered OLAP. Usually one includes activities such as generating queries, requesting ad hoc reports, conducting statistical analyses, and building DSS and multimedia applications. Some include executive information systems and data mining. To facilitate OLAP it is useful to work with the data warehouse and with a set of OLAP tools. These tools can be query tools, spreadsheets, data mining tools, data visualisation tools, and the like.

Not all organisations have the same idea about what products/tools/techniques are contained within the concept of OLAP. The only feature that all agree upon is that of multidimensionality. For the rest, the borderlines between Q&R, OLAP and Data Mining (DM) are very vague. Some say OLAP is DM, some include OLAP in DM, and some include DM in OLAP. Turban and Aronson describe BI as ‘the new role of EIS’, so a replacement. Yet in the definition above they tell us that ‘some include EIS and DM in OLAP’. But weren't DM and OLAP part of BI?

A brief look at how BI-related organisations categorise their BI-products reveals that most of them offer products in the line of OLAP. OLAP is the component that is used most generally to describe the activities and services of an organisation. As mentioned before, different BI-tools are then contained in this OLAP-element.

1.7.1 An OLAP Example
Consider a simple example to understand OLAP: a shoe retailer with many shops in different cities and many different styles of shoes, for example ski boots, gumboots and sneakers. Each shop delivers data daily on quantities sold per style. These data are stored centrally. Now the business analyst wants to follow sales by month, outlet and style. These are called dimensions, for example the month dimension. If we want to look at the data along these three dimensions and say something significant about them, what we are actually doing is looking at the data stored in a 3-dimensional cube:


[Figure: a cube with its three axes labelled Outlet, Style and Month.]

Fig. 1.3 A 3-dimensional OLAP cube (Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)

The following three cubes show us how we can look at, respectively: data on all shoe styles sold in all months in the outlet Amsterdam, data on shoe style sneaker sold in all months in all outlets, and data on all shoe styles sold in all outlets in the month April.

[Figure: three copies of the cube, each with one slice highlighted: the Amsterdam slice of the Outlet dimension, the Sneaker slice of the Style dimension, and the April slice of the Month dimension.]

Fig. 1.4 The OLAP cube looked at from 3 different dimensions (Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)

When we combine these three dimensions, we get data on the number of sneakers sold in the outlet Amsterdam in the month April:

[Figure: the cube with the intersection of the three slices highlighted, containing the value 250: the number of sneakers sold in Amsterdam in April.]

Fig. 1.5 The 3 dimensions combined in the OLAP cube (Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)


Suppose we want information about the colours of the sneakers or the sizes sold; we would then have to define new dimensions. This would mean a 4-, 5- or even higher-dimensional cube. Of course cubes like this are no longer ‘visible’ to the eye, but in an OLAP application they are possible.
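
The slicing described above can be mimicked in a few lines of code. The following is a minimal sketch using the pandas library; the sales figures are made up for illustration (only the 250 sneakers sold in Amsterdam in April comes from the example).

```python
import pandas as pd

# Toy fact table: one row per (outlet, style, month) combination.
sales = pd.DataFrame({
    "outlet":   ["Amsterdam", "Amsterdam", "Rotterdam", "Amsterdam"],
    "style":    ["sneaker",   "gumboot",   "sneaker",   "sneaker"],
    "month":    ["April",     "April",     "April",     "May"],
    "quantity": [250,          80,          120,         190],
})

# The 3-dimensional "cube": quantity indexed by all three dimensions.
cube = sales.set_index(["outlet", "style", "month"])["quantity"]

# One slice per dimension, as in Fig. 1.4:
print(cube.xs("Amsterdam", level="outlet"))  # all styles and months in Amsterdam
print(cube.xs("sneaker", level="style"))     # sneakers in all outlets and months
print(cube.xs("April", level="month"))       # everything sold in April

# Combining the three dimensions yields a single cell, as in Fig. 1.5:
print(cube["Amsterdam", "sneaker", "April"])  # -> 250
```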

1.8 FASMI
If we go back in time a few decades we come across Dr. E.F. Codd, a well-known database researcher during the 60s, 70s and 80s. In 1993, Dr. Codd wrote a report titled “Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate”, in which he defined OLAP in 12 rules. These rules make up the requirements that an OLAP application should satisfy. A year later, Nigel Pendse and his co-author Richard Creeth became increasingly occupied with the phenomenon of OLAP. After a critical study of Dr. Codd's rules, some were discarded and others lumped together into one feature, and a new definition of OLAP was born:

Fast Analysis of Shared Multidimensional Information (FASMI)

In a later article they go on to describe what exactly they mean by the five separate words that make up this definition:

“Fast” means that the system is targeted to deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds.

“Analysis” means that the system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user.

“Shared” means that the system implements all the security requirements for confidentiality (possibly down to cell level) and, if multiple write access is needed, concurrent update locking at an appropriate level.

“Multidimensional” means that the system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies, as this is certainly the most logical way to analyze businesses and organisations.

“Information” is all of the data and derived information needed, wherever it is and however much is relevant for the application.

Nigel Pendse declares that this definition was first used by him and his company in early 1995, and that it has not needed revision in the years since. He states that the definition has now been widely adopted and is cited on over 120 Web sites in about 30 countries. Research with the help of Google revealed there to be 34 countries with one or more Web sites containing the term “FASMI”. A total of 21 countries host one or more Web sites that write about FASMI in combination with The OLAP Report. The term is widely and globally used. Striking, next to the mostly English-language sites, is the large number of German (university) sites that include the terms.

We can conclude some points from the history of OLAP:

• Multidimensionality is here to stay. Even hard-to-use, expensive, slow and elitist multidimensional products survive in limited niches; when these restrictions are removed, the market booms. We are about to see the biggest-ever growth of multidimensional applications.
• End-users will not give up their general-purpose spreadsheets. Even when accessing multidimensional databases, spreadsheets are the most popular client platform. Multidimensional spreadsheets are not successful unless they can provide full upwards compatibility with traditional spreadsheets, something that Improv and Compete failed to do.
• Most people find it easy to use multidimensional applications, but building and maintaining them takes a particular aptitude, which has stopped them from becoming mass-market products. But, using a combination of simplicity, pricing and bundling, Microsoft now seems determined to prove that it can make OLAP servers almost as widely used as relational databases.
• Multidimensional applications are often quite large and are usually suitable for workgroups rather than individuals. Although there is a role for pure single-user multidimensional products, the most successful installations are multi-user, client/server applications, with the bulk of the data downloaded from feeder systems once rather than many times. There usually needs to be some IT support for this, even if the application is driven by end-users.
• Simple, cheap OLAP products are much more successful than powerful, complex, expensive products. Buyers generally opt for the lowest-cost, simplest product that will meet most of their needs; if necessary, they often compromise their requirements. Projects using complex products also have a higher failure rate, probably because there is more opportunity for things to go wrong.

1.9 OLAP Applications
OLAP technology can be used in a wide range of business applications and industries. The OLAP Report lists the following application areas:

Marketing and sales analysis: mostly found in consumer goods industries, retailers and the financial services industry.

Database marketing: determine who are the best customers for targeted promotions for particular products or services.

Financial reporting: to address this specific market, certain vendors have developed specialist products.

Management reporting: using OLAP-based systems one is able to report faster and more flexibly, with better analysis than the alternative solutions.

Profitability analysis: important in setting prices and discounts, deciding on promotional activities, selecting areas for investment or divestment, and anticipating competitive pressures.

Quality analysis: OLAP tools provide an excellent way of measuring quality over long periods of time and of spotting disturbing trends before they become too serious.

Table 1.2 OLAP application areas

1.10 What is Data Mining?
Data mining is the use of data analysis tools to try to find patterns in large transaction databases. An extended version of the definition is the following:

Data mining is the analysis of large pools of data to find patterns and rules that can be used to guide decision making and predict future behaviour.

The first type of definition talks about finding patterns in large databases; the second type also includes why we want to find these patterns, namely to help decision making and predict the future. Based on this, in my opinion there are four key elements that make up a good definition of data mining:

• Finding patterns
• Large amounts of data
• Help decision making
• Predict the future


The idea of Data Mining (DM) is to discover patterns in large amounts of data. Whereas query and even OLAP functions require human interaction to follow relationships through a data source, data mining programs are able to derive many of these relationships automatically by analysing and “learning” from the data values contained in files and databases (Lewis, 2001). The patterns that are found in the data could provide information that cannot directly be deduced from the data itself, patterns and connections that are not straightforward. These ‘invisible’ patterns might not always be logical and useful.

For instance, for a supermarket chain that is based in several different countries, DM might show that the sales of yogurt in America are strongly correlated with the sales of bicycles in the UK. Naturally this is a coincidental connection. But if DM reveals that customers who buy Product X most of the time also purchase Products Y and Z, it is a very valuable tool for the management to help them in their strategic decision making. Products X, Y and Z could be placed on shelves located close to each other, or the management could choose to make special offers for these three products at the same time, to increase sales in a short time.
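
As a toy illustration of how such a rule is quantified, data miners commonly score a pattern like "buyers of X also buy Y and Z" by its support (how often the items occur together) and its confidence (how often the consequent follows the antecedent). A minimal sketch over made-up transactions:

```python
# Toy market-basket analysis; the transactions are invented for illustration.
transactions = [
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "Z"},
    {"Y", "Z"},
    {"X", "Y", "Z"},
]

def support(itemset):
    # Fraction of all transactions that contain the whole itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Of the transactions containing the antecedent, the share that
    # also contains the consequent.
    return support(antecedent | consequent) / support(antecedent)

print(support({"X", "Y"}))            # 0.6: X and Y occur together in 3 of 5
print(confidence({"X"}, {"Y", "Z"}))  # 0.5: half of the X-buyers also buy Y and Z
```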

Actually there is nothing new about looking for patterns in data. People have been seeking patterns in data ever since human life began. Hunters seek patterns in animal migration behaviour, farmers seek patterns in crop growth, and politicians seek patterns in voter opinion. A scientist’s job is to make sense of data, to discover the patterns that govern how the physical world works and encapsulate them in theories that can be used for predicting what will happen in new situations. The entrepreneur’s job is to identify opportunities, that is, patterns in behaviour that can be turned into a profitable business, and exploit them.

1.10.1 The Data Mining Process
A quite general view of the Data Mining process is the one offered by Van der Putten (1999):

[Figure: a cyclical process with the phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment.]

Fig. 1.6 The data mining process (Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)

This model is also often referred to as CRISP-DM, the CRoss-Industry Standard Process for Data Mining. It is easy to read a book about DM techniques and modelling and to believe one then understands DM and knows what has to be done to ‘mine’ our data. That, however, is a big mistake. Very significant is the larger picture, or process, within which the data mining takes place. The whole business around it, the type of data, the preparation of the data and a thorough evaluation have to be taken into account. Each step of the process consists of a number of activities:


Business Understanding: determining the business objectives, situation assessment, determining the goal of the data mining, producing a project plan.

Data Understanding: collecting the initial data, describing and exploring these data and verifying their quality.

Data Preparation: selecting, cleaning, constructing, integrating and formatting the data.

Modelling: selecting a modelling technique, generating the test design, building and implementing the model.

Evaluation: evaluating the results, reviewing the process and determining the next steps.

Deployment: planning deployment, planning monitoring and maintenance, producing the final report and reviewing the project.

Table 1.3 The steps of the data mining process

1.10.2 Data Mining Techniques
There are many techniques for carrying out Data Mining. A book by Witten & Frank (2000) presents a clear separation between the desired output information and the tools used to acquire it. The type of output information is the way in which the newly gained knowledge is represented: for instance, classification of data, clustering of data, association rules, decision trees, or tables or trees for numeric predictions. All these types of output can be the result of one or more of a wide range of techniques; these are also called algorithms. Examples of algorithms are: inferring rules, statistical modelling, constructing decision trees, constructing rules with covering algorithms, and linear modelling. Other DM-tools are case-based reasoning, neural computing, genetic algorithms and support vector machines.

All these techniques and concepts can also be found in the categories Machine Learning and Artificial Intelligence. In fact, they are all about Artificial Intelligence, because the information that is (artificially) gained provides the user with some form of intelligence. The techniques used mostly involve a machine that ‘learns’ from the input examples it gets and is afterwards able to predict what will happen when other examples occur.
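
For example, a decision tree learner fits this description: it "learns" if/then splits from past examples and then predicts new cases. A minimal sketch using the scikit-learn library; the tiny dataset (customer age, number of past purchases, and whether the customer responded to an offer) is invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Invented training examples: [age, past_purchases] -> responded (1) or not (0).
X = [[25, 1], [32, 4], [47, 10], [51, 2], [38, 7], [23, 0]]
y = [0, 1, 1, 0, 1, 0]

# The machine "learns" a tree of if/then splits from the input examples...
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

# ...and is afterwards able to predict what will happen for new examples.
print(model.predict([[40, 8], [28, 1]]))
```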

There is no real indication as to which techniques should be used in which cases. However, a choice of technique can be based on one or more of the following criteria:

• Solution quality
• Speed
• Solution comprehensibility
• Expertise required

In some cases it could be preferred to have a DM-tool that provides answers very quickly, no matter what the quality of the solution is. In other cases one might want a solution of very high quality, but if this means that the solution concerned becomes totally incomprehensible one will have no use for it.


1.10.3 Web Mining: The Internet-variant of Mining
An area of growing importance for companies trying to sell their products is e-commerce. To give an indication of the growth of this area: in a Data Mining book written in 2000, only the second-to-last sub-section is dedicated to mining the Web, and the authors describe it as being in its infancy. Here we are, two years later, and a large part of the conversation on Data Mining is dedicated to Web Mining.

The idea behind Web Mining is that the information and knowledge that is “dug up” by data mining in every-day databases can also be used to provide information about a web site and its visitors. Web sites, and especially commercial ones, generate gigabytes of data a day that describe every action made by every visitor to the site. One should realise that there is much more information hidden in the pages of a web site than one would think there is. And it is exactly this ‘invisible’ and ‘not-straightforward’ information that is most valuable to have when engaged in e-commerce activities. Typical questions answered by Web Mining are:

• On which page of the web site do visitors enter / leave the site?
• How much time do visitors spend on which page of the site?
• How many visitors fill their shopping cart but leave the site without making a purchase?

An article by Carine Joosse (2000) gives a short but interesting description of the different ways of applying data mining to the Internet. The first is ‘Mining the Web’ itself. An example of this is collecting data from various sites and categorising, analysing and presenting them on new web pages for the benefit of the web visitor. Another example is a search engine on the Web: by searching for hits of a word, phrase or synonym, registering these hits, grouping them into categories and keeping a history, the search engine could be made more powerful. The data mining element in this is making predictions, trend analysis, categorising and data reduction.

A second type of Web mining is ‘Web usage mining’. The goal of web usage mining is analysing the site navigation: how do visitors “click” through the site, how much time do they spend on which part (page) of the site, at which point do they enter or leave the site? This form of analysis is also referred to as Clickstream Analysis. Just as important is to keep records of which visitors finally make a purchase, which visitors start making a purchase, that is, start filling their virtual shopping cart, but do not buy in the end, and which visitors leave the site without making a purchase. By combining all these data with the registered customer profiles it is possible to define those types of customers that are most likely to purchase using the internet. Also, these customer profiles, in connection with their behaviour on the Web site, can be used to see if the site should be designed differently.
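
As a toy illustration of clickstream analysis, the sketch below derives entry pages, exit pages and average time per page from a made-up session log; in practice the input would be gigabytes of web-server log data.

```python
from collections import Counter

# Invented clickstream: (visitor, page, seconds spent), in session order.
log = [
    ("v1", "/home", 10), ("v1", "/shoes", 45), ("v1", "/cart", 30),
    ("v2", "/home", 5),  ("v2", "/shoes", 120),
    ("v3", "/search", 8), ("v3", "/shoes", 60), ("v3", "/cart", 15),
]

# Group each visitor's clicks into a session path.
sessions = {}
for visitor, page, seconds in log:
    sessions.setdefault(visitor, []).append((page, seconds))

entry_pages = Counter(path[0][0] for path in sessions.values())  # where visitors enter
exit_pages = Counter(path[-1][0] for path in sessions.values())  # where visitors leave

# Average time spent per page across all sessions.
totals, counts = Counter(), Counter()
for path in sessions.values():
    for page, seconds in path:
        totals[page] += seconds
        counts[page] += 1

print("Entry pages:", dict(entry_pages))
print("Exit pages:", dict(exit_pages))
print("Average seconds per page:", {p: totals[p] / counts[p] for p in totals})
```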

While most authors ascribe the Web Mining tool Clickstream Analysis to the Data Mining field, Nigel Pendse says in his “OLAP Report” that it is one of the latest OLAP applications (Pendse, 2001). He also adds Database Marketing to his list of OLAP applications. In his opinion, determining who the preferred customers are ‘can be done with brute force data mining techniques (which are slow and can be hard to interpret), or by experienced business users investigating hunches using OLAP cubes (which is quicker and easier)’. In other words, here we encounter once again the vague boundaries that exist between the concepts within Business Intelligence!

Web mining applications of a more advanced level are personalisation and multichannel-analysis. Personalisation happens when rules are activated in order to offer personalised content to the visitor. A danger in this application is that the information is not always fully reliable, in the sense that the visitor cannot be categorised correctly. When individual visitors make use of a large company network, for example, they will not be recognised as separate visitors. What Multichannel-analysis comes down to is anticipating the behaviour, wishes and possibilities of the customer in the use of different communication channels.


1.11 Business Intelligence vs. Decision Support Systems
Maybe the reader has noted that the term ‘decision support’ is used quite often throughout this report. Clearly, this is because the bottom line of Business Intelligence is supporting decision making. There are three types of Decision Support: model-driven, data-driven and user-driven. One may wonder whether Business Intelligence is actually the “new term” for Decision Support Systems. And more specifically: is BI “replacing” data-driven Decision Support?

Business Intelligence: Business Intelligence (BI) is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.

Decision Support System: a decision support system (DSS) is a computer program application that analyzes business data and presents it so that users can make business decisions more easily.

Table 1.4 BI vs. DSS definition

The key similarity in these two definitions is “making business decisions”, and in particular both concepts are focused on helping to make these decisions in a better and easier way. The other important similarity is they both involve decision making “based on data”.

The way Dekker (2002) looks at it is that Data Warehousing and Data Mining have two precursors: DSS and EIS. DSS is focused on the lower and middle management and makes it possible to look at and analyze data in different ways. EIS is the precursor focused on the higher management. Given the fact that Data Warehousing and Data Mining form a large part of Business Intelligence, we could indeed see DSS as the precursor of BI.

The following (Alter, 1999) fully supports Eiben's theory about BI replacing data-driven decision support: a number of approaches developed for supporting decision making include online analytical processing (OLAP) and data mining. The idea of OLAP grew out of difficulties analyzing the data in databases that were being updated continually by online transaction processing systems.

When the analytical processes accessed large slices of the transaction database, they slowed down transaction processing critical to customer relationships. The solution was periodic downloads of data from the active transaction processing database into a separate database designed specifically to support analysis work. This separate database often resides on a different computer, which together with its specialised software is called a data warehouse.

What Alter points out here is that, because of the difficulties encountered when analysing the data to support decision making, the data are duplicated in a Data Warehouse, on top of which OLAP and Data Mining can be applied without disturbing transaction processing. In other words, the components that make up Business Intelligence are replacing the old-fashioned way of performing data-driven decision support on the original transaction processing systems.

1.12 Current Status
As we discussed earlier on, according to Howard Dresner (Buytendijk, no. 8, 1997) Business Intelligence is an umbrella concept with a large number of techniques hanging underneath it. Several segments can be distinguished. On the lower side of the market these are query & reporting tools and the so-called OLAP-viewers. On the upper side these are DSS and EIS packages. Business Intelligence is the covering concept of providing management information. If we add Dekker's view from the previous section to this and replace the DSS and EIS with Data Warehousing and Data Mining (not respectively, though), we have, roughly speaking, all the components of the pyramid above.


Turban & Aronson (2001) write that the term Business Intelligence (BI), or Enterprise Systems, is used to describe the new role of the Executive Information System, especially now that data warehouses can provide data in easy-to-use, graphics-intensive query systems capable of slicing and dicing data (Q&R) and providing active multi-dimensional analysis (OLAP).

Simon & Shaffer (2001) find the following classification of business intelligence applications to be useful:

• Simple reporting and querying
• Online analytical processing (OLAP)
• Executive information systems (EISs)
• Data mining

Why do they include EISs as an application amongst Q&R, OLAP and DM? Don’t EISs already have some form of Q&R and even OLAP-like activities in them? One thing is certain, Turban & Aronson and Simon & Shaffer will not agree on a definition of BI. The first duo says that BI replaces EIS, and the second includes EIS in BI.

1.13 Application Areas
The figure below is given by Pieter den Hamer (1998). It shows what applications are used by people with a certain level of expertise (from low to high) and concerning a certain “knowledge value” (data, information or knowledge).

Fig. 1.7 Knowledge value versus user expertise (Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)

What this figure makes clear is that end-users with different levels of expertise can apply Business Intelligence applications to different levels of knowledge. When we think of different types of users, we can picture a junior sales assistant or an accountant or an employee from the marketing department, but also their managers or the director’s personal secretary or the director himself. Some of these might not want to use BI, but the idea is that all types of end-users can use BI-tools. They will all use BI in a different way. After all, not everyone is equally computer-literate, not everyone has the same user expertise.


With BI-tools it is possible to carry out analyses and reports on virtually all thinkable aspects of the underlying business, as long as the data about this business come in large amounts and are stored in a Data Warehouse. Departments that are known to benefit most from Business Intelligence are (Database) Marketing, Sales, Finance, ICT (especially the Web) and the higher Management.

Recall, in the chapter about Queries & Reports, the remark about Q&R not being as far away from our daily line of work as it may seem. A very good example of this was the SPSS Report Writer, which is tightly integrated with SPSS. Another BI-tool integrated with an application many of us use daily is Business Intelligence for Excel, offered by Business Intelligence Technologies, Inc. This tool, also called BIXL, differs from other BI-tools in this respect: it delivers data to an end-user's Excel spreadsheet for analytical and reporting purposes, from Microsoft's Analysis Services (and other OLEDB for OLAP cube providers), and adds all-important write-back capabilities for planning (and budgeting and forecasting) tasks.

1.14 Competitive Intelligence
In the context of Business Intelligence, Voorma has written an article. In it, Voorma writes that Competitive Intelligence (hereafter called CI) is meant to transform information into action-focused knowledge with news value for the strategy of a company. CI can be used in many situations, amongst others:

• Learning from the mistakes of competitors
• Anticipating new legislation
• Anticipating trends in environment and market
• Identifying partners and takeover candidates

The output of a CI process is Actionable Intelligence, abbreviated by Voorma as AI; be sure not to confuse this with the widely accepted abbreviation for Artificial Intelligence! Actionable Intelligence is the action-focused (actionable) knowledge, the intelligence that stimulates changes in an organisation's strategy.

For the rest, it is not entirely clear from Voorma's article what the added value of CI is. He names a few initial steps, like:

• Identifying clear goals
• Choosing between strategic or operational control
• Specifying information requirements
• Collecting data


Summary
• According to Naeem Hashmi (2000), BI is a term introduced by Howard Dresner of Gartner Group in 1989, whilst Hans Dekker (2002) places its invention in 1992.
• Many authors speak of BI as being an “umbrella term”, with various components “hanging under” this umbrella.
• BI consists of various levels of analytical applications and corresponding tools that are carried out on top of a Data Warehouse.
• A data warehouse is a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is specific to some moment of time.
• The data warehouse contains atomic data and lightly summarised data.
• A Data Mart is a part of a Data Warehouse, specifically concentrated on a part of the business, like a single department.
• A data warehouse is the beginning of the business analysis.
• OLAP is a technology that allows users to carry out complex data analyses with the help of quick and interactive access to different viewpoints of the information in data warehouses.
• Multidimensional means viewing the data in three or more dimensions.
• Multidimensional applications are often quite large and are usually suitable for workgroups rather than individuals.
• Simple, cheap OLAP products are much more successful than powerful, complex, expensive products.
• Data mining is the use of data analysis tools to try to find patterns in large transaction databases.
• Data mining is the analysis of large pools of data to find patterns and rules that can be used to guide decision making and predict future behaviour.
• The idea of Data Mining (DM) is to discover patterns in large amounts of data.
• An area of growing importance for companies trying to sell their products is e-commerce.
• The goal of web usage mining is analysing site navigation.
• With BI-tools it is possible to carry out analyses and reports on virtually all thinkable aspects of the underlying business.
• The output of a CI process is Actionable Intelligence.
• Actionable Intelligence is the action-focused (actionable) knowledge, the intelligence that stimulates changes in an organisation's strategy.


References

• Business Intelligence – The Beginning, [Online] Available at: <http://www.few.vn.nl> [Accessed 25 April 2012].
• Pechenizkiy, M., 2006. Lecture 2: Introduction to Business Intelligence, [Online] Available at: <http://www.win.tue.nl/~mpechen/courses/TIES443/handouts/lecture02.pdf> [Accessed 27 April 2012].
• Hartenauer, J., 2007. Introduction to Business Intelligence, VDM Verlag Publication.
• Biere, M., 2003. Business Intelligence for the Enterprise, Prentice Hall Professional Publication.
• 2009. History of Business Intelligence, [Video Online] Available at: <http://www.youtube.com/watch?v=_1y5jBESLPE> [Accessed 27 April 2012].
• 2010. What is Business Intelligence?, [Video Online] Available at: <http://www.youtube.com/watch?v=0aHtHl-jcAs> [Accessed 27 April 2012].

Recommended Reading

• Becerra-Fernandez, I. & Sabherwal, R., 2010. Business Intelligence, John Wiley & Sons Publication.
• Howson, C., 2007. Successful Business Intelligence, Tata McGraw-Hill Education Publication.
• Whitehorn, M., 1999. Business Intelligence: The IBM Solution, Springer Publication.


Self Assessment

1. Which of the following statements is false?
a. The data warehouse contains atomic data and lightly summarised data.
b. Data warehouse is designed to support the DSS function.
c. Data warehouse is unable to deal with very large amounts of data.
d. A data warehouse is a database, with reporting and query tools.

2. __________ is a replication of the data existing in the operational databases.
a. Data warehouse
b. Data Mart
c. DBMS
d. Database

3. Which of the following processes is not included while creating a data warehouse?
a. Extraction
b. Manipulation
c. Transformation
d. Loading

4. _________ is the special-purpose computer language used to provide immediate, online answers to user questions.
a. Report
b. OLAP
c. Extraction
d. Query

5. _______ is a technology that allows users to carry out complex data analyses with the help of quick and interactive access to different viewpoints of the information in data warehouses.
a. OLAP
b. OLATP
c. OLAP-tools
d. OLEDB

6. A ________ is a program that makes it comparatively easy for users or programmers to generate reports by describing specific report components and features.
a. Report
b. OLAP
c. Extraction
d. Query

7. _________ is the analysis of large pools of data to find patterns and rules that can be used to guide decision making and predict future behaviour.
a. Data extraction
b. Data warehouse
c. Data mining
d. Data manipulation


8. Which of the following processes is not included in the data mining process?
a. Evaluation
b. Abstraction
c. Modelling
d. Deployment

9. Which of the following systems is not a decision support system?
a. Model-driven
b. Data-driven
c. User-driven
d. System-driven

10. The output of a CI-process is _____________.
a. Business Intelligence
b. Artificial Intelligence
c. Actionable Intelligence
d. Competitive Intelligence


Chapter II

Components of Business Intelligence Tools

Aim

The aim of this chapter is to:

• elucidate various business intelligence terms
• introduce the components of a data warehouse
• explain business intelligence implementations

Objectives

The objectives of this chapter are to:

• explain terms related to Business Intelligence
• explicate data mart and its components
• elucidate data warehouse processes

Learning outcome

At the end of this chapter, you will be able to:

• understand analysis of business intelligence candidates
• distinguish between types of data marts
• identify presentation and analysis tools


2.1 Introduction

Business intelligence is not business as usual. It's about making better decisions easier and making them more quickly. Businesses collect enormous amounts of data every day: information about orders, inventory, accounts payable, point-of-sale transactions, and of course, customers. Businesses also acquire data, such as demographics and mailing lists, from outside sources. Unfortunately, based on a recent survey, over 93% of corporate data is not usable in the business decision-making process today.

Consolidating and organising data for better business decisions can lead to a competitive advantage, and learning to uncover and leverage those advantages is what business intelligence is all about. The amount of business data is increasing exponentially. In fact, it doubles every two to three years. More information means more competition. In the age of the information explosion, executives, managers, professionals, and workers all need to be able to make better decisions faster.

IBM Business Intelligence solutions are not about bigger and better technology; they are about delivering more sophisticated information to the business end user. BI provides an easy-to-use, shareable resource that is powerful, cost-effective and scalable to our needs. Much more than a combination of data and technology, BI helps us to create knowledge from a world of information. Get the right data, discover its power, and share the value: BI transforms information into knowledge. Business Intelligence is the practice of putting the right information into the hands of the right user at the right time to support the decision-making process.

2.2 Business Driving Forces

There are several business driving forces behind business intelligence, one being the need to improve ease-of-use and reduce the resources required to implement and use new information technologies. There are additional driving forces behind business intelligence, for example:

The need to increase revenues, reduce costs, and compete more effectively. Gone are the days when end users could manage and plan business operations using monthly batch reports, and IT organisations had months to implement new applications. Today, companies need to deploy informational applications rapidly, and provide business users with easy and fast access to business information that reflects the rapidly changing business environment. Business intelligence systems are focused on end user information access and delivery, and provide packaged business solutions in addition to supporting the sophisticated information technologies required for the processing of today's business information.

The need to manage and model the complexity of today's business environment. Corporate mergers and deregulation mean that companies today are providing and supporting a wider range of products and services to a broader and more diverse audience than ever before. Understanding and managing such a complex business environment and maximising business investment are becoming increasingly difficult. Business intelligence systems provide more than just basic query and reporting mechanisms; they also offer sophisticated information analysis and information discovery tools that are designed to handle and process the complex business information associated with today's business environment.

The need to reduce IT costs and leverage existing corporate business information. The investment in IT systems today is usually a significant percentage of corporate expenses, and there is a need not only to reduce this overhead, but also to gain the maximum business benefit from the information managed by IT systems. New information technologies like corporate intranets, thin-client computing, and subscription-driven information delivery help reduce the cost of deploying business intelligence systems to a wider user audience, especially information consumers like executives and business managers. Business intelligence systems also broaden the scope of the information that can be processed to include not only operational and warehouse data, but also information managed by office systems and corporate Web servers.


2.3 How to Identify BI Candidates?

The following discovery process will help in assessing or identifying a candidate for business intelligence. The following section provides some questions that may help in the thought process; these questions are categorised by level of management and areas within a business, followed by some possible answers to the questions.

2.3.1 Senior Executives of a Corporation

When talking to a senior executive of a company, there are some questions that might help to find out if this company is a prospect for a BI project in general, and whether the senior executive will be a supportive player during the process. Some of these questions are:

• How do we currently monitor the key or critical performance indicators of our business?
• How do we presently receive monthly management reports?
• How easily can we answer ad hoc questions with our current reporting systems?
• Can we quickly spot trends and exceptions in our business?
• Do we have to wait a long time (hours? days?) for answers to new questions?
• Is everyone on our management team working from the same information?

Depending on the executive's responses, there are certain needs that, if mentioned, identify the executive as a BI project prospect. If he is a candidate, the answers to the previously mentioned questions would point to the following:

• Dissatisfaction is exhibited with the current reporting systems, especially in terms of flexibility, timeliness, accuracy, detail, consistency, and integrity of the information across all business users.
• Many people in the organisation spend a lot of time re-keying numbers into spreadsheets.
• The senior executive is very vague about how key performance indicators are monitored.

2.3.2 IT Vice Presidents, Directors, and Managers

When addressing other, more technically-oriented executives, the questions to be asked would look like the following examples:

• How do our non-I/S end users analyze or report information?
• Do end users often ask IT to produce queries, reports, and other information from the database?
• Do end users frequently re-key data into spreadsheets or word processing packages?
• Does our production system suffer from a heavy volume of queries and reports running against the system?

• Would we like to see our end users receiving more business benefits from the IT organisation?

The IT staff is a data warehousing prospect if the answers point to problem areas, such as:

• End users are relying on IT to perform most or all ad hoc queries and reports.
• End users have to re-key data into their spreadsheets on a regular basis.
• IT identifies end user dissatisfaction with the current reporting systems and processes.
• IT has a large backlog built up of end user requests for queries and reports.
• IT is concerned about end user queries and reports that are bogging down the production systems.


2.3.3 CFOs, Financial Vice Presidents, and Controllers

When talking to financially-oriented executives, there are some totally different questions to be asked to identify this part of the organisation as an active supporter of a BI project. Some sample questions are shown below:

• How are our monthly management reports and budgets delivered and produced?
• How timely is that information?
• Do we spend more time preparing, consolidating, and reporting on the data, or on analyzing performance that is based on what the data has highlighted?
• Do all the company's executives and managers have a single view of key information to avoid inconsistency?
• How easy is it to prepare budgets and forecasts, and then to disseminate that critical information?
• Can we easily track variances in costs and overhead by cost center, product, and location?

• Is the year-end consolidation and reporting cycle a major amount of duplicated effort in data preparation and validation, and then in consolidation reporting?

The financial staff is a data warehousing prospect if the answers given to these questions are like these:

• Personnel like using spreadsheets, but they usually or often need to re-key or reformat data.
• They indicate in any way that their preferred reporting tool would be a spreadsheet if they did not have to constantly re-key great amounts of numbers into them.
• They admit that much time is spent in the production of reports and the gathering of information, with less time actually spent analyzing the data, and they can identify inconsistencies and integrity issues in the reports that have been produced.
• Budget collection is a painful and time-consuming process and there is very little control available in the collection and dissemination process.
• The monthly management reports involve too much time and effort to produce and circulate, and do not easily allow queries and analysis to be run against them.
• Management information does not go into sufficient detail, especially in terms of expense control and overhead analysis.
• General dissatisfaction is expressed with the current information delivery systems.

2.3.4 Sales VPs, Product Managers, and Customer Service Directors

After talking to the senior executive and to the technical and financial executives, there are some more possible sponsors for a BI project. These are the sales and marketing-oriented personnel, and their possible sponsorship may be evaluated with the following questions:

• How do we perform ad hoc analysis against our marketing and sales data?
• How do we monitor and track the effectiveness of a marketing or sales promotion program?
• How do we re-budget or re-forecast sales figures and margins?
• Do we have to wait a long time (days? weeks?) for sales management information to become available at month- or quarter-end?
• How do we track best/worst performance of products/customers, and how do we monitor/analyze product/customer profitability?
• Do we know our customers' profiles: buying patterns, demographics, and so on?
• Are we and our staff using spreadsheets a lot, and re-keying great amounts of data?


The sales and marketing staff is a BI prospect if:

• Current reporting is very static and ad hoc requests must be accomplished through IT.
• Profitability versus volume and value cannot be easily analyzed, and the measurement of data is inconsistent; for example, there might be more than one way of calculating margin, profit, and contribution.
• There is no concept of re-planning and re-budgeting, as it is too difficult to accomplish with the current systems.
• Suppliers cannot be provided with timely information, so it is very difficult to achieve reviews of their performance.
• Getting down to the right level of detail is impossible: for example, to the SKU level in a retail store.
• General dissatisfaction is expressed with the current process of information flow and management.

2.3.5 Operations and Production Management

The last group to be covered within this section is the management of operations and production. Their support can be evaluated by asking questions like these:

• How is the validity of the MRP model checked, and how accurate do we think it really is?
• How do we handle activity-based costing?
• How do we handle ad hoc analysis and reporting for raw materials, on-time, and quality delivery?
• How do we handle production line efficiency, machine, and personnel efficiency?
• How do we evaluate personnel costs and staffing budgets?
• How do we handle shipments and returns, inventory control, supplier performance, and invoicing?

The operations and production staff is a DW prospect if:

• New projects cannot easily be costed out, and trends in quality, efficiency, cost, and throughput cannot be analyzed.
• The preferred access to information would be via a spreadsheet or an easy-to-use graphical user interface.
• Currently there is access to daily information only, which means much re-keying into spreadsheets is required for trending analysis and so on.
• The MRP model cannot easily be checked for accuracy and validity on a constant basis.


2.4 Main BI Terms

Before we get into more detail about BI, this section will explain some of the terms related to Business Intelligence. Some common Business Intelligence terms are given below:

• Data Mining
• Data Warehouse
• ODS
• Drill down
• OLTP
• OLTP Server
• OLAP
• Data Mart
• Data Visualisation
• Metadata

2.4.1 Operational Databases

Operational databases are detail-oriented databases defined to meet the needs of sometimes very complex processes in a company. This detailed view is reflected in the data arrangement in the database. The data is highly normalised to avoid data redundancy and "double-maintenance".

2.4.2 OLTP

On-Line Transaction Processing (OLTP) describes the way data is processed by an end user or a computer system. It is detail-oriented and highly repetitive, with massive amounts of updates and changes of the data by the end user. It is also very often described as the use of computers to run the on-going operation of a business.

2.4.3 Data Warehouse

A data warehouse is a database where data is collected for the purpose of being analyzed. The defining characteristic of a data warehouse is its purpose. Most data is collected to handle a company's on-going business. This type of data can be called "operational data". The systems used to collect operational data are referred to as OLTP (On-Line Transaction Processing). A data warehouse collects, organises, and makes data available for the purpose of analysis, to give management the ability to access and analyze information about its business. This type of data can be called "informational data". The systems used to work with informational data are referred to as OLAP (On-Line Analytical Processing).

Bill Inmon coined the term “data warehouse” in 1990. His definition is: “A (data) warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.”

• Subject-oriented: Data that gives information about a particular subject instead of about a company's on-going operations.
• Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
• Time-variant: All data in the data warehouse is identified with a particular time period.
• Non-volatile: Data in the warehouse is stable; new data is added, but existing data is not removed.


2.4.4 Data Mart

A data mart contains a subset of corporate data that is of value to a specific business unit, department, or set of users. This subset consists of historical, summarised, and possibly detailed data captured from transaction processing systems, or from an enterprise data warehouse. It is important to realise that a data mart is defined by the functional scope of its users, and not by the size of the data mart database. Most data marts today involve less than 100 GB of data; some are larger, however, and it is expected that as data mart usage increases they will rapidly increase in size.

2.4.5 External Data Source

External data is data that cannot be found in the OLTP systems but is required to enhance the information quality in the data warehouse. The following figure shows some of these sources.

Fig. 2.1 External data sources (Source: capstone.geoffreyanderson.net)

2.4.6 OLAP

On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

OLAP functionality is characterised by dynamic multi-dimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities including:

• Calculations and modelling applied across dimensions, through hierarchies and/or across members
• Trend analysis over sequential time periods
• Slicing subsets for on-screen viewing
• Drill-down to deeper levels of consolidation
• Reach-through to underlying detail data
• Rotation to new dimensional comparisons in the viewing area
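These navigational activities are easy to picture on a small, hypothetical dataset. The sketch below uses the pandas library to imitate slicing and rotation on a tiny set of sales figures; all table, column, and member names are invented for illustration and do not come from any particular OLAP product.

```python
# A minimal sketch of OLAP-style slicing and rotation using pandas.
# The data, column names, and values are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North",  "South", "South"],
    "product": ["Tea",   "Coffee", "Tea",   "Coffee"],
    "year":    [2011,    2011,     2012,    2012],
    "revenue": [100,     150,      120,     180],
})

# View the data as a region x product cross-tabulation.
cube = sales.pivot_table(index="region", columns="product",
                         values="revenue", aggfunc="sum")
print(cube)

# "Slice": fix one dimension member (year = 2011) for on-screen viewing.
print(sales[sales["year"] == 2011])

# "Rotate": swap the dimensions shown on rows and columns.
print(cube.T)
```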


OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalised viewing, as well as through analysis of historical and projected data in various “what-if” data model scenarios. This is achieved through use of an OLAP Server.

2.4.7 OLAP Server

An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures. A multi-dimensional structure is arranged so that every data item is located and accessed based on the intersection of the dimension members that define that item. The design of the server and the structure of the data are optimised for rapid ad hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. The OLAP server may either physically stage the processed multi-dimensional information to deliver consistent and rapid response times to end users, or it may populate its data structures in real time from relational or other databases, or offer a choice of both. Given the current state of technology and the end user requirement for consistent and rapid response times, staging the multi-dimensional data in the OLAP server is often the preferred method.

2.4.8 Metadata

Metadata is the kind of information that describes the data stored in a database and includes such information as:

• A description of tables and fields in the data warehouse, including data types and the range of acceptable values.
• A similar description of tables and fields in the source databases, with a mapping of fields from the source to the warehouse.
• A description of how the data has been transformed, including formulae, formatting, currency conversion, and time aggregation.
• Any other information that is needed to support and manage the operation of the data warehouse.
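As a concrete illustration, a metadata entry for a single warehouse field might look like the sketch below. This is only an illustration: real metadata repositories use their own, richer schemas, and every name and value here is invented.

```python
# Hypothetical metadata entry for one field in a data warehouse table.
field_metadata = {
    "table": "customer_dim",
    "field": "gender",
    "data_type": "CHAR(6)",
    "acceptable_values": ["male", "female"],
    "source_mapping": "crm.customers.sex",           # source table.field
    "transformation": "0 -> 'female', 1 -> 'male'",  # applied during ETL
    "last_loaded": "2012-04-27",
}

print(field_metadata["transformation"])
```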

2.4.9 Drill-Down

Drill-down can be defined as the capability to browse through information, following a hierarchical structure. A small sample is shown in the figure below.


Fig. 2.2 Drill-down (Source: capstone.geoffreyanderson.net)
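A drill-down along a time hierarchy can also be sketched in a few lines: the same measure is re-aggregated at successively deeper levels (year, then quarter). The data and names below are hypothetical.

```python
# Sketch of drill-down: re-aggregating along a year -> quarter hierarchy.
import pandas as pd

sales = pd.DataFrame({
    "year":    [2011, 2011, 2011, 2011],
    "quarter": ["Q1", "Q2", "Q3", "Q4"],
    "revenue": [100, 120, 90, 160],
})

# Top level of the hierarchy: totals per year.
print(sales.groupby("year")["revenue"].sum())

# Drill down one level: totals per year and quarter.
print(sales.groupby(["year", "quarter"])["revenue"].sum())
```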

2.4.10 Operational Versus Informational Databases

The major difference between operational and informational databases is the update frequency:

• On operational databases, a high number of transactions take place every hour. The database is always "up to date", and it represents a snapshot of the current business situation, more commonly referred to as a point in time.
• Informational databases are usually stable over a period of time to represent a situation at a specific point in time in the past, which can be described as historical data.

For example, a data warehouse load is usually done overnight. This load process extracts all changes and new records from the operational database into the informational database. The process can be seen as one single transaction that starts when the first record is extracted from the operational database and ends when the last data mart in the data warehouse is refreshed. The following figure shows some of the main differences between these two database types.


Fig. 2.3 Operational versus informational databases (Source: capstone.geoffreyanderson.net)

Data Mining: Data mining is the process of extracting valid, useful, previously unknown, and comprehensible information from data and using it to make business decisions.

2.5 Different BI Implementations

Different approaches have been tried in the past to find a suitable way to meet the requirements of On-Line Analytical Processing. The figure below gives an overview of four major models used to implement a decision support system.


Fig. 2.4 Business Intelligence implementations (Source: capstone.geoffreyanderson.net)

The approaches shown above are described below.

2.5.1 Summary Table

A summary table on an OLTP system is the most common implementation and is already included in many standard software packages. Usually these summary tables cover only a certain set of requirements from business analysts. Study the figure below; it shows the advantages and disadvantages of this approach.


Fig. 2.5 Summary tables on OLTP (Source: capstone.geoffreyanderson.net)
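The idea of a summary table can be sketched with plain SQL, issued here through Python's built-in sqlite3 module. All table and column names are invented for the example; a packaged OLTP application would define its own.

```python
# Sketch: building a summary table next to detailed OLTP data.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_day TEXT, product TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("2012-04-01", "Tea", 10.0),
    ("2012-04-01", "Tea", 15.0),
    ("2012-04-02", "Coffee", 20.0),
])

# The summary table pre-aggregates revenue per day and product, so
# analysts' queries no longer have to scan the detailed orders table.
con.execute("""
    CREATE TABLE orders_summary AS
    SELECT order_day, product, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_day, product
""")

for row in con.execute("SELECT * FROM orders_summary"):
    print(row)
```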

2.5.2 OLTP Data at a Separate Server

In this approach, OLTP data is moved to a separate server, and no changes to the database structure are made. This mirroring is a first step to offload the workload from the OLTP system to a separate, dedicated OLAP machine. As long as no restructuring of the database takes place, this solution will not be able to track changes over time. Changes in the past cannot be reflected in the database because the fields for versioning of slowly changing dimensions are missing. The figure below shows this approach, sometimes called "A Poor Man's Data Warehouse".


Fig. 2.6 Poor man's data warehouse (Source: capstone.geoffreyanderson.net)

The technique of regularly moving the original OLTP data to a dedicated system for reporting purposes is a step that can be taken to avoid the impact of long-running queries on the operational system. In addition to the advantages in performance, security issues can be handled very easily in this architecture.

Totally isolated machines eliminate any interdependence between analysis and operational workload. The major problem that will still persist in this architecture is the fact that the database architecture has not been changed or optimised for query performance: the most detailed level of information is copied over to the dedicated analysis server. The lack of summary tables or aggregations will result in long-running queries with a high number of files and joins in every request. To build an architecture like this, file transfer or FTP can be sufficient in some situations.

2.5.3 Single Data Mart

A growing number of customers are implementing single data marts now to gain experience with data warehousing. These single data marts are usually implemented as a proof of concept and keep growing over time. "A data warehouse has to be built; we cannot buy it!" This first brick in the data warehouse has to be kept under control: too many "single data marts" would create an administration nightmare.

The two-tiered model of creating a single data mart on a dedicated machine involves more preparation, planning and investment. This approach is shown in the figure below.


Fig. 2.7 2-tiered data mart (Source: capstone.geoffreyanderson.net)

The major benefits of this solution compared to the other models are performance (precalculated and aggregated values), higher flexibility to add additional data from multiple systems and OLTP applications, and better capabilities to store historical data. Metadata can be added to the data mart to increase ease-of-use and ease navigation through the information in the informational database. The implementation of a stand-alone data mart can be done very quickly, as long as the scope of the information to be included in the data mart is precisely limited to an adequate number of data elements.

The three-tiered data warehouse model consists of three stages of data stored on the system(s):

• OLTP data in operational databases
• Extracted, detailed, denormalised data organised in a Star-Join Schema to optimise query performance
• Multiple aggregated and precalculated data marts to present the data to the end user


Fig. 2.8 3-tiered data mart (Source: capstone.geoffreyanderson.net)

The characteristics of this model are:

• Departmental data marts hold data in an organisational form that is optimised for specific requests; new requirements usually require the creation of a new data mart, but have no further influence on already existing components of the data warehouse.
• Historical changes over time can be kept in the data warehouse.
• Metadata is the major component to guarantee the success of this architecture, providing ease-of-use and navigation support for end users.
• Cleansing and transformation of data is implemented at a single point in the architecture.
• The three different stages in aggregating/transforming data offer the capability to perform data mining tasks in the extracted, detailed data without creating workload on the operational system.
• Workload created by analysis requests is totally offloaded from the OLTP system.

2.6 Data Warehouse Components

The figure below shows the entire data warehouse architecture in a single view. The following sections will concentrate on single parts of this architecture and explain them in detail.


Fig. 2.9 Data warehouse components (Source: capstone.geoffreyanderson.net)

The figure illustrates the following ideas. The processes required to keep the data warehouse up to date, as marked, are extraction/propagation, transformation/cleansing, data refining, presentation, and analysis tools.

• The different stages of aggregation in the data are: OLTP data, ODS Star-Join Schema, and data marts.
• Metadata, and how it is involved in each process, is shown with solid connectors.
• The horizontal dotted line in the figure separates the different tasks into two groups.
• Tasks to be performed on the dedicated OLTP system are optimised for interactive performance and to handle the transaction-oriented tasks of the day-to-day business.
• Tasks to be performed on the dedicated data warehouse machine require high batch performance to handle the numerous aggregation, precalculation, and query tasks.

2.7 Data Sources

Data sources can be operational databases, historical data (usually archived on tapes), external data (for example, from market research companies or from the Internet), or information from the already existing data warehouse environment. The data sources can be relational databases from the line-of-business applications. They can also reside on many different platforms and can contain structured information, such as tables or spreadsheets, or unstructured information, such as plain text files or pictures and other multimedia information.


2.7.1 Extraction/Propagation

Data extraction / data propagation is the process of collecting data from various sources and different platforms to move it into the data warehouse. Data extraction in a data warehouse environment is a selective process to import decision-relevant information into the data warehouse. Data extraction / data propagation is much more than mirroring or copying data from one database system to another. Depending on the technique, this process is either:

• Pulling (Extraction), or
• Pushing (Propagation)
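A pull-style (extraction) job can be sketched as follows: the process remembers when it last ran and fetches only rows changed since then, which is also the basis of the incremental loads discussed later. The sketch uses sqlite3 as a stand-in for any SQL source; all names are hypothetical.

```python
# Sketch of pull-style, incremental extraction from a source database.
import sqlite3

def extract_changes(source_con, last_load_time):
    """Pull only rows created or changed since the previous load."""
    cur = source_con.execute(
        "SELECT customer_id, name, updated_at FROM customers "
        "WHERE updated_at > ?", (last_load_time,))
    return cur.fetchall()

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers "
               "(customer_id INTEGER, name TEXT, updated_at TEXT)")
source.execute("INSERT INTO customers VALUES (1, 'Acme', '2012-04-26')")
source.execute("INSERT INTO customers VALUES (2, 'Globex', '2012-04-27')")

# Only the row changed after the previous (hypothetical) load is pulled.
print(extract_changes(source, "2012-04-26"))
```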

2.7.2 Transformation/Cleansing

Transformation of data usually involves code resolution with mapping tables (for example, changing 0 to "female" and 1 to "male" in a gender field) and the resolution of hidden business rules in data fields, such as account numbers. The structure and relationships of the data are also adjusted to the analysis domain. Transformations occur throughout the population process, usually in more than one step. In the early stages of the process, the transformations are used more to consolidate the data from different sources, whereas in the later stages the data is transformed to suit a specific analysis problem and/or tool.
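The code-resolution example just given (0 becomes "female", 1 becomes "male") might be implemented with a simple mapping table, as sketched below; the record layout is invented.

```python
# Sketch of code resolution with a mapping table, as described above.
GENDER_CODES = {0: "female", 1: "male"}

def transform(record):
    """Resolve coded fields into readable values for the warehouse."""
    out = dict(record)
    out["gender"] = GENDER_CODES.get(record["gender"], "unknown")
    return out

print(transform({"customer_id": 42, "gender": 0}))
# -> {'customer_id': 42, 'gender': 'female'}
```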

Data warehousing turns data into information; cleansing, on the other hand, ensures that the data warehouse will have valid, useful, and meaningful information. Data cleansing can also be described as standardisation of data. Through careful review of the data contents, the following criteria are matched:

• Correct business and customer names
• Correct and valid addresses
• Usable phone numbers and contact information
• Valid data codes and abbreviations
• Consistent and standard representation of the data
• Domestic and international addresses
• Data consolidation (one view), such as householding and address correction

2.7.3 Data Refining

Data refining is creating subsets of the enterprise data warehouse, which have either a multidimensional or a relational organisation format for optimised OLAP performance. The figure below shows where this process is located within the entire BI architecture. The atomic level of information from the star schema needs to be aggregated, summarised, and modified for specific requirements. This data refining process generates data marts that:

• Create a subset of the data in the star schema.
• Create calculated fields / virtual fields.
• Summarise the information.
• Aggregate the information.


Fig. 2.10 Data refining (Source: capstone.geoffreyanderson.net)

This layer in the data warehouse architecture is needed to increase query performance and minimise the amount of data that is transmitted over the network to the end user query or analysis tool. In data refining, there are basically two different ways the result is achieved. These are:

• Data aggregation: Change the level of granularity of the information. Example: the original data is stored on a daily basis; the data mart contains only weekly values. Data aggregation therefore results in fewer records.
• Data summarisation: Add up values in a certain group of information. Example: the data refining process generates records that contain the revenue of a specific product group, resulting in more records.
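The difference between the two operations can be sketched in a few lines; the daily-to-weekly case mirrors the example above, with invented data.

```python
# Sketch: data aggregation (daily -> weekly, fewer records) versus
# summarisation (extra group-total records). Data is hypothetical.
import pandas as pd

daily = pd.DataFrame({
    "date":    pd.to_datetime(["2012-04-23", "2012-04-24", "2012-04-30"]),
    "product": ["Tea", "Tea", "Tea"],
    "revenue": [10.0, 15.0, 20.0],
})

# Aggregation: coarser granularity, hence fewer records.
weekly = daily.set_index("date").resample("W")["revenue"].sum()
print(weekly)

# Summarisation: additional records holding totals per product group.
totals = daily.groupby("product", as_index=False)["revenue"].sum()
print(totals)
```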

2.7.4 Physical Database Model

In BI, talking about the physical data model means talking about relational or multidimensional data models. The figure below shows the difference between these two physical database models.


Fig. 2.11 Physical database models (Source: capstone.geoffreyanderson.net)

Both database architectures can be selected to create departmental data marts, but the way to access the data in the databases is different:

• To access data from a relational database, common access methods like SQL or middleware products like ODBC can be used.
• Multidimensional databases require specialised APIs to access the usually proprietary database architecture.

2.7.5 Logical Database Model

In addition to the previously mentioned physical database models, there is also a logical database model. When talking about BI, the most commonly used logical database model is the Star-Join Schema. The Star-Join Schema consists of two components, shown in the figure below:

• Fact tables
• Dimension tables

Fig. 2.12 Logical data model (Source: capstone.geoffreyanderson.net)


The following is a definition of those two components of the Star-Join Schema:

• Fact tables ("What are we measuring?"): Contain the basic transaction-level information of the business that is of interest to a particular application. In marketing analysis, for example, this is the basic sales transaction data. Fact tables are large, often holding millions of rows, and mainly numerical.
• Dimension tables ("By what are we measuring?"): Contain descriptive information and are small in comparison to the fact tables. In a marketing analysis application, for example, typical dimension tables include time period, marketing region, product type, etcetera.
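A minimal Star-Join Schema along these lines might be declared as sketched below, again via sqlite3 and with invented names; a real marketing warehouse would have more dimensions and far more columns.

```python
# Sketch of a minimal Star-Join Schema: one fact table referencing
# two dimension tables. All names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")

# Dimension tables: small and descriptive ("by what are we measuring?").
con.execute("""CREATE TABLE time_dim (
    time_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER)""")
con.execute("""CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY, name TEXT, product_type TEXT)""")

# Fact table: large and mainly numerical ("what are we measuring?").
con.execute("""CREATE TABLE sales_fact (
    time_id INTEGER REFERENCES time_dim(time_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    units_sold INTEGER,
    revenue REAL)""")

# A typical star join: facts described and constrained via dimensions.
query = """
    SELECT t.year, p.product_type, SUM(f.revenue)
    FROM sales_fact f
    JOIN time_dim t ON f.time_id = t.time_id
    JOIN product_dim p ON f.product_id = p.product_id
    GROUP BY t.year, p.product_type
"""
print(con.execute(query).fetchall())  # empty until the warehouse is loaded
```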

2.7.6 Metadata Information

Metadata structures the information in the data warehouse in categories, topics, groups, hierarchies and so on. It is used to provide information about the data within a data warehouse, as given in the following list and shown in the figure below.

• "Subject-oriented": based on abstractions of real-world entities like 'project', 'customer', 'organisation', etcetera.
• Defines the way in which the transformed data is to be interpreted.
• Gives information about related data in the Data Warehouse.
• Estimates response time by showing the number of records to be processed in a query.
• Holds calculated fields and pre-calculated formulas to avoid misinterpretation, and contains historical changes of a view.

Fig. 2.13 Metadata (Source: capstone.geoffreyanderson.net)


From the data warehouse administrator's perspective, metadata is a full repository and documentation of all contents and all processes in the data warehouse, whereas from an end user perspective, metadata is the roadmap through the information in the data warehouse.

2.7.7 Operational Data Source (ODS)

The operational data source can be defined as an updatable set of integrated data used for enterprise-wide tactical decision making. It contains live data, not snapshots, and retains minimal history.

Fig. 2.14 Operational data store (ODS) (Source: capstone.geoffreyanderson.net)

Here are some features of an Operational Data Store (ODS):

An ODS is subject-oriented: It is designed and organised around the major data subjects of a corporation, such as "customer" or "product". It is not organised around specific applications or functions, such as "order entry" or "accounts receivable".

An ODS is integrated: It represents a collectively integrated image of subject-oriented data which is pulled in from potentially any operational system. If the "customer" subject is included, then all of the "customer" information in the enterprise is considered as part of the ODS.

An ODS is current-valued: It reflects the "current" content of its legacy source systems. "Current" may be defined in different ways for different ODSs depending on the requirements of the implementation. An ODS should not contain multiple snapshots of whatever "current" is defined to be. That is, if "current" means one accounting period, then the ODS does not include more than one accounting period's data. The history is either archived or brought into the data warehouse for analysis.

An ODS is volatile: Since an ODS is current-valued, it is subject to change on a frequency that supports the definition of "current". That is, it is updated to reflect the systems that feed it in the true OLTP sense. Therefore, identical queries made at different times will likely yield different results because the data has changed.

An ODS is detailed: The definition of “detailed” also depends on the business problem that is being solved by the ODS. The granularity of data in the ODS may or may not be the same as that of its source operational systems.


2.7.8 Data Mart

The figure below shows where data marts are located logically within the BI architecture. The main purpose of a data mart can be defined as follows:

• Store pre-aggregated information.
• Control end user access to the information.
• Provide fast access to information for specific analytical needs or user groups.
• Represent the end user's view and data interface of the data warehouse.
• Create the multidimensional/relational view of the data.
• Offer multiple "slice-and-dice" capabilities.

The database format can either be multidimensional or relational.

Fig. 2.15 Data mart (Source: capstone.geoffreyanderson.net)

2.7.9 Presentation and Analysis Tools

From the end user's perspective, the presentation layer is the most important component in the BI architecture, shown in the figure below. To find adequate tools for end users with information requirements, the assumption can be made that there are at least four user categories, with the possibility of any combination of these categories.

The "power user": Users who are willing and able to handle a more or less complex analysis tool to create their own reports and analyses. These users have an understanding of the data warehouse structure and of the interdependencies of the organisational form of the data in the data warehouse.

The "non-frequent user": This user group consists of people who are not interested in the details of the data warehouse but have a requirement to get access to the information from time to time. These users are usually involved in the day-to-day business and don't have the time or the requirement to work extensively with the information in the data warehouse. Their virtuosity in handling reporting and analysis tools is limited.


Fig. 2.16 Presentation and analysis tools (Source: capstone.geoffreyanderson.net)

Users requiring static information: This user group has a specific interest in retrieving precisely defined numbers in a given time interval, such as: "I have to get this quality-summary report every Friday at 10:00 AM as preparation for our weekly meeting and for documentation purposes."

Users requiring dynamic or ad hoc query and analysis capabilities: Typically, this is a business analyst. All the information in the data warehouse might be of importance to these users at some point in time. Their focus is on availability, performance, and drill-down capabilities to slice and dice through the data from different perspectives at any time.

Different user-types need different front-end tools, but all can access the same data warehouse architecture. Also, the different skill levels require different visualisation of the result, such as graphics for a high-level presentation or tables for further analysis.


Summary

• Business intelligence is not business as usual. It's about making better decisions easier and making them more quickly.
• Businesses acquire data, such as demographics and mailing lists, from outside sources.
• Consolidating and organising data for better business decisions can lead to a competitive advantage, and learning to uncover and leverage those advantages is what business intelligence is all about.
• Dissatisfaction is exhibited with current reporting systems, especially in terms of flexibility, timeliness, accuracy, detail, consistency, and integrity of the information across all business users.
• Operational databases are detail-oriented databases defined to meet the needs of sometimes very complex processes in a company.
• A data warehouse is a database where data is collected for the purpose of being analyzed.
• The systems used to collect operational data are referred to as OLTP (On-Line Transaction Processing).
• Bill Inmon coined the term "data warehouse" in 1990. His definition is: "A (data) warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process."
• A data mart contains a subset of corporate data that is of value to a specific business unit, department, or set of users.
• External data is data that cannot be found in the OLTP systems but is required to enhance the information quality in the data warehouse.
• OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity.
• An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures.
• Drill-down can be defined as the capability to browse through information, following a hierarchical structure.
• Metadata is the kind of information that describes the data stored in a database.
• A summary table on an OLTP system is the most common implementation and is already included in many standard software packages.
• A data warehouse has to be built; we cannot buy it.
• The three different stages in aggregating/transforming data offer the capability to perform data mining tasks in the extracted, detailed data without creating workload on the operational system.
• Data sources can be operational databases, historical data usually archived on tapes, and external data.
• Data extraction / data propagation is the process of collecting data from various sources and different platforms to move it into the data warehouse.
• Transformation of data usually involves code resolution with mapping tables.
• Data warehousing turns data into information; cleansing, on the other hand, ensures that the data warehouse will have valid, useful, and meaningful information.
• Data refining is creating subsets of the enterprise data warehouse, which have either a multidimensional or a relational organisation format for optimised OLAP performance.


References

• Reinschmidt, J., Business Intelligence Certification Guide, [pdf] Available at: <capstone.geoffreyanderson.net/export/.../sg245747.pdf> [Accessed 27 April 2012].
• Business Intelligence Components, [Online] Available at: <download.microsoft.com/.../Business%20Intelligence%20components..> [Accessed 27 April 2012].
• Haag, 2005. Business Driven Technology W/Cd, Tata McGraw-Hill Education Publication.
• Schlukbier, A., 2007. Implementing Enterprise Data Warehousing: A Guide for Executives, Lulu.com Publication.
• 2010. Data Warehouse Basics, [Video Online] Available at: <http://www.youtube.com/watch?v=EtaUzQrAPKE&feature=related> [Accessed 27 April 2012].
• 2011. 1.2.1 BI Tools and Processes, [Video Online] Available at: <http://www.youtube.com/watch?v=ZpBtxKf20zY> [Accessed 27 April 2012].

Recommended Reading

• Vassiliadis, P., Vassiliou, Y., Lenzerini, M. & Jarke, M., 2003. Fundamentals of Data Warehouses, 2nd ed., Springer Publication.
• Paredes, J., 2009. The Multidimensional Data Modeling Toolkit: Making Your Business Intelligence Applications, John Paredes Publication.
• Scheps, S., 2008. Business Intelligence For Dummies, John Wiley & Sons Publication.


Self Assessment

1. ______ describes the way data is processed by an end user or a computer system.
a. OLTP Server
b. ODS
c. OLAP
d. OLTP

2. A _________ is a database where data is collected for the purpose of being analysed.
a. data mart
b. data warehouse
c. metadata
d. data mining

3. An _________ is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures.
a. OLAP server
b. OLTP server
c. ODS
d. OLAP

4. __________ is the kind of information that describes the data stored in a database.
a. Data mart
b. ODS
c. Metadata
d. OLAP

5. ________ is the capability to browse through information, following a hierarchical structure.
a. Drill-down
b. Metadata
c. Drill-up
d. Data haunting

6. Which one of the following is not a stage of aggregation in the data?
a. OLTP data
b. ODS Star-Join Schema
c. data marts
d. OLAP

7. _________ is the process of collecting data from various sources and different platforms to move it into the data warehouse.
a. Data aggregation
b. Data extraction
c. Data manipulation
d. Drill-down


8. Which of the following statements is false?
a. Data warehousing turns data into information.
b. Data cleansing is the standardisation of data.
c. Data extraction in a data warehouse environment is a selective process.
d. Data propagation is mirroring or copying of data.

9. ________ is creating subsets of the enterprise data warehouse, which have either a multidimensional or a relational organisation format for optimised OLAP performance.
a. Data refining
b. Data transformation
c. Data mining
d. Data abstraction

10. _________ structures the information in the data warehouse in categories, topics, groups, hierarchies and so on.
a. OLTP
b. OLAP
c. Metadata
d. OLTP server


Chapter III

Open Source Tools for Business Intelligence

Aim

The aim of this chapter is to:

• elucidate criteria for BI tool categories
• explicate Extract-Transform-Load tools
• explain available integrated BI suites

Objectives

The objectives of this chapter are to:

• explain types of On-Line Analytical Processing Servers
• explicate how to use open source business intelligence tools
• elucidate adoption of open source BI tools

Learning outcome

At the end of this chapter, you will be able to:

• understand open source business intelligence tools
• distinguish On-Line Analytical Processing clients
• enlist various open source DBMSs


3.1 Introduction

The use of Business Intelligence (BI) tools is popular in industry. However, the use of open source tools for BI is still quite limited compared to other types of software. The dominating tools are closed source and commercial. Only for database management systems (DBMSs) does there seem to be a market where open source products are used in industry, including business-critical systems such as online travel booking, management of subscriber inventories for telecommunications, etc. Thus, the situation is quite different from, for example, the web server market, where open source tools such as Linux and Apache are very popular.

To understand the limited use of open source BI tools better, it is of interest to consider which tools are available and what they are capable of. This is the purpose of this chapter. In the survey, we will consider products for making a complete solution with an Extract-Transform-Load (ETL) tool that loads data into a database managed by a DBMS. On top of the DBMS, an On-Line Analytical Processing (OLAP) server providing for fast aggregate queries will be running. The user will communicate with the OLAP server by means of an OLAP client.

We limit ourselves to these kinds of tools and do not consider, for example, data mining tools or Enterprise Application Integration (EAI) tools. Use of data mining tools would also be of relevance in many BI settings, but data mining is a more advanced feature which should be considered in future work. EAI tools may have some similarities with ETL tools, but are more often used in online transactional processing (OLTP) systems. We focus on the individual components, such that a "customised" solution is built, not on the integrated BI suites. In comparison with the status in 2004, there are now mature and powerful open source tools available in all four categories; in 2004, only the DBMS category had sufficiently mature tools, so it is now for the first time possible to make a complete BI solution using only open source tools.

3.2 Criteria for all Tool Categories

There exist many different open source licenses, for example the GNU Public License and the Mozilla Public License. The different licenses vary widely with respect to what they allow and how modified source code can or must be distributed. For a potential user, it is important whether a certain tool can be used with its existing platform. It is thus of interest to consider with which hardware and software platforms the tools can be used. For many professional users, it is important to know whether commercial support, training, and consulting services are available for a product, and the survey therefore considers these aspects. A related issue considered is the type and amount of documentation available. Many open source projects have strong user communities using forums and/or mailing lists. The survey therefore also considers whether such active forums or mailing lists exist for the products.

3.3 Criteria for Extract-Transform-Load Tools

For an ETL tool, it is investigated whether the tool is for relational OLAP (ROLAP), where relational database tables are loaded, or for multidimensional OLAP (MOLAP), where multidimensional cubes are loaded. The supported data sources and targets are obviously also of interest. In many DW environments, it is of great practical interest to be able to load only the changes made to the source data since the previous load. The survey therefore considers the possibilities for doing such an incremental load. The survey also considers how an ETL job is specified, for example by means of a graphical user interface (GUI) or an Extensible Markup Language (XML) file. The possibilities for doing transformations and data cleansing are also considered, both with respect to predefined transformations and user-defined transformations. Due to the large data volumes in data warehousing, parallel job execution is also of great practical interest, and it is investigated whether the tools support parallelism.

3.4 Criteria for Database Management Systems
For DBMSs there are many interesting things to consider. In this chapter, the scope is limited to investigating features and possibilities that are relevant to data warehousing. This includes investigating whether the DBMS can handle large data sets of many gigabytes. Related issues to consider are support for materialised views, bitmap indices, and star joins, all of which can improve performance for DW applications. Possibilities for replication and partitioning are also of interest in many DW environments, and the survey considers whether the tools support these features. Finally, it is considered which programming languages are supported for stored procedures/user-defined functions.


3.5 Criteria for On-Line Analytical Processing Servers
For an OLAP server, the survey investigates whether it is a ROLAP or a MOLAP server. It is also considered which data sizes it aims at handling and which underlying data sources, if any, it can be used with (for example, a specific DBMS like MySQL). For performance reasons it can be very beneficial for an OLAP server to use pre-computed aggregate tables, and the possibilities for this are also investigated. Further, it is investigated how the user specifies cubes, for example by means of a GUI or an XML file. Finally, it is considered which application programming interface (API) and query language the OLAP server offers.

3.6 Criteria for On-Line Analytical Processing Clients
For an OLAP client, the survey considers which OLAP server(s) the client can be used with and which query language it uses/generates. Further, the types of supported reports are investigated.

3.6.1 Extract-Transform-Load Tools
In previous work, the category of ETL tools offered only a few candidates and was considered to be the least mature and most difficult to use. At the time of this writing, many more tools are available and some of them are quite mature. All of the described tools but Pequel are implemented in Java and can thus be used on many different hardware and software platforms.

3.6.2 Apatar
Apatar (Apatar, 2008) is a data integration and ETL tool developed by the company also bearing the name Apatar. Here we consider Apatar version 1.1.9 from May 2008. Apatar seems to have a very fast release cycle, as version 1.1.0 was released in October 2007 and version 1.1.10 in June 2008. Apatar is released under a dual-licensing scheme and is available under the GNU General Public License (GPL) or, if desired, under a commercial license. The development company offers training, support, and consulting. The documentation exists in four PDF files (around 40 pages in total) as well as in some wikis. Further, some user forums exist.

Apatar is ROLAP-oriented and has direct support for a wide selection of relational DBMSs as well as generic JDBC support. Further, it supports file formats such as comma-separated values (CSV) and Excel, and ERP and CRM systems (Compiere ERP, SalesForce.com, and SugarCRM). The specification of a job is done in a GUI. However, it is not possible to do an incremental load, and jobs cannot yet be run in parallel.

Apatar has built-in data quality tools for verification of US addresses, phone numbers, and email addresses. It is possible for the user to define transformations in Java, although this is not as simple as in some of the other tools. To make a custom transformation available in Apatar, the user must define two classes, both inheriting from provided base classes, and edit an XML file describing the available plug-ins.

3.6.3 Clover.ETL
Clover.ETL (Clover.ETL, 2008) is developed by OpenSys and Javlin and is offered under either a GNU Library General Public License (LGPL) or a commercial license. Support and consulting can be bought from the above-mentioned companies. Here, version 2.4.6 of Clover.ETL is considered (2.4 was released in Feb. 2008, 2.4.7 in June 2008). Unlike the previously described ETL tool, Clover.ETL does not have an open-source GUI. A closed-source GUI exists, but is only free of charge if not used commercially. So, for a solution fully based on open source, the user has to specify the ETL job in XML when using Clover.ETL. A free 181-page manual exists for the GUI, but for Clover.ETL itself the user has to settle for the wiki documentation and a User's Guide consisting of 104 slides from a presentation. Further, two forums exist.

Clover.ETL is a ROLAP tool and transfers structured data from different DBMSs and file formats. The user can create transformations in Java together with XML descriptions of them or in Clover.ETL’s own TL language. Clover.ETL supports parallel execution and, for some DBMSs, also bulk‐loading, while no support for incremental load was found.


3.6.4 ETL Integrator
ETL Integrator (JBIWiki: ETLSE, 2008) is an ETL tool developed by Sun Microsystems. It has a service engine that makes ETL operations available as web services, and further it has an ETL editor which is integrated into the Netbeans integrated development environment (IDE) version 6.1. It is released under the Common Development and Distribution License (CDDL). No information about commercial support specific to ETL Integrator was found, but Sun will provide commercial support for its upcoming integration platform GlassFish ESB (Sun, 2008), which is built on the OpenESB project that ETL Integrator is part of. The documentation found consists of design documents, wikis, and video tutorials; the User Guide is also a tutorial. Further, forums for the OpenESB project exist, but only few postings are related to ETL Integrator.

ETL Integrator is a ROLAP tool outputting to relations. It supports different relational DBMSs as sources, as well as different file formats and OpenESB components for connecting to ERP/CRM systems. The tool is integrated into the Netbeans IDE, and ETL jobs can be specified graphically from there. ETL Integrator supports incremental load as well as parallel execution of parts. Bulk loading is also supported. In the descriptions of ETL Integrator, it is said that its editor "has many predefined transformations as well as cleansing operators/functions" and further that it is possible to add user-defined functions. It does offer name and address parsing and normalisation, but these operators depend on SQL calls to the database, except for flat files, for which an internal engine is used.

3.6.5 KETL
KETL (Kinetic Networks, 2008), not to be confused with Kettle described below, is developed by Kinetic Networks, from which support can also be purchased. The latest version of KETL is 2.1.24 from April 2008; the oldest generally available release in the 2.1 series is 2.1.12 from April 2007. KETL is partly released under the GPL license and partly under the LGPL license. The homepage for KETL states that the documentation is currently being overhauled and that only the Installation Guide has been updated so far. Older versions of the documentation are still available in the meantime. However, these (a 37-page Administration Guide and 24 slides in a Training Presentation) fail to describe how ETL jobs are defined in the XML language used. More than 60 example XML files are available, but they also lack documentation.

KETL is ROLAP oriented and can be used with JDBC sources. KETL has special support for three DBMSs and flat files as well as XML files. The user must specify the ETL jobs in an XML file. Transformations are apparently possible, but due to the missing documentation it is not clear how to create them. Likewise, it is unclear if incremental loads are supported. KETL is capable of executing parts in parallel.

3.6.6 Kettle / Pentaho Data Integration
Kettle (Pentaho, 2008c) started as an independent open source ETL project, but was acquired in 2006 by Pentaho to be included in the Pentaho BI suite (Pentaho, 2008b). Thus, Kettle is now also branded under the name Pentaho Data Integration. This survey considers version 3.0 of Kettle (3.0.0 was released in Nov. 2007, 3.0.4 in June 2008), but version 3.1 is expected to be released soon.

Kettle is released under the LGPL. Kettle has a graphical designer for jobs and transformations. This designer has a manual of 274 pages and also 40 pages with frequently asked questions (FAQs) and answers. Further, very active forums exist (more than 22,000 posts in 5,000 threads in the last 2½ years). Pentaho offers support, training and consulting, but many Pentaho partners also offer such services.

Kettle is ROLAP-oriented, but another open-source project provides a plugin that enables Kettle to output data to the Palo MOLAP server. Kettle supports around 35 different DBMSs (as well as generic JDBC and ODBC) and a variety of flat files. A 3rd-party SAP connector is also available, but it is not yet ready for version 3.0 and it is commercial. Incremental load is possible in the sense that Kettle logs when a job was executed, and this timestamp can be used in the queries to select only new data. Further, an "insert or update" step is available.
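
As an illustration of this style of incremental load, the following is a minimal sketch (table and column names are hypothetical) of the kind of SQL a source query could use, where the parameter is filled with the timestamp of the previous successful run taken from Kettle's log:

    SELECT order_id, customer_id, amount, last_modified
    FROM   orders
    WHERE  last_modified > ?   -- timestamp of the previous successful run, from the job log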


ETL jobs are specified in a GUI. Kettle is shipped with more than 80 predefined transformations, and further the user can implement transformations in JavaScript; Kettle also supports debugging of these with breakpoints etc. It is possible to use "clustering", where a transformation step is split into parts that are executed on distinct servers. Apart from that, parallel jobs are not supported in version 3.0, but are planned for the upcoming 3.1 release. Junk dimensions and slowly changing dimensions of type 1 and 2 are supported by Kettle, and for some DBMSs (experimental) bulk loading can be applied.
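
To give an impression of such a scripted transformation, the following is a minimal sketch of a JavaScript step (the field names are hypothetical); each incoming field is available as a variable, and variables declared in the script can be mapped to new output fields:

    // Derive a full name and a normalised country code from incoming fields
    var full_name    = first_name + " " + last_name;
    var country_code = country.toUpperCase();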

3.6.7 Octopus
Octopus (Together Teamlösungen, 2007) is an ETL tool from Enhydra.org with the LGPL license. Here, version 3.6-5 from Oct. 2007 is considered (3.6-1 was released in June 2006). A manual of 122 pages exists for Octopus, and further a mailing list exists. The latter is, however, not very active, having had 2 posts in the first half of 2008 and 25 in all of 2007. Commercial support for Enhydra.org's products is available from Together Teamlösungen and other commercial vendors.

Octopus is also ROLAP-oriented and uses JDBC to connect to data sources and targets. ETL jobs are specified in XML files; a GUI for generating skeletons of these XML files, as well as file dumps of the database content, does exist, though. Octopus can update values, but does not support incremental loads in more advanced ways. It is possible to implement transformations in Java and JavaScript. Further, Octopus has a few predefined transformations for setting a default value, ensuring a maximum length of a string, and correcting foreign key values. Parallel job parts and bulk loading are apparently not supported.

3.6.8 Palo ETL Server
Palo ETL Server (Jedox, 2008a) is developed by Jedox AG, from which commercial support and training options are also available. Palo ETL Server is released under the GPL. Version 1.0 was released in April 2008 and followed by 1.1 in July 2008. A manual of 56 pages is freely available, and a forum with some activity also exists. Palo ETL Server is the only considered ETL tool that is MOLAP-oriented, as it is made for loading data into the Palo MOLAP server, also created by Jedox AG. It loads data from JDBC sources, CSV files, LDAP servers, and MOLAP cubes/dimensions. Jobs are specified in XML (a GUI is planned for a future release), and transformations can be implemented in Java. The jobs can be parameterised such that incremental loads can be supported to some degree. Parallel jobs are not supported.

3.6.9 Pequel
Pequel (Gaffiero, 2007) is the only considered ETL tool that is not written in Java. It is implemented in Perl and runs on UNIX-like platforms and on Windows using Cygwin. Pequel's license is the GPL. Version 3.0.94 of Pequel is considered here, but as the documentation is not complete, documentation for version 2.4.6 has also been used. The documentation for version 3.0 consists of a Programmer's Reference (30 pages) and a Pequel Type Catalog (54 pages), which are both quite technical and serve as documentation for the Pequel source code. For version 2.4, a User Guide (72 pages) also exists, but many of its sections have been left empty. Forums exist, but the traffic is low (no posts from Sep. 2007 to June 2008). No commercial support offerings were found in this survey.

Pequel generates Perl and C code for the load job. It is mainly targeted at processing files to generate other files. However, support for relations using Perl's DBI module also exists. A job is specified either by means of a Perl API or by means of an XML file. Data conversion and rejection of records (based on regular expressions) are possible. Further, the user can use Perl's functions and operators. It is possible to distribute the data records read to different Pequel processes and merge them again, and in this way execute parts in parallel.


3.6.10 Scriptella
Scriptella (Kupolov, 2008) is an ETL and script execution tool. It is released under the Apache License. Here, version 1.0beta is considered. A manual of 23 pages exists, as well as a forum. The developer offers commercial support and consulting. Scriptella is intended for ROLAP use (as well as output to files), and JDBC drivers for different DBMSs are included, together with drivers for different flat files, XML files, and LDAP servers (but any other JDBC driver can also be used). ETL jobs are specified in an XML file in which Java or any scripting language compatible with the JSR-223 standard can be used directly.

In a Scriptella script, data rows can be fetched from multiple connections by queries. The script specifies what to do for each row in the query result: for example, perform an SQL statement using the values from the result of the outer query, or apply certain transformations. It is possible to nest queries and scripts written in different languages while still sharing variables. Scriptella's focus is on simplicity, and it does not have out-of-the-box transformations or incremental load support. On the other hand, the simplicity of using code for user-defined transformations and logic should be noted.
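
To illustrate this structure, the following is a minimal sketch of a Scriptella ETL file (connection details and table names are hypothetical); the outer query fetches rows from one connection, and the nested script inserts each row into another connection, referencing the row's columns as ?name parameters:

    <etl>
      <!-- Source and target connections; drivers and URLs are examples only -->
      <connection id="src" driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:file:orders" user="sa" password=""/>
      <connection id="dw" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/dw" user="etl" password="etl"/>
      <query connection-id="src">
        SELECT id, name FROM customer
        <!-- Executed once per row of the outer query -->
        <script connection-id="dw">
          INSERT INTO dim_customer (id, name) VALUES (?id, ?name);
        </script>
      </query>
    </etl>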

3.6.11 Talend Open Studio / JasperETL
Talend Open Studio (Talend, 2008) is developed by Talend, which also offers support, training, and consulting. The GPL-licensed Talend Open Studio is also included in the open-source BI package from JasperSoft (JasperSoft, 2008), there under the name JasperETL. The survey considers version 2.3 of Talend Open Studio (2.3.0 was released in Feb. 2008, 2.3.3 in May 2008), but since the data was collected, version 2.4.0 has been released. Compared to the other open source ETL tools, it has a large printable documentation with a User's Guide (161 pages) and a Reference Guide (550 pages).

However, personal information, like name and email, must be given to access this documentation. Talend Open Studio is primarily ROLAP-oriented, but output to the Palo MOLAP server is also supported. Talend claims to support more than 100 different source systems, including different DBMSs, files, web services, Subversion logs, etc. Further, CRM systems like SugarCRM, CentricCRM, SalesForce.com, and VtigerCRM are supported. The ETL jobs are specified in a GUI. Like Pequel, Talend Open Studio generates code for a stand-alone ETL application; the generated code is Java or Perl. It is possible for the user to specify transformations (also in Java or Perl).

Further, Talend Open Studio comes with a set of predefined transformations, including six for data quality (matching, replacing, etc.) and, from version 2.4, also name and address parsing, the latter only when Perl code is generated. The generated code can execute parts in parallel, but the parallelism support in Talend is still being extended. Talend supports slowly changing dimensions and bulk load, while incremental load is done by use of look-ups, possibly followed by inserts or updates.

Compared to the previous survey, many more open source ETL tools are available today, and the quality of the existing tools seems to have increased a lot in the meantime. Four out of the ten tools include GUIs in which ETL jobs are specified, and for one of the remaining tools a closed-source GUI, free for non-commercial use, exists. Nine out of the ten described tools are implemented in Java and one in Perl, so all the tools run on many different platforms. The tools are primarily targeted at ROLAP (Palo ETL Server being the exception) and in general support a variety of relational DBMSs, generic JDBC, and common file formats like Excel and XML. Some of the tools can also extract data from ERP and CRM systems, but not all of these connectors are free. The most notable tools are Kettle and Talend, which both have large user communities, comprehensive documentation, and many features, and are included in BI suites.

3.7 Database Management Systems
In previous work, the category of DBMSs was considered to be the most mature of the considered categories, and for the current survey many mature DBMSs have also been found. More open source DBMSs than those described below exist, but they have a low visibility compared to the described ones and/or are mainly for use-cases that are less relevant for BI usage, for example embedded usage with smaller data sets.


3.7.1 Firebird
Firebird (Firebird Project, 2008) is based on the code base of the commercial DBMS InterBase version 6.0. The current version of Firebird is 2.1 (from April 2008). Firebird uses two licenses which are both similar to the Mozilla Public License (MPL). Firebird runs on Windows, Linux, FreeBSD, and Mac OS X. Binary releases for Solaris and HP-UX are not yet available for version 2.1, but are available for the 2.0 series from Nov. 2006. Firebird is a commercially independent project, but a large part of the development is done by the company IBPhoenix, which also offers support, training, and consulting.

The documentation for Firebird is still not complete. On the homepage it is stated that the project is working on full user’s and reference guides, but that the current documentation still consists of the manuals for InterBase 6.0 combined with the Firebird release notes that describe changes made to the Firebird code. All changes between InterBase 6.0 and Firebird 1.5 are documented, but updates for 2.0 and 2.1 are still in preparation. Apart from this documentation, different (rather short) guides and manuals exist in PDF and HTML format.

Active mailing lists exist for the project (the support list has had around 95,000 posts since Nov. 2000). On-disk bitmap indexes are not supported, but Firebird can combine indexes and form bitmaps in memory. Firebird does not support materialised views, star joins, or partitioning. Replication is not available in the Firebird distribution itself, but 3rd-party tools (both commercial and open source) provide it. With respect to data sizes, it should be noted that tables are limited to 2 billion rows in Firebird, but that there is no limit on the byte-size of databases; the largest known database holds more than 11 TB of data.

It is reported that the current Firebird has problems with scalability on computers with multiple CPUs, but these problems are expected to be solved in the coming Firebird version 3.0. The user can implement stored procedures (SPs) in Firebird's procedural language PSQL. Further, user-defined functions (UDFs) can be loaded from external shared object libraries, so the user can implement them in C, C++, Delphi, etc.
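
To give a flavour of PSQL, the following is a minimal sketch of a selectable stored procedure (the table and column names are hypothetical); SET TERM changes the statement terminator so the procedure body can contain semicolons, and SUSPEND returns a row to the caller:

    SET TERM ^ ;
    CREATE PROCEDURE yearly_total (p_year INTEGER)
    RETURNS (total NUMERIC(18,2))
    AS
    BEGIN
      /* Sum all sales for the given year into the output parameter */
      SELECT SUM(amount) FROM sales WHERE sale_year = :p_year INTO :total;
      SUSPEND;
    END^
    SET TERM ; ^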

3.7.2 Ingres Database
Ingres Database (Ingres, 2008), developed by Ingres, is available under a commercial license (this is the "Enterprise Edition" of Ingres Database) or under the GPL (the "Community Edition"). The current version is Ingres 2006 Release 2, but Release 3 is available in a beta version. Ingres supports its products for 15 years, but support and maintenance from Ingres are only available for the Enterprise Edition; however, other independent companies also provide support and training. Ingres Database runs on Windows and a variety of different UNIX-like platforms. Active community forums exist, and Ingres also offers 25 free manuals with close to 8,000 pages in total.

Materialised views, bitmap indexes, and star joins are not supported. Multi-master replication and partitioning (based on range, value, or hash), including sub-partitioning, are supported out-of-the-box in Ingres Database. With respect to scalability, Ingres claims that Ingres Database is capable of handling many terabytes of data easily. SQL can be used for stored procedures, and further, user-defined functions can be implemented in C.

3.7.3 LucidDB
LucidDB (LucidDB, 2008) is developed by the software company LucidEra and the non-profit organisation The Eigenbase Project. The LucidDB server is licensed under the GPL, while the LucidDB client is licensed under the LGPL. The newest version of LucidDB is version 0.7.3 from March 2008. On LucidDB's homepage, it is stated that LucidDB is "purpose-built entirely for data warehousing and business intelligence". This is in contrast to the other considered tools apart from MonetDB. LucidDB and MonetDB are also the only considered column-stores.

In a column-store, all data tables are split vertically at the physical layer such that each column is stored on disk independently of other columns. This is different from traditional row-stores, where data from different columns is stored together in rows. LucidDB runs on 32- and 64-bit Linux and on 32-bit Windows (using Cygwin). LucidEra explicitly does not intend to sell support or commercial licenses for LucidDB. The documentation consists of relatively short wiki manuals and tutorials. Further, there is a mailing list, but it has had fewer than 100 posts from May 2007 to May 2008.


LucidDB supports B-tree and bitmap indexes. LucidDB itself chooses which indexes to create and can also combine the two types. Star joins are also supported, while partitioning and replication are not. Support for materialised views is planned for a future release. It is reported that LucidDB has been tested with 10 GB of TPC-H data, but the performance results were not found during this survey. User-defined functions can be created in Java. It is also possible to wrap external data sources, like files or tables from another DBMS, and use them as traditional tables from LucidDB.

While LucidDB offers many features relevant for data warehousing, it should be noted that it is still not a mature DBMS. For example, foreign keys, sub‐queries, transaction handling, and support for altering table definitions are still missing.

3.7.4 MonetDB
MonetDB (CWI, 2008) is the second column-store considered in this survey. Like LucidDB, it is not a DBMS made for on-line transactional processing (OLTP) with highly concurrent workloads; instead, the focus is on efficient handling of query-intensive access patterns. MonetDB is developed by the research institute CWI and has a license similar to the MPL. It runs on Windows and different UNIX-like operating systems. In the implementation, care has been taken to use the hardware very efficiently. In this survey, the "Feb2008" release of MonetDB is considered (the "Jun2008" release became available at the end of June 2008). A manual with 260 pages is provided for MonetDB.

Further, a manual of 113 pages exists for the SQL part of MonetDB. A commercial spin-off that offered support existed, but it has been acquired by another company, and no commercial support for the current MonetDB was found during this survey. An active mailing list also exists. Like LucidDB, MonetDB itself picks which indexes to create. However, bitmap indexes seem to be unsupported. Partitioning, replication, and materialised views are also currently unsupported, but future additions are planned in these areas. The user can define stored procedures in SQL, as well as external functions in MonetDB's proprietary MAL language and in C.

3.7.5 MySQL
MySQL (MySQL, 2008), developed by MySQL (now owned by Sun Microsystems), is available under a commercial license or under the GPL. It can be downloaded in two versions: the "Community Server", which is free, and the "MySQL Enterprise" edition, which is not free but for which extra features and commercial services exist. The latest production release of the Community Server is 5.0.51 from April 2008 (the first production release in the 5.0 series was from October 2005), and version 5.1 is available as a release candidate. Sun offers a wide range of commercial support, consulting, and training. It is reported that MySQL-based data warehouses larger than 30 terabytes exist. MySQL runs on Windows and a large collection of UNIX-like systems. The manual has 2,071 pages, and further, very active forums exist.

There is no support for star joins or materialised views. Partitioning is not supported in the 5.0 series either, but is available in the upcoming 5.1 series, where range, list, hash, and key partitioning are supported. Statement-based master-slave asynchronous replication has been available in MySQL since version 3.23, and from release 5.1 row-based replication is also available. Synchronous replication is also possible using MySQL Cluster, but then, as stated in the manual, "all live data storage is done in memory"; from version 5.1, non-indexed columns can be saved on disk. On-disk bitmap indexes are not supported in MySQL, but MySQL can perform an index merge where a bitmap is built in memory. The user can implement stored procedures in SQL and user-defined functions in C/C++.
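
As an illustration of the 5.1 partitioning syntax, the following minimal sketch (with hypothetical table and column names) range-partitions a fact table by year:

    CREATE TABLE fact_sales (
      sale_id   INT,
      sale_year SMALLINT,
      amount    DECIMAL(10,2)
    )
    PARTITION BY RANGE (sale_year) (
      PARTITION p2006 VALUES LESS THAN (2007),
      PARTITION p2007 VALUES LESS THAN (2008),
      PARTITION pmax  VALUES LESS THAN MAXVALUE
    );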


3.7.6 PostgreSQL
PostgreSQL (PostgreSQL Global Development Group, 2008) is released under the BSD license. The newest version is 8.3.3 from June 2008 (8.3.0 was released in Feb. 2008). PostgreSQL runs on Windows and on a large collection of UNIX-like operating systems. PostgreSQL's development is led by its community and not by a single company. Due to its non-restrictive BSD license, several open source and commercial derivatives exist; for example, Netezza (Netezza, 2008), EnterpriseDB (EnterpriseDB, 2008), and Greenplum (Greenplum, 2008) offer PostgreSQL-based products. Several companies also offer commercial support, training, and consulting. The PostgreSQL manual consists of 1,908 pages, and very active mailing lists exist, as well as Internet Relay Chat (IRC) channels in several languages. It is reported that databases larger than 4 terabytes are used in production environments.

Materialised views and star joins are not supported in PostgreSQL. On-disk bitmaps are not supported yet, but they are planned for inclusion in a future version; PostgreSQL does already support creation of an in-memory bitmap when combining other, existing indexes. Partitioning is to some degree supported by means of PostgreSQL's table inheritance features. The user has to create each partition manually and create logic for redirecting inserts to the correct partition. The solution does not work well with parameterised queries and enforcement of integrity constraints.
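
To make the mechanism concrete, the following is a minimal sketch (with hypothetical table and column names) of inheritance-based partitioning in the 8.x series: a child table inherits from the parent, a CHECK constraint lets the planner skip irrelevant partitions, and a rule (or trigger) redirects inserts:

    -- Parent table
    CREATE TABLE sales (sale_date date, amount numeric);

    -- One partition per year, created manually
    CREATE TABLE sales_2008 (
      CHECK (sale_date >= DATE '2008-01-01' AND sale_date < DATE '2009-01-01')
    ) INHERITS (sales);

    -- Redirect matching inserts on the parent into the partition
    CREATE RULE sales_insert_2008 AS
      ON INSERT TO sales
      WHERE (sale_date >= DATE '2008-01-01' AND sale_date < DATE '2009-01-01')
      DO INSTEAD INSERT INTO sales_2008 VALUES (NEW.*);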

In the current PostgreSQL distribution, replication is not supported out-of-the-box, as this has deliberately been left to 3rd-party tools to offer competing solutions. As this standpoint is now considered to hinder acceptance of PostgreSQL, it has been decided to include simple asynchronous replication in future standard distributions. For the current PostgreSQL, and for more advanced use-cases in the future, existing 3rd-party tools (for example, Slony-I (Slony Development Group, 2008) and Pgpool-II (Ishii et al., 2007)) already offer replication. PostgreSQL offers several languages for stored procedures: PL/pgSQL, PL/Tcl, and PL/Python. Further, it is possible to add new language support; for example, Java is supported this way. The user can also create external functions in C libraries.
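
For illustration, a minimal PL/pgSQL function could look as follows (the function name and logic are hypothetical):

    CREATE FUNCTION with_vat(amount numeric) RETURNS numeric AS $$
    BEGIN
      RETURN amount * 1.25;  -- add 25% VAT
    END;
    $$ LANGUAGE plpgsql;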

Many open source DBMSs are available, and overall they have reached a high level of maturity, also with respect to commercial support and related services. Some features offered by leading commercial DBMSs are, however, still missing from the open source DBMSs; for example, none of the considered DBMSs support materialised views, and most of them do not support on-disk bitmaps, star joins, or partitioning. Nevertheless, the open source DBMSs are usable for many production DWs and are being used for large BI projects in industry. Ingres Database, MySQL, and PostgreSQL are notable for their documentation, large user communities, and rich feature sets.

3.7.7 On-Line Analytical Processing Servers
Not many open source on-line analytical processing (OLAP) servers are available. For this survey, only two open source OLAP servers with running code were found. This might be due to the success of the first of them, Mondrian, which is included in the leading open source BI packages, uses de-facto standards, and is a very popular choice for ROLAP usage.

3.7.8 Mondrian / Pentaho Analysis Services
Mondrian (Pentaho, 2008a) started in 2002 as an independent open source project developing an OLAP server. In late 2005, Mondrian joined forces with Pentaho and is now being developed as part of Pentaho's BI package; Mondrian can, however, be downloaded and used without the rest of the Pentaho software. Mondrian is a relational OLAP (ROLAP) server. The most recent version of Mondrian is 3.0.3 from May 2008 (the 3.0 series was released in March 2008). It is released under the Common Public License (CPL).

As Mondrian is implemented in Java, it runs on many platforms, and it uses JDBC such that it can be used with most DBMSs. The documentation consists of HTML pages with relatively extensive content (close to 200 pages of text when printed). Further, active forums exist. Commercial support, consulting, and training are available from Pentaho and its partners. The Mondrian project is involved in the standardisation work for the olap4j specification, which is intended to become a common API for OLAP servers (a kind of JDBC for multidimensional data); the primary API to Mondrian is olap4j. Queries to Mondrian are expressed in the de-facto industry standard, MultiDimensional eXpressions (MDX) (Spofford, Harinath, Webb, Huang, & Civardi, 2008).
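
To give a feel for MDX, the following is a minimal sketch of a query against a hypothetical Sales cube; it asks for unit sales per month of 2008, restricted to stores in the USA:

    SELECT {[Measures].[Unit Sales]} ON COLUMNS,
           [Time].[2008].Children ON ROWS
    FROM   [Sales]
    WHERE  [Store].[USA]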


Cubes are specified by means of an XML file. GUIs for creating these XML files do exist, but they are still described as incubator projects and have version numbers starting with 0. With respect to scalability, it is stated in the FAQ for Mondrian that large data sets can be handled if the underlying RDBMS can handle them, since all aggregation is delegated to the DBMS. Mondrian does have support for the use of pre-computed aggregate tables. Users report that Mondrian performance is good even when handling hundreds of gigabytes of data with hundreds of millions of rows in industrial settings.
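
The following is a minimal sketch of the kind of XML schema file Mondrian uses (all table and column names are hypothetical); it maps a fact table and one dimension onto a cube with a single measure:

    <Schema name="SalesSchema">
      <Cube name="Sales">
        <Table name="sales_fact"/>
        <Dimension name="Time" foreignKey="time_id">
          <Hierarchy hasAll="true" primaryKey="time_id">
            <Table name="time_by_day"/>
            <Level name="Year" column="the_year" type="Numeric"/>
            <Level name="Month" column="the_month"/>
          </Hierarchy>
        </Dimension>
        <Measure name="Unit Sales" column="unit_sales" aggregator="sum"/>
      </Cube>
    </Schema>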

3.7.9 Palo
Palo (Jedox, 2008b) is a multidimensional OLAP (MOLAP) server developed by Jedox AG. It is available under a commercial license or under the GPL. Windows and Linux are the primarily supported platforms. Commercial support and consulting are available from Jedox. A manual of around 376 pages exists, but costs €29.50. Active forums for Palo also exist. For this survey, version 2.0 was considered, but version 2.5 was released in early July 2008.

Palo loads its data set completely into memory, and thus the memory of the host computer limits the supported data sets. Proprietary programming interfaces to communicate with Palo exist for Java, .NET, PHP, and C. There is also a free but closed-source add-in for Microsoft Excel; the manual for Palo even states that "Palo was developed for Microsoft Excel". Using the add-in, Palo-specific constructs like PALO.DATAC(…) can be used in formulas. The Excel add-in can also be used to specify cubes.
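
As an example of such a formula, a cell could contain something like the following (the server, database, cube, and element names are hypothetical); PALO.DATAC looks up the cell value identified by one element per dimension of the cube:

    =PALO.DATAC("localhost/Demo", "Sales", "Germany", "Jan", "2008", "Units")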

Only two open source OLAP servers with running code were found. They target different segments: Mondrian is a ROLAP server which can handle large data volumes, while Palo is a memory-based, and thus also memory-limited, MOLAP server. The Mondrian OLAP server seems very popular, as it is included not only in the BI suite of its developer company Pentaho, but also in other open source BI suites. Both OLAP servers are used in industry for BI projects.

3.8 On-Line Analytical Processing Clients
Compared to the previous survey, where only two open source OLAP clients were found, many more products are available at the time of this writing. From the previous survey, JPivot is still actively developed, and parts of it are also used in other products.

3.8.1 FreeAnalysis
FreeAnalysis (BPM Conseil, 2008) is developed by BPM Conseil, from which commercial support, training, and consulting can also be bought. From the downloadable code, the license is not clear; previously, the MPL and a license derived from the MPL have been used, and a (currently empty) project created for FreeAnalysis on Google Code also states that the license is the MPL. Version 1.14 from June 2008 is considered here. FreeAnalysis is implemented in Java and can be used as a stand-alone application or as a web application. A manual of 13 pages is available in French, and another two-page document with an example connection specification is also available. It is, however, said that a URL to a manual will be given to those who subscribe to FreeAnalysis's mailing list.

FreeAnalysis works with Mondrian and servers that use XML for Analysis (XMLA) (Microsoft & Hyperion Solutions, 2002), and it generates MDX queries. Reports in FreeAnalysis consist of pivot tables and graphs; the JFreeChart package (Object Refinery Limited, 2008) is included in the distribution. FreeAnalysis also supports generation of cube definitions for the Mondrian OLAP server.

3.8.2 JPalo Client and JPalo Web Client
JPalo Client (Tensegrity Software, 2008) is a stand-alone application, while JPalo Web Client is an Ajax-based web application. Both products are developed by Tensegrity Software and available under a commercial license or the GPL. The most recent version of both is 2.0 from June 2008. JPalo is here used to refer generically to JPalo Client and JPalo Web Client. As suggested by the name, JPalo works with the Palo server. It offers modelling and administration for the Palo server, as well as tabular exploration of the data (in JPalo Client, the data can also be represented in charts).


But JPalo can also be used with XMLA-enabled sources. In any case, the user can explore the data in the GUI without typing queries manually. JPalo is implemented in Java, and JPalo Web Client does not require any installation on the end-user's computer. During this survey, no manual for JPalo was found. Forums for JPalo exist and have some activity.

3.8.3 JMagallanes Olap & Reports
JMagallanes Olap & Reports is an open source component of Grupo Calypso's JMagallanes suite (Grupo Calypso, 2006), of which other components are closed source. The latest version is 1.0 from May 2006. The development company sells support and installation services. The documentation consists of videos and some rather short HTML documents. Forums do exist, but have had little activity lately. JMagallanes Olap & Reports is distributed under the BSD license.

The open source part of JMagallanes reads data from JDBC sources as well as XML and Microsoft Excel. Apart from static reports based on JasperReports, the user can explore the data in pivot tables and charts based on JFreeChart. Data can be exported to PDF, XML, Excel, CSV, and HTML. It is possible to schedule reports and have them sent by email.

3.8.4 JPivot
JPivot (Tonbeller, 2008), developed by Tonbeller, is among the first open source OLAP clients, and parts of its code are also used in some of the more recent clients. JPivot is also included in BI suites like those from Pentaho and JasperSoft. JPivot is licensed under the CPL. The newest version of JPivot is version 1.8 from March 2008. It is a web application implemented in Java and JavaServer Pages (JSP), so the end user uses a normal web browser to explore the data.

Some HTML documentation exists for the JSP tags and for the Java source code. Apart from this, active forums exist for JPivot and for the BI suites that include it. JPivot was originally developed to be used with Mondrian, but it can now be used with other XMLA-enabled servers as well (the JPivot project is also involved in the olap4j specification). JPivot generates MDX queries. The user explores data by means of pivot tables and graphs (the latter based on JFreeChart) and can choose to enter MDX queries manually. Support for exporting data to Excel format and PDF is present.

3.8.5 JRubik
JRubik (Introduction to JRubik, 2005) is an OLAP client based on JPivot components. JRubik is, however, a stand-alone Java application, while JPivot is a web application. It is licensed under the CPL. The documentation is in HTML format and is relatively short. A forum with some activity exists. Version 0.9.4 was released in December 2006; that version only worked with Mondrian. In May 2008, a new version of JRubik using the new olap4j definition was released (with version number 0.0.0). In both of these versions, the tool generates MDX queries. The user explores data by means of pivot tables, charts (again from JFreeChart), or in a map component. Tabular data can be exported to PDF, XML, HTML, and Excel; chart data can be exported to XML and HTML.

3.8.6 OpenI
OpenI (OpenI.Org, 2008) was developed by the company Loyalty Matrix, from which commercial support was also available. The company has now been acquired by another company, and its technology is being integrated into a closed-source application; it is therefore not known whether the OpenI project will continue. The existing source code is available under a license similar to the MPL. The documentation is in HTML form and rather short. Forums exist and have been active, though lately the activity has been limited.

OpenI is implemented in Java and thus runs on different platforms. OpenI connects to XMLA sources and generates MDX queries. The user explores data in tables and charts which are based on components from JPivot and JFreeChart, respectively.


3.8.7 REX
REX (SourceForge.net: REX, 2007) is an untraditional abbreviation of "warehouse explorer". The latest release is 0.7 from November 2007. It is released under the LGPL. The documentation consists of a tutorial. Some forums do exist, but they are not very active, with fewer than 60 messages in three years. No commercial support offerings for REX were found during this survey. REX is implemented in Java and runs on many different platforms. It works with XMLA sources and generates MDX queries. Data is browsed in pivot tables and charts using JPivot and JFreeChart components, respectively.

All the considered OLAP clients are implemented in Java. Six of the eight tools run directly on the client, while the two remaining run on a web server. JPivot is widespread, since it is included in different BI suites, and JPivot components are also used in four of the other OLAP clients. While many more clients are available compared to the previous survey, it is still the case that the OLAP client category leaves some room for improvement: the available documentation is limited, and in some cases so are the available support options.

3.8.8 Integrated Business Intelligence Suites
In this chapter we have discussed individual tools that can be used at the different layers in a full BI solution. This provides flexibility for a completely customised solution. Integrated open source BI packages do, however, also exist, and they are briefly described in this section.

JasperSoft Business Intelligence Suite
JasperSoft Business Intelligence Suite (JasperSoft, 2008) from JasperSoft ships with MySQL and the web server Tomcat (The Apache Software Foundation, 2008) such that it can be used out-of-the-box.

Further, it includes JasperServer for ad‐hoc queries, reports, charts, crosstabs and dashboards. It is also possible to schedule, share, and interact with reports using JasperServer. JasperSoft Business Intelligence Suite also includes JasperAnalysis for OLAP and JasperETL. These are based on Mondrian/JPivot and Talend, respectively. Finally, the JasperReports reporting tool is included in the suite.

3.8.9 Pentaho Open BI Suite
Pentaho Open BI Suite (Pentaho, 2008b) from Pentaho does not include a DBMS in the package; preconfigured setups that use Firebird or MySQL do exist for easy testing, though. The suite includes Pentaho Data Integration, also known as Kettle. It also includes Pentaho Analysis with Mondrian and JPivot, as well as Pentaho Dashboards. A reporting tool (based on JFreeReport, which is now developed as part of the Pentaho suite) and the data mining tool Weka (Pentaho, 2008d) are also included.

3.8.10 SpagoBI
SpagoBI (Engineering Ingegneria Informatica, 2008) from the OW2 Consortium is an integration platform. As it is an integration platform, and not a product platform, it is not made to use a certain tool set. Instead, different "engines" (closed or open source) can be used, even at the same time. Thus, SpagoBI has drivers that integrate other tools into the platform such that, for example, Talend can be used as the ETL tool and Mondrian as the OLAP server in a SpagoBI project. SpagoBI's behavioural model regulates visibility of data and documents. Administration tools for scheduling, configuration, etc. are included, as are a tool for designing and maintaining analytical documents and a module for metadata management.

3.9 Conclusion
Many more tools are available nowadays, and their maturity has improved. In the current survey, ten ETL tools, six DBMSs, two OLAP servers, and seven OLAP clients were considered. As in the previous survey, the DBMS category is the most mature: the DBMSs have many advanced features, and good commercial support and documentation are available. The ETL tool category has improved a lot compared to the previous survey; ETL tools with advanced features and GUIs now exist. The OLAP servers are still dominated by the ROLAP server Mondrian, which has been developed further and now offers better and faster functionality. For the OLAP clients, there are also many tools available, but this category seems to lag a little behind and leaves room for new "killer applications".


While this survey considers many predefined general criteria for the different categories, there are other factors to consider in specific cases. Issues like stability and performance are also important for BI projects; as the tools were not configured and run in this survey, it was not possible to investigate these issues. Future work includes building two full BI solutions for the same purpose and data sets, one using open source tools and the other using commercially licensed tools. For these two solutions, it is interesting to investigate the possible differences in development time, ease-of-use, features, and problems.

3.10 Adopting Proprietary/Standards-Based Business Intelligence
There are two downsides to the large enterprise BI vendors' products: the first is cost, the second complexity. On average, an enterprise BI product costs $1,500 per user in the U.S. There are packages that reduce this cost for small deployments, but the price jumps up as the size of the deployment increases. This inhibits broad use of the products in the organisation, because the license cost doesn't scale well to the size of the organisation.

The products are also very complex, for both operations and end users. Most of the enterprise BI products are suites with many components, and managing the environment and the models in the BI tool takes an administrator with a high degree of technical knowledge. The challenge for end users is the level of understanding needed to get information: the tools were designed during a period when usability was less important than query features. The vendors today compete mostly on the number of features rather than on whether people can use the tools, much like the spreadsheet wars of the 1980s, when Microsoft and Lotus constantly added features until nobody could use their spreadsheets.

3.11 Using Open Source Business Intelligence Tools
There are upsides and downsides both to adopting proprietary/standards-based Business Intelligence (BI) and to using open source BI tools. These points are discussed below.

3.11.1 Downsides of Adopting Proprietary/Standards-Based Business Intelligence (BI)
As described in section 3.10, the two main downsides to the large enterprise BI vendors' products are cost and complexity: per-user license costs that do not scale well to the size of the organisation, and products that demand highly technical administrators and impose a steep learning curve on end users.

Open source BI tools do not have the same richness of features that the large BI tools have. This gap is a downside, because many people need those more advanced features. The good news is that the open source products are less complex to configure and use for basic purposes.

Many organisations don't replace their proprietary tools with open source, but they use open source BI tools for the more common uses because they meet the need at a lower cost. The other area where we could see open source used more often is embedding BI features into applications such as customer self-service portals and call center systems. It is easier to integrate an open source product directly into an application than a proprietary BI tool, and it is also cheaper when we consider the large user population of such an application.


3.11.2 Advantages of Using Open Source BI Tools
Total cost of ownership (TCO) varies based on the size of the environment. There are some attractive pricing options from the big vendors if we are deploying to a small number of users, generally fewer than 25, and the small cost difference at this level can sometimes favour the proprietary vendors. However, an unsupported free open source version will still have the lowest TCO at this level. For medium to large deployments, TCO favours open source tools because the per-user cost is much lower.

The labour side does not differ largely between vendors of any type. Open source BI tools are on par with, or slightly better than, Microsoft BI when it comes to labour costs. Both open source and Microsoft BI development environments require more labour than the big BI players. In general, this raises the initial development costs but doesn't have as much of an effect on long-term labour costs. There are no significant differences in the size of BI staff between open source and proprietary implementations of the same scope.

3.11.3 Adoption of Open Source BI Tools
The open source vendors are still trying to make money like other software vendors. They will face increased scrutiny of their features and of the difference in both cost and time to deliver a satisfactory solution, which is likely to slow the adoption rate. Open source is growing faster than the large vendors today, but at about the same rate as many of the new BI vendors in the market. It is hard to measure the true adoption rate, because only the paid support side of open source BI can be counted easily; the many users of the free versions of the software are largely uncounted.


Summary
• The use of Business Intelligence (BI) tools is popular in industry. However, the use of open source tools for BI is still quite limited compared to other types of software.
• There exist many different open source licenses, for example the GNU General Public License and the Mozilla Public License.
• For an OLAP client, the survey considers which OLAP server(s) the client can be used with and which query language it uses/generates.
• Apatar (Apatar, 2008) is a data integration and ETL tool developed by the company also bearing the name Apatar.
• Apatar is ROLAP-oriented and has direct support for a wide selection of relational DBMSs as well as generic JDBC support.
• Clover.ETL (Clover.ETL, 2008) is developed by OpenSys and Javlin and is offered under either a GNU Library General Public License (LGPL) or a commercial license.
• Clover.ETL is a ROLAP tool and transfers structured data from different DBMSs and file formats.
• KETL (Kinetic Networks, 2008), not to be confused with Kettle, is developed by Kinetic Networks, from which support can also be purchased.
• Kettle (Pentaho, 2008c) started as an independent open source ETL project, but was acquired in 2006 by Pentaho to be included in the Pentaho BI suite (Pentaho, 2008b).
• Kettle supports around 35 different DBMSs (as well as generic JDBC and ODBC) and a variety of flat files.
• Octopus (Together Teamlösungen, 2007) is an ETL tool from Enhydra.org with the LGPL license.
• Octopus is also ROLAP-oriented and uses JDBC to connect to data sources and targets.
• Palo ETL Server (Jedox, 2008a) is developed by Jedox AG, from which commercial support and training options are also available.
• Pequel (Gaffiero, 2007) is the only considered ETL tool that is not written in Java.
• Scriptella (Kupolov, 2008) is an ETL and script execution tool. It is released under the Apache License.
• Talend Open Studio (Talend, 2008) is developed by Talend, which also offers support, training, and consulting.
• Firebird (Firebird Project, 2008) is based on the code base of the commercial DBMS InterBase version 6.0.
• LucidDB (LucidDB, 2008) is developed by the software company LucidEra and the non-profit organisation The Eigenbase Project.
• MonetDB is developed by the research institute CWI and has a license similar to the MPL.
• There are two downsides to the large enterprise BI vendors' products: the first is cost, the second complexity.
• TCO varies based on the size of the environment.
• Open source BI tools do not have the same richness of features that the large BI tools have.


References
• Janert, P., 2010. Data Analysis with Open Source Tools. O'Reilly Media, Inc.
• Clausen, N., 2009. Open Source Business Intelligence. 2nd ed. BoD – Books on Demand.
• 2011. Opensource BI tool - Pentaho. [Video Online] Available at: <http://www.youtube.com/watch?v=J5eVY3o_zGw> [Accessed 27 April 2012].
• 2009. Business Intelligence Reporting & Analytics Tool: SQL Power Wabit. [Video Online] Available at: <http://www.youtube.com/watch?v=vkz1aC3rq9o&feature=results_main&playnext=1&list=PLA9A611916134446A> [Accessed 27 April 2012].
• Thomsen, C. & Pedersen, T., 2008. A Survey of Open Source Tools for Business Intelligence. [pdf] Available at: <http://vbn.aau.dk/ws/files/14833824/DBTR-23.pdf> [Accessed 27 April 2012].
• EIAO, A., Survey of Open Source Tools for Business Intelligence. [Online] Available at: <http://eiao.net/publications/EIAO_ddw06.pdf> [Accessed 27 April 2012].

Recommended Reading
• Chao, L., 2009. Utilizing Open Source Tools for Online Teaching and Learning: Applying Linux Technologies. Idea Group Inc (IGI).
• Wise, L., 2012. Using Open Source Platforms for Business Intelligence. Elsevier Science Limited.
• 2010. Information Resources Management: Concepts, Methodologies, Tools and Applications. Idea Group Inc (IGI).


Self Assessment
1. ______ is ROLAP-oriented and has direct support for a wide selection of relational DBMSs as well as generic JDBC support.
a. Clover.ETL
b. Apatar
c. CDDL
d. ETL Integrator

2. ________ is a ROLAP tool and transfers structured data from different DBMSs and file formats.
a. Clover.ETL
b. Apatar
c. CDDL
d. ETL Integrator

3. _______ is based on the code base of the commercial DBMS InterBase version 6.0.
a. Scriptella
b. Pequel
c. Firebird
d. JasperETL

4. Which of the following statements is true?
a. KETL is MOLAP-oriented.
b. Kettle is released under the LGPL.
c. ETL jobs are specified in a LGPL.
d. Octopus is OLAP-oriented.

5. _________ supports B-tree and bitmap indexes.
a. MonetDB
b. DBMS
c. LucidDB
d. UNIX

6. Databases larger than 4 __________ are used in production environments.
a. megabytes
b. gigabytes
c. kilobytes
d. terabytes

7. The newest version of JPivot is version ______.
a. 2.1
b. 4.1
c. 1.4
d. 1.8


8. __________ is an OLAP client which is based on JPivot components.
a. JRubik
b. JPivot
c. MDX
d. OpenI

9. _________ connects to XMLA sources and generates MDX queries.
a. JPivot
b. OpenI
c. JRubik
d. JFreeChart

10. OLAP clients are implemented in ________.
a. Java
b. .NET
c. SQL
d. DBMS


Chapter IV

Business Analytics

Aim

The aim of this chapter is to:

• explain business analytics and its need

• explicate governance framework

• explain applications of analytics key process

Objectives

The objectives of this chapter are to:

• explain learning areas of business analytics

• explicate application categories of predictive modelling

• elucidate business intelligence implementation

Learning outcome

At the end of this chapter, you will be able to:

• understand future applications of business analytics

• identify business-driven methodology and project management

• describe business analytics and customer relationships


4.1 Introduction to Business Analytics

Corporations today are awash in information yet short on the tools, methods, and talent for using it. Information about the most important facets of the business (customers, processes, employees, competition) is gathered but not analysed, reported but not understood, guessed about rather than acted upon. As a result, the status quo prevails and opportunities to improve performance, often dramatically, go unnoticed.

There are exceptions. Smart organisations have always tried to make the most of the information in hand. But recent technological breakthroughs have provided the ability to manage and make sense of vast amounts of hitherto unrelated data, and in the process have redefined what it means to be a smart organisation. Aggressive competitors recognise these new capabilities and put them to work. They don't just gather and report information; they leverage it through business analytics:

• Capital One, which grows organically at 20% per year by analysing 60,000 product configuration experiments a year and following through on the most promising.

• Progressive, which analyses specific market sub-segments to "skim the cream" and profitably insure customers in traditionally high-risk categories.

• Harrah's Entertainment, which uses a customer loyalty program and predictive modelling techniques to identify and retain the most profitable customers.

• Marriott International, which has modelled its business and distributed analytic tools so that every property can maximise revenue not only from hotel rooms and rates, but also from conference facilities, catering, and other services.

• Procter & Gamble, which has drawn 100 analysts from across the enterprise to address the most complex cross-functional issues, such as maximising growth in existing markets and optimising supply networks.

• UPS, which expanded its statistical expertise in logistics and package tracking to anticipate and influence customer actions and minimise attrition.

"Technological breakthroughs have redefined what it means to be a smart organisation." These progressive companies have much in common in how they operate:

• They use sophisticated data-collection technology and analysis methods to wring every last drop of value from their most strategic business processes.

• They understand what motivates customers and makes them profitable.

4.2 Where Should we Leverage Business Analytics?

At first glance, it's tempting to say, "all over the place." Any part of a business can benefit from more systematic creation, gathering, and interpretation of information leading to better decisions and more informed actions. Indeed, as more and better information and analysis tools have become available, organisations are learning to "manage by fact" more consistently. So an analytical bent is good for a business generally.

However, business analytics as we’ve defined it must be focused. It’s neither practical nor cost-effective to apply these analytical techniques in every area of a business. Distinguish between applying more analytical techniques, which may be widespread, and competing on analytics, which centers on developing deep and distinguishing expertise in a specific and critical area of the business. In most cases, we will focus on business analytics where we already have distinctive capability; the aspect of the business where we excel relative to the competition and where we have chosen to compete. For Wal-Mart, the distinctive capability lies in an efficient supply chain. For automobile insurer Progressive, the most distinctive capability involves the pricing of risk. For the gaming industry leader Harrah’s, the chosen distinctive capability for the past several years has been customer loyalty, a departure from construction of the lavish casino and hotel facilities that some other firms have selected as the basis for competition. In all three companies, business analytics are largely focused on these distinctive capabilities.


Why now?

There have always been companies that competed based on their superior ability to gather, analyse, and act upon the information at hand. But taking an analytical approach to the most important and complex business problems used to take an enormous amount of time, money, and effort. For example, the early practitioners of passenger yield management in the airline industry made extraordinary investments in people, process, and technology to master that business process.

Recent technological developments have lowered that threshold of investment dramatically.

• Today's information management technology at last enables dissimilar databases to "talk" with one another and contribute their information to common repositories, and many corporations are investing in the integration and quality of their data.

• Today's sophisticated analytical tools include not only the established statistical regression and time series methods, but also statistically-based "machine learning" techniques that partially automate the processes of pattern recognition and prediction.

With such technological capability now available, corporations have the means to address many of their most complex business problems and competitive opportunities, areas that have defied systematic analysis in the past. And because they can, they will. Just as business reengineering leveraged the step-change in information technology of the 1980s to enable radical redesign of business processes, business analytics today leverages new technology to enable new processes and breakthrough process performance. Corporations that mastered the techniques of reengineering early enjoyed competitive advantage, as do those who are learning to compete on analytics today.

4.3 What's the Payoff?

Most analytical competitors are leaders in their industries and quite successful in financial terms. In a study of 32 corporations with varying degrees of analytical orientation, an increased focus on analytics correlated with stronger financial performance. In another study of over 400 firms, analytical competitors were five times more likely to be in the top-performing quintile than the bottom-performing one. The improvements in performance for analytical competitors are often dramatic. For example:

• Harrah's has improved in every measure of financial performance since it adopted an analytical orientation, including revenues and profits, "same store" sales, average hotel room rates, slot machine profit margins, and perhaps most importantly, stock price.

• Analysts suggest that Marriott reaps around an 8% revenue advantage compared to other similar hotel chains from its highly refined and extensively-applied revenue management system.

• Capital One has grown earnings per share by more than 20% each year since it went public in 1994, and has grown to be the third-largest provider of credit cards in the U.S.

• AutoZone creates an optimal portfolio mix of its product, pricing and promotional activities across all 3,300 stores by understanding and forecasting the performance of individual departments, products and categories within each store.

4.4 Business Analytics and Customer Relationships

Many corporations compete on the basis of their ability to initiate, expand, and maintain relationships with customers. Indeed, customer management is a high-potential domain for business analytics, especially the techniques of predictive modelling.

"Predictive modelling" uses a variety of analytical techniques to make estimates about the future based on current and historical data. These predictions are expressed as the likelihood that a particular event, opportunity, or behaviour will take place. Predictive modelling can be used to make increasingly effective and individualised decisions about the treatment of customers. These models analyse customers' past performance in order to assess how likely a customer is to exhibit a specific behaviour or respond to a specific offer.
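
To make the idea concrete, the sketch below scores a customer's churn likelihood with a simple logistic model. It is only an illustration: the feature names, weights, and intercept are hypothetical, standing in for coefficients that would in practice be estimated from historical customer data.

import math

# Hypothetical coefficients of a logistic model, assumed to have been
# estimated beforehand from historical customer records.
WEIGHTS = {"months_since_last_purchase": 0.30,
           "support_complaints": 0.45,
           "tenure_years": -0.20}
INTERCEPT = -1.5

def churn_likelihood(customer):
    """Return the modelled probability that this customer will churn."""
    score = INTERCEPT + sum(WEIGHTS[f] * customer[f] for f in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))  # logistic link: score -> probability

print(round(churn_likelihood({"months_since_last_purchase": 6,
                              "support_complaints": 2,
                              "tenure_years": 4}), 2))  # about 0.6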


There are several mature predictive modelling applications. One is credit scoring: models that estimate the likelihood that a customer will make future payments on time. Another relatively mature application is targeted marketing, which involves using consumers' purchasing history and response rates, along with demographic, geographic and other relevant characteristics, to estimate the likelihood that customers will respond to particular marketing efforts.

Though this approach is mature, it still requires ongoing testing and refinement using various customer offers. But these mature applications represent just the tip of the iceberg of overall opportunity for using predictive modelling to optimise customer relationships. For starters, most of these mature approaches have been run off-line – periodic calculations that set customer transactions and marketing programs in motion. But now leading corporations are beginning to embed predictive models in real-time customer-facing processes and systems in ways that generate revenue opportunities and control risks.

We see five major application categories of predictive modelling for optimising customer relationships:

4.4.1 Valuation

Many companies are beginning to actively measure and manage the asset value of their customer relationships. The first and most basic questions are: What's the lifetime value of this customer? Based on a customer's unique characteristics and transaction pattern, what types and magnitude of investment are justified? When we make an investment in this customer, does it generate transaction patterns that reflect an increase in value?
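
As a rough illustration of the lifetime-value question, the following sketch discounts a stream of expected margins under assumed retention and discount rates. The figures are hypothetical; real valuation models are considerably richer:

def lifetime_value(annual_margin, retention_rate, discount_rate, horizon_years):
    """Discounted sum of expected margins: a standard simplification of CLV."""
    return sum(annual_margin * (retention_rate ** t) / ((1 + discount_rate) ** t)
               for t in range(horizon_years))

# e.g., $400/year margin, 80% retention, 10% discount rate, 10-year horizon
print(round(lifetime_value(400, 0.80, 0.10, 10), 2))  # roughly 1406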

4.4.2 Customisation

Competitive pressures are driving companies to personalise the way they manage customer relationships. Uniquely targeting consumers with the products, services, and experiences they value and are likely to respond to can lead to significant revenue growth while reducing acquisition costs. This goes well beyond traditional targeted marketing and cross-selling. Examples include the "recommendation engines" at Amazon and NetFlix, and the "intelligent wardrobe recommendations" made by call center agents at Victoria's Secret Direct. One financial services company predicts a unique "next logical sale" to offer to each customer who calls for any service issue or inquiry.

4.4.3 Pricing

Many businesses have to account for unique customer risk, and then price the product or service based on the cost of covering that risk. For example, auto insurance providers must accurately determine the amount of premium to charge to cover each automobile and driver. More effective predictive modelling can streamline the process of customer acquisition by predicting the risks of a particular customer and making more effective pricing decisions. Even retailers are beginning to broadly adopt analytical approaches to pricing their goods, and online retailers are experimenting with offering different pricing to different customers for the same product or service.

4.4.4 Retention

Too many businesses try to retain customers only after the customer attempts to terminate the relationship. At this stage, changing the customer's mind can be expensive or impossible. Many businesses also face "silent attrition," where customers slowly but steadily reduce purchases and usage. Some corporations are developing early warning systems that detect any significant change in customer behaviour that may indicate a service or retention issue. They then take pre-emptive measures to retain customers and address any latent service issues.
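
A crude sketch of such an early warning signal follows: it flags a customer whose recent average spend has fallen well below their earlier baseline. The window size and drop threshold are arbitrary illustrations:

def silent_attrition_alert(monthly_spend, window=3, drop_threshold=0.30):
    """Flag a customer whose recent average spend has fallen well below
    their earlier baseline (a crude silent-attrition signal)."""
    if len(monthly_spend) < 2 * window:
        return False  # not enough history to compare
    baseline = sum(monthly_spend[:window]) / window
    recent = sum(monthly_spend[-window:]) / window
    return baseline > 0 and (baseline - recent) / baseline >= drop_threshold

print(silent_attrition_alert([120, 115, 125, 90, 70, 55]))  # True: spend is sliding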

4.4.5 Fraud Detection

Fraud ranges from inaccurate credit applications, fraudulent transactions, and false insurance claims, to identity theft. It undermines the profitability of companies and drives up the costs of goods and services for customers. Property and casualty insurance fraud amounts to approximately $30 billion a year, health care fraud is approaching $100 billion, and credit card fraud is estimated to cost $1-2 billion. Finding fraud has always been a needle-in-a-haystack problem, but increasingly effective predictive models are being used to quickly identify fraudulent activity without increasing the number of "false positives" that can inconvenience and alarm customers.
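
The sketch below illustrates that trade-off in miniature: raising the alert threshold reduces false positives but also catches less fraud. The scores and labels are invented for illustration:

def confusion_counts(scores, labels, threshold):
    """Count true and false positives at a given alert threshold.
    scores: model-estimated fraud probabilities; labels: 1 = actual fraud."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp, fp

scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    0]
for t in (0.5, 0.7, 0.9):
    tp, fp = confusion_counts(scores, labels, t)
    print("threshold", t, ":", tp, "frauds caught,", fp, "customers falsely flagged")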


4.5 What Information and Technology do we Need?

For starters, we must have high-quality, integrated data about the aspects of the business requiring analysis. The needed data may come from internal transaction systems, point-of-sale or other customer-facing systems, Web-based systems, or sources external to the corporation. Most companies today do not lack for sufficient amounts of data, but many still suffer from a lack of integration (can the information be used together?) and a lack of quality (how well does the data measure and represent the business phenomena that we want to analyse?). Without good data, we simply can't do good analytics.

Specific technologies are required for analytical competition, but most large organisations probably have many of them already. Assuming that critical data reside in transactional systems such as an enterprise resource planning (ERP) system, the data must be extracted with the use of "extract, transform, and load" software. It should generally be transferred into a data warehouse for easy access. If databases are particularly large, it may be necessary to store them in a "data warehouse appliance", specialised hardware and software for data access and retrieval. All firms interested in analytical competition will need specialised software for query and reporting, and for sophisticated analysis. Leading vendors are increasingly combining all "business intelligence" capabilities in an integrated platform, but most are still typically stronger at either query and reporting or at varying types of analytics.
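
A minimal sketch of this pipeline follows, using an in-memory SQLite database to stand in for both the transactional source and the warehouse; the table and column names are illustrative, not drawn from any particular ERP or ETL product:

import sqlite3

# Source system: a toy transactional table (amounts stored in cents).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (region TEXT, amount_cents INTEGER)")
src.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 1250), ("north", 990), ("south", 2100)])

# Warehouse: a fact table ready for reporting queries.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_sales (region TEXT, amount REAL)")

# Extract from the source, transform (cents -> currency units), load.
rows = src.execute("SELECT region, amount_cents FROM sales").fetchall()
dw.executemany("INSERT INTO fact_sales VALUES (?, ?)",
               [(region, cents / 100.0) for region, cents in rows])

# A reporting-style query against the warehouse.
print(dw.execute("SELECT region, SUM(amount) FROM fact_sales GROUP BY region").fetchall())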

4.6 What Kinds of People do we Need?

We've got to have trained and skilled analytical people to do analytical work and to succeed in analytical competition. There are three levels of analytical people to consider:

• Analytical professionals. Most successful analytical competitors have a core cadre of people who design and conduct experiments and tests, define and refine analytical algorithms, and perform data mining and statistical analyses on key data. In most cases, these individuals have advanced degrees, often Ph.D.s, in such analytical fields as statistics, operations research, logistics, economics, or econometrics.

• Analytical semi-professionals. They can do substantial amounts of modelling and analysis using spreadsheets or visual analysis tools, but are unlikely to develop sophisticated new algorithms or models. These individuals might be, for example, quantitatively-oriented MBAs with deep knowledge and experience in the business process or function that's the analytical focus of the enterprise.

4.7 What Roles Must Senior Executives Play?

Another prerequisite for analytical competition is committed senior executives who provide the passion and the resources to drive their organisations in an analytical direction. In virtually every firm that has determined to leverage analytics, the CEO and senior management team set an analytical strategy in the first place and then continually pushed it forward.

There are three key roles that the CEO or business unit head must play:

• Strategy. Decide where analytics should be leveraged in the business. As discussed under the first question, the CEO must articulate the business's distinctive capability and chosen basis of competition, determine where in that domain to leverage the power of analytics, and charter the first (or next) analytical initiative.

• Capability. Drive with passion and commitment the organisational changes needed by an analytical competitor. Without top executive support, any company is unlikely to make the needed changes in skills, information management processes, and IT capabilities.

• Execution. Insist that the business take action based on its analyses. It's often easier, for example, to create a segmentation scheme for customers than to actually treat customers differently. And it's easier to establish the profitability of products than to discontinue unprofitable ones. Managers of the functions involved in analytical projects must be prepared to take action with the insistence and backing of top management.


4.8 More Applications of Analytics Key Process: Supply Chain

Whether the corporation produces physical products or provides services, supply chain is among the most critical business processes. It's how the "goods" get delivered, whatever they may be. And it's where the most basic optimisation of cost-to-performance takes place. Business analytics can reveal the below-the-surface truths about supply chain performance:

• What's the weakest point in our chain? What handoffs pose the most potentially damaging disruptions?

• What's the strongest point in our chain? Where might we be over-paying and over-performing? How much should we really be outsourcing or relying on others? What's the best risk/reward-adjusted mix of inside and outside resources?

• What measures most accurately predict overall supply chain performance? Which measures are merely window dressing or, worse, misleading?

4.9 Key Asset: People

The labour force growth rate is slowing, and labour markets are projected to continue tightening. Employee variety is increasing by every measure: demographics, background, lifestyle, motivation. Corporations must understand their employees and prospective hires more deeply than ever as a foundation for customising the "employment deal" as never before. Business analytics can show the way:

• What's our company's exposure to retirement waves, heightened competition for talent, and educational shortfalls among job candidates?

• What are our most critical employee categories? Where do departures hurt the most? What facets of the job, the organisation, and the employment deal matter most to our most important employees?

• How many options should we offer employees in compensation and benefits packages? Where does customisation pay off, and where is it a waste of effort?

4.10 Watershed Event: Merger or Acquisition

The track record of corporate mergers and acquisitions in the U.S. is pretty abysmal. Few meet their stated goals, and most end up destroying shareholder value. The financial and ownership transaction is simple compared to the complexity of integrating operations, leveraging scale, and cutting costs. And yet the rate of corporate combinations has again increased. If your organisation is contemplating a merger or acquisition, or in the midst of one, business analytics can help clear a path to success:

• What effects will the combination have on our markets? How are customers, especially common ones, most likely to react?

• What's the real scope of the integration effort? How ready are the organisations, their assets, and their information for integration?

• What's our most aggressive, risk/reward-adjusted pace for integrating?

• What does past M&A experience of the two entities and of our industry tell us about the chances and methods of success?

4.11 Applications, Trends, and Strategies

The Enterprise Resource Planning (ERP) concept is a growing one. As we like to say, "ERP is a journey, not a destination". ERP revolutionised the enterprise applications market in the 90s, but no one could envisage the subsequent development. The dawn of the new millennium brought a new generation of enterprise applications, intended to be more customer-focused and to extend beyond the enterprise through e-commerce interaction and collaboration with business partners. The key to the Internet-driven, dynamic trade environment is agility.


Thus, early ERP adopters discovered that implementing these systems was only the first step toward creating a competitive IT infrastructure. They and new users alike are now looking for significantly more comprehensive functionality, from advanced planning and scheduling (APS) and manufacturing execution systems (MES), to sales force automation (SFA) and customer relationship management (CRM), to business intelligence (BI) and various e-business tools, to name only some, and demanding that they be integrated into their ERP system.

The new generation of enterprise applications goes beyond traditional transactional business functions by enabling organisations to deliver real-time performance analysis directly on the desktops of all business managers, so that they can become more knowledgeable and proactive.

4.12 Business Intelligence: Another Step in the ERP Evolution

At first, organisations realised that to maximise the value of the information stored in their ERP systems, it was necessary to extend the ERP architectures to include more advanced reporting, analytical and decision support capabilities. While relational databases, presently used by ERP systems, are capable of retrieving a small number of records in a short time, they are not good at retrieving a large number of records and summarising them on demand. Most ERP products have a valuable database, but translating the stored data into information useful for the decision-making process has proven difficult. With the availability of analytic solutions, several dozen ERP providers can offer their customers a valuable tool for harvesting the business value out of their database.

Thus, major ERP vendors have been increasingly embracing OLAP (On-line Analytical Processing) tools that provide a high-level aggregated view of data. Various analytics and business intelligence solutions enable organisations to track, understand, and manage enterprise performance; they leverage the information that is stored in corporate databases or data warehouses, legacy systems, and other enterprise applications.
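
The toy example below mimics, in miniature, the aggregated view an OLAP tool serves: sales facts are rolled up by region and quarter, including the roll-up totals a user would drill into. It is a conceptual sketch of the idea, not how any particular OLAP engine stores its cubes:

from collections import defaultdict

# Toy fact rows: (region, quarter, amount).
facts = [("north", "Q1", 120), ("north", "Q2", 150),
         ("south", "Q1", 200), ("south", "Q2", 180)]

cube = defaultdict(float)
for region, quarter, amount in facts:
    cube[(region, quarter)] += amount   # base cell
    cube[(region, "ALL")] += amount     # roll-up over quarters
    cube[("ALL", quarter)] += amount    # roll-up over regions
    cube[("ALL", "ALL")] += amount      # grand total

print(cube[("north", "ALL")], cube[("ALL", "Q1")], cube[("ALL", "ALL")])  # 270.0 320.0 650.0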

Contrary to traditional core ERP, business intelligence and analytics provide an environment in which business users receive information that is reliable, consistent, understandable and easily manipulated. Executives and middle management have always had a need to understand their business's performance, in good or bad economic times; while the output from BI might change, the need is always there. The classical three-level business intelligence pyramid shows the instruments frequently used by managers in different echelons.

Fig. 4.1 Business intelligence tools
(Source: http://anale.feaa.uaic.ro/anale/resurse/46_Hurbean_L_-_Business_intelligence-applications,_trends_and_strategies.pdf)


Because BI tools have been neither terribly complex nor expensive to deploy, yet helpful in easing the decision-making process, in recent years they have come to be considered a necessity rather than a luxury. The latest evolutionary step introduces the concept of corporate performance management (CPM), also referred to as enterprise performance management (EPM) or business performance management (BPM), which is an emerging portfolio of applications and methodology with business intelligence architectures and technologies at its core.

Historically, BI applications have focused on measuring sales, profit, quality, costs and many other indicators within an enterprise, but CPM goes well beyond these by introducing the concepts of management and feedback, by embracing processes such as planning and forecasting as core tenets of a business strategy.

So, CPM is the evolutionary combination of technology and business philosophy, built on the basis of the BI technology and applications that many enterprises have already implemented (see the figure below). The demand for these applications lies in the fact that they add value to previously installed enterprise applications, to a degree that the enterprise may finally see some long-delayed benefits and feel better about implementing ERP systems and BI solutions.

Fig. 4.2 Business intelligence tools and technologies

(Source: http://anale.feaa.uaic.ro/anale/resurse/46_Hurbean_L_-_Business_intelligence-applications,_trends_and_strategies.pdf)

CPM crosses traditional department boundaries to manage the full life cycle of business decision-making. It involves mapping a structured set of data against predefined reports, alerts, dashboards, analysis tools, key performance indicators (KPIs), etc., to monitor and improve business processes in line with corporate strategic objectives established up front. Further, CPM creates a closed-loop process: starting with developing high-level corporate goals and the predefined KPIs that follow from them, through measuring actual results against the KPIs and representing this comparison in a scorecard, with the results reported to management through intuitive reporting tools, and ultimately delivering these results back into the business modelling process for corrections in the next planning cycle.
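
In miniature, the scorecard step of that loop can be pictured as below: actual KPI values are compared with upfront targets and given a simple status. The KPI names and figures are hypothetical illustrations:

# Hypothetical KPIs: name -> (target, actual).
kpis = {"revenue_growth_pct": (8.0, 6.5),
        "on_time_delivery_pct": (95.0, 96.2),
        "customer_churn_pct": (5.0, 7.1)}
lower_is_better = {"customer_churn_pct"}  # for churn, below target is good

for name, (target, actual) in kpis.items():
    met = actual <= target if name in lower_is_better else actual >= target
    status = "on track" if met else "needs attention"
    print(name, "target:", target, "actual:", actual, "->", status)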

CPM augments BI applications, which have traditionally focused on measurement (mostly useless without the ability to act on it!). CPM represents a renewed focus on quantitative management, a "management by numbers" method using insight gained from data analysis and performance reporting.


4.13 Business Intelligence Benefits and Challenges

Whole pages can be written to answer the first question, but a compressed answer shows that BI allows businesses:

• To leverage their information assets as a competitive advantage

• To better understand the demand side of the business and manage customer relationships, and to monitor the results of change, both positive and negative

There are thousands of articles, white papers, books, vendors, and consultants dedicated to Business Intelligence. BI was listed among the top 10 technologies in 2005, because, as analysts say, businesses today are moving at the speed of information, and because the economy demands it: companies are struggling to survive the economic storm by reducing costs or increasing revenues.

BI solutions enable decision-makers to launch queries against the various data sets that are captured during the course of everyday business transactions: financial transactions, customer records, inventories, sales, production, etc. By analyzing these data in various ways, managers can discover trends, compare results, spot anomalies, and experiment with “what if” scenarios. BI has thus become the quintessential enabling technology for effective business management.

Howard Dresner, father of the Business Intelligence concept, explains why enterprises must make business intelligence an imperative: "Doing business is information-intensive. Enterprises are being pushed to share information with increasingly more audiences. The business intelligence imperative insists we elevate BI to a strategic initiative now, or risk disaster!" He stresses the immense risk of not knowing, and that incomplete information is worse than no information. BI attempts to eliminate guessing and ignorance in enterprises by leveraging the huge volume of quantitative data gathered every day in a variety of corporate applications.

Nowadays, popular uses of BI include management dashboards and scorecards, collaborative applications, workflow, analytics, enterprise reporting, financial reporting, and both customer and supplier extranets. These solutions enable companies to gain visibility into their business, acquire and retain profitable customers, reduce costs, detect patterns, optimise the supply chain, analyze product portfolio, increase productivity and improve financial performance.

4.14 Some Aspects about Business Intelligence Implementation

An enterprise should go through two phases before it is ready for BI. The first phase refers to the implementation of a solid transaction system, an integrated system like ERP, if possible. Its major aim is to create an OLTP system for collecting, storing, and updating transactional data in relational databases. Initially, these systems were confined to the finance and accounting area, for which the market offered lots of financial accounting applications. Then, other types of applications were added: sales, procurement, inventory management, human resources management, and so on.

Sometimes companies used different solutions from different vendors, or they combined them with domestic software, so they faced the problem of multiple data sources. The second phase is meant to solve the problem of asynchronous master data. The upgrade to integrated systems became crucial as the number of modules significantly increased.

There are two types of BI implementations:

• Implementing BI applications with standard functionality, for creating simple multidimensional OLAP cubes. This often reduces to an OLAP module which includes a set of predefined multidimensional models for the analysis of different types of data. The data are available in the OLTP system. Oracle, Microsoft, and Business Objects (recently entered Romania) offer the most popular tools for creating OLAP modules.

• Development of more sophisticated analytical models, in order to reflect the unique mix of the company's targets and factors of influence. Developing a "home-grown solution" is a good choice when functional criteria are not met by standard applications and there is a wish for total integration with the existing planning and control and/or knowledge management procedures, methodologies, and tools within the enterprise.


Recommendations for BI applications should be based on a company's functional requirements, budget, technical architecture, and overall user needs. Selecting and implementing the right BI solution is a challenging job, and implementing BI is a costly and time-consuming venture; if the wrong BI is implemented without good research and planning, it could be a failed initiative. One very important point to consider when selecting BI is that there should be a close match between the company's requirements and the solutions the vendors provide.

Today's organisations are relying on Business Intelligence applications to provide them with hard facts that help them make better, more informed decisions and obtain unforeseen rewards. We can conclude that their success in business depends on the implementation of BI systems. Vast amounts of data, growing at 30-50 percent a year, the increasing burden of government regulation and compliance obligations, and competition and customer demands are focusing attention on the need for timely, often real-time, information, and in plenty of detail. The issues that should be addressed in order to successfully integrate BI into the enterprise are:

• Prepare a solid foundation with the ERP system at the core, so as to take care of the purity of the data sources (data quality).

• Identify where, by whom, and when BI is needed, in harmony with the established business objectives, in an attempt to prevent "shelfware": unused or underused software sitting around in the enterprise.

• Keep in mind the necessity of setting up a common BI platform of integrated tools, with the intention of avoiding "BI islands": different applications that can't communicate with each other.

• Integrate BI into enterprise portals, or keep this option open for the near future; as users have different roles across the enterprise and use different applications, they should have the BI they need, integrated to fit their job function.

Last, but not least, large enterprises should tackle BI strategically, because they have valuable data that can tell them about performance, customer behaviour, process efficiency, and important trends. There are many companies with a tactical view of BI: specific BI/analytical applications implemented in some departments or as part of some other application. For many of them it is difficult to implement BI strategically, as this approach forces the enterprise to reflect upon itself and how it actually works. Another reason worth mentioning is the cultural profile of organisations: resistance to change and the fear of what they might learn are serious challenges for strategic BI projects.

The final conclusion: businesses can create intelligence from their data and provide timely, accurate access to it for their end users. Business Intelligence is the latest buzzword working its way through the business and technology worlds; much like ERP, SFA, and CRM before it, it is now the focus of the hype.

4.15 Managing the Implementation of Business Intelligence Systems

The implementation of a BI system is a complex undertaking requiring considerable resources. Yet there is only a limited authoritative set of critical success factors (CSFs) for management reference. This article represents a first step toward filling that research gap. The authors utilised the Delphi method to conduct three rounds of studies with BI system experts in the domain of engineering asset management organisations. The study develops a CSFs framework that consists of seven factors and associated contextual elements crucial for BI systems implementation.

The CSFs are committed management support and sponsorship, business user-oriented change management, clear business vision and well-established case, business-driven methodology and project management, business-centric championship and balanced project team composition, strategic and extensible technical framework, and sustainable data quality and governance framework. This CSFs framework allows BI stakeholders to holistically understand the critical factors that influence implementation success of BI systems.


4.16 Background of Implementation of Business Intelligence Systems

The background consists of engineering asset management organisations (EAMOs), such as utilities and transportation enterprises, which store vast amounts of asset-oriented data. However, the data and information environments in these organisations are typically fragmented and characterised by disparate operational, transactional and legacy systems spread across multiple platforms and diverse structures.

An ever-increasing amount of such data is often collected for immediate use in assessing the operational health of an asset, and then it is either archived or deleted. This lack of vertical integration of information systems, together with the pools of data spread across the enterprise, makes it extremely difficult for management to facilitate better learning and make well-informed decisions, thus resulting in suboptimal management performance. Yet large volumes of disperse transactional data lead to increased difficulties in analysing, summarising and extracting reliable information. Meanwhile, increased regulatory compliance and governance requirements have demanded greater accountability for decision making within such organisations.

In response to these problems, many EAMOs are compelled to improve their business execution and management decision support through the implementation of a BI system. According to Negash (2004), “BI systems combine data gathering, data storage, and knowledge management with analytical tools to present complex and competitive information to planners and decision makers.” Implicit in this definition, the primary objective of BI systems is to improve the timeliness and quality of the input to the decision making process.

Data is treated as a corporate resource, and transformed from quantity to quality. Hence, actionable information could be delivered at the right time, at the right location, and in the right form (Negash, 2004) to assist individual decision makers, groups, departments, divisions or even larger units. Fisher et al. (2006) further posited that a BI system is primarily composed of a set of three complementary data management technologies, namely data warehousing, online analytical processing (OLAP), and data mining tools.

A successful implementation of a BI system provides these organisations with new and unified insight across their entire engineering asset management functions. The resulting unified layer in reporting, business analysis, and forecasting assures consistency and flexibility. Critical information from many different sources of an asset management enterprise can be integrated into a coherent body for strategic planning and effective allocation of assets and resources.

Hence, the various business functions and activities are analyzed collectively to generate more comprehensive information in support of management’s decision-making process. BI systems come as standardised software packages from such vendors as Business Objects, Cognos, SAS Institute, Microstrategy, Oracle, Microsoft and Actuate, and they allow customers to adapt them to their specific requirements.

In recent years, the BI market has experienced extremely high growth as vendors continue to report substantial profits. Forrester’s recent survey indicated that for most CIOs, BI was the most important application to be purchased. The results of the latest Merrill Lynch survey into CIO spending similarly found that the area with the top spending priority was BI (White, 2006). These findings are echoed by Gartner’s CIOs priorities surveys in 2006 which revealed that BI ranked highest in technology priority (Gartner, 2006b). In the most recent survey of 1400 CIOs, Gartner likewise found that BI leads the list of the top ten technology priorities.

4.17 Introduction and Research Motivation

While the BI market appears vibrant, the implementation of a BI system is nevertheless a financially large and complex undertaking. The implementation of an enterprise-wide information system (such as a BI system) is a major event and is likely to cause organisational perturbations (Ang & Teo, 2000). This is even more so in the case of a BI system, because the implementation of a BI system is significantly different from that of a traditional operational system. It is an infrastructure project, which is defined as a set of shared, tangible IT resources that provide a foundation to enable present and future business applications (Duncan, 1995). It entails a complex array of software and hardware components with highly specialised capabilities (Watson & Haley, 1998).


The BI project team needs to address issues foreign to operational systems implementation, including cross-functional needs; poor data quality derived from source systems, which can often go unnoticed until cross-system analysis is conducted; technical complexities such as multidimensional data modelling; organisational politics; and broader enterprise integration and consistency challenges (Shin, 2003). Consequently, it requires considerable resources and involves various stakeholders over several months to initially develop and possibly years to become fully enterprise-wide (Watson & Haley, 1997).

Typical expenditure on these systems, which includes all BI infrastructure, packaged software, licenses, training and overall implementation costs, may run to seven digits. The complexity of BI systems is exemplified by Gartner's recent study, which predicted that more than half of the systems implemented would face only limited acceptance (Friedman, 2005).

Much IS literature suggests that various factors play pivotal roles in the implementation of an information system. However, despite the increasing interest in, and importance of, BI systems, there has been little empirical research about the critical success factors (CSFs) impacting the implementation of such systems. The gap in the literature is reflected in the low level of contributions to international conferences and journals. Although there has been a plethora of BI system studies from the IT industry, most rely on anecdotal reports or quotations based on hearsay.

This is because the study of BI systems is a relatively new area that has primarily been driven by the IT industry and vendors, and thus there is limited rigorous and systematic research into identifying the CSFs of BI system implementation. Therefore, the increased rate of adoption of BI systems, the complexities of implementing a BI system, and their far-reaching business implications justify a more focused look at the distinctive CSFs required for implementing BI systems.

4.18 Research Objective

Given the background and motivation of this research, the authors used the Delphi method to:

• Explore and identify the CSFs, and their associated contextual elements, that influence the implementation of BI systems.

• Consolidate a CSFs framework for BI system implementation.

Essentially, the authors argue that there is a set of factors influencing the implementation of BI systems, and that such antecedents, that is, CSFs, are necessary. In alignment with Sum et al.'s (1997) argument, this research also recognises that the associated contextual elements that make up each factor provide more specific, useful and meaningful guidelines for BI systems implementation. As asserted by Sum et al. (1997), top management support has often been cited as a CSF, but what exactly constitutes top management support is not really known.

Good performance of the CSFs requires that their elements (or constituents) be known, so that management can formulate appropriate policies and strategies to ensure that the elements are constantly and carefully managed and monitored. Lack of clear definitions of the CSFs may result in misdirected efforts and resources. Furthermore, the CSFs identified can be consolidated into a framework to provide a comprehensive picture for BI stakeholders, allowing them to optimise their resources and efforts on those critical factors that are most likely to have an impact on the system implementation, thereby ensuring that the initiatives result in optimal business benefits as well as effective uptake.

4.19 Research Methodology

In the absence of much useful literature on BI systems, this study seeks to explore and identify a set of CSFs that are jointly agreed by a group of BI system experts who possess substantial experience in EAMOs. The Delphi method was deemed the most appropriate method for this study because it allows the gathering of subjective judgments which are moderated through group consensus. Moreover, this research assumes that expert opinion can be of significant value in situations where knowledge or theory is incomplete, as in the case of BI systems implementation in EAMOs.


Unlike the focus group method, the Delphi method is particularly suitable for this research situation, where personal contact among participants, and thus the possible dominance of opinion leaders, is not desirable because of concerns about ensuring democratic participation. For this study, a Delphi panel composed of fifteen BI systems experts in EAMOs was established. Ziglio (1996) asserts that useful results can be obtained from a small group of 10-15 experts.

Beyond this number, further gains in understanding are small and not worth the cost or the time spent in additional interviewing. Thus, the size of such a Delphi panel is deemed suitably representative; it is shown in the table below. The Delphi participants have all been substantially involved in the implementation of BI systems within EAMOs in Australia and the United States.

In addition, the range of engineering asset management organisations represented by these experts was diverse, and included public utilities such as electricity, gas, water, and waste management, and infrastructure-intensive enterprises such as telecommunications and rail companies. It should be noted that some of the large organisations in which the participants have been involved have implemented BI projects in a series of phases. Most of the EAMOs are very large companies with engineering assets worth hundreds of millions of dollars, and they have committed immense expenditure to BI projects. So the expertise of the Delphi participants represents 'state of the art' knowledge of BI systems implementation in a broad range of engineering asset-intensive industries.

The Delphi study comprised three rounds. During the first round the authors conducted face-to-face interviews with each participant (phone interviews in some cases, due to geographical constraints), and these varied in duration from one to one and a half hours. Rather than starting with an open-ended question, the authors adopted a different approach from traditional Delphi methods by beginning with a list of factors derived from the data warehousing literature, data warehousing being the core component of a BI system. Having a prior theory has advantages, such as allowing the opening and probe questions to be more direct and effective, and helping the researcher recognise when something important has been said. However, the existing literature is not comprehensive in regard to CSFs for an entire BI system, but mainly focuses on data warehousing. Therefore, those factors were mainly used to start each discussion. When the mention of particular factors elicited relevant responses, further probing questions would follow in order to gather more details on those factors. The panellists were indeed encouraged to suggest other factors that they deemed critical.

At the commencement of the interviews, it was explained that the study focused on CSFs that facilitated the implementation success of BI systems in terms of infrastructure performance and process performance. The infrastructure performance consists of three major IS success dimensions proposed by Delone and McLean (1992; 2003), namely system quality, information quality, and system use, whereas process performance is composed of meeting time-schedule and budgetary constraints (Ariyachandra & Watson, 2006).

After the interviews, further clarifications (if any) were made through follow-up phone calls and e-mail communications. Subsequently, the data gathered from the first round of interviews were analysed thoroughly using content analysis, a constant comparison ('grounded') technique, to identify major themes. This technique encourages the emergence of a finding from the data set by constantly comparing incidents of codes with each other and then abstracting related codes to a higher conceptual level.


Current Position | Organisation Type | BI System | EAMOs' Industry Sector
Principal consultant, Committee, Author, Speaker | BI Consultancy, TDWI Committee | Business Objects, Information Builder, Cognos, Oracle | Electricity, gas, water & waste utilities, oil & gas production, defense, public transportation
Principal consultant, Committee | BI Consultancy, DWAA Committee | Cognos, Business Objects, Actuate | Telecommunications, airlines, municipal utility
Principal consultant, Author, Speaker | BI Consultancy, TDWI Summit | Cognos, Business Objects, Hyperion, Oracle, SAS | Energy utilities, transportation, mining industries
Principal consultant, Committee | BI Consultancy, DWAA Committee | Actuate, Microstrategy, Business Objects | Transportation & municipal utility, logistics
Principal consultant, Author, Speaker | BI Consultancy, TDWI Summit | Hyperion, Informatica, Oracle, Actuate, Business Objects | Electricity, gas, water utilities, telecommunications
Principal consultant | BI Consultancy | Business Objects, Cognos, Oracle | Electricity, gas, water & waste utilities
Principal consultant | BI Consultancy | SAS, Business Objects, Cognos, Microsoft, Oracle, Informatica | Rail infrastructure and fleets, public transportation, mining industries
Principal consultant | BI Consultancy | Oracle, IBM, Hyperion, Informatica, Cognos, Microsoft | Telecommunications, electricity, gas, water utilities
Executive VP (global consulting), Speaker | BI Consultancy, Conferences | Hyperion, Informatica, Oracle | Utilities, telecommunications, public transportation
Principal consultant | BI Consultancy | Oracle, Business Objects | Energy utilities, logistic transportation company
Principal consultant | BI Consultancy | Informatica, Oracle, Hyperion | Rail infrastructure and fleets
Principal consultant | BI Consultancy | Cognos, SPF Plus | Energy utilities
Principal consultant | BI Consultancy | Business Objects, SAS, Oracle | Utilities & logistics
Academic, Consultant, Author, Speaker | Academia, BI Consultancy | Oracle, Business Objects, Hyperion, Microstrategy | Utilities, telecommunications & manufacturing
Principal consultant | BI Consultancy | Oracle, IBM | Municipal utilities

Table 4.1 Delphi participants and their BI systems experience in EAMOs

In other words, the qualitative data were examined thematically, and emergent themes were ranked by their frequency and later categorised. The objective of the present research was to identify the CSFs that influence the implementation of BI systems. Hence, it was considered very important to determine what emerges from the data regarding interpretations of the CSFs for implementing BI systems. In the subsequent round, the suggested factors of all the participants were consolidated into a single list. The list was then distributed among the participants to facilitate comparison of the experts' perceptual differences. However, none of them nominated any additional factors of their own. Also, based on feedback from participants, some further minor changes were incorporated.

In addition, the participants confirmed that the classification of factors and their associated contextual elements is appropriate. For instance, several elements are grouped together because of their close interrelationship. During the third round, the list of candidate CSFs was surveyed by the Delphi participants using a structured questionnaire survey approach. Specifically, a 5-point Likert scale was applied to rate the importance of the candidate CSFs in the process of seeking statistical consensus from the BI experts.

The purpose of using a 5-point scale from 1 to 5 (where 1 meant 'not important,' 2 'of little importance,' 3 'important,' 4 'very important,' and 5 'critically important') was to distinguish merely important factors from critical success factors. From the survey feedback, only those factors with an average rating of 3.5 and above were shortlisted as CSFs, as shown in the table below. These CSF ratings are considered legitimate because the participants were directly drawing on their hands-on experience in EAMOs' BI system implementations. The details of the results are discussed below.
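
The shortlisting rule itself is simple arithmetic, sketched below with invented ratings (not the study's data): each candidate factor's ratings are averaged, and only those at or above 3.5 survive:

# Hypothetical 5-point Likert ratings for two candidate factors.
ratings = {"committed management support": [5, 4, 4, 5, 3],
           "vendor brand preference":      [2, 3, 2, 3, 2]}

# Keep only factors whose average rating meets the 3.5 cutoff.
csfs = {factor: sum(r) / len(r) for factor, r in ratings.items()
        if sum(r) / len(r) >= 3.5}
print(csfs)  # only 'committed management support' (mean 4.2) survives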

4.20 CSFs Findings and Discussion

The table below depicts the average rating results for the respective CSFs in descending order of importance. It contains the consensus outcomes and shows that the Delphi study captured the importance of the seven critical factors, namely committed management support and sponsorship, business user-oriented change management, clear business vision and well-established case, business-driven methodology and project management, business-centric championship and balanced project team composition, strategic and extensible technical framework, and sustainable data quality and governance framework.

Notably, data- and technical-related factors did not appear to be the most critical relative to other organisational factors. According to most interviewees, technological difficulties can be solved by technical solutions. However, it was found that achieving management and organisational commitment for a BI initiative poses the greatest challenge, because the BI teams considered them to be outside their direct control. The organisational support is reflected in the attitudes of the various business stakeholders; that is, their attitudes to change, time, cost, technology, and project scope. Based on a large-scale survey result, Watson and Haley (1997) pointed out that the most critical factors for successful implementations were organisational in nature.

Committed management support and adequate resources were found to determine implementation success, because these factors worked to overcome socio-political resistance, address change-management issues, and increase organisational buy-in. This finding also converges with Gartner's recent observation that "overcoming complex organisational dynamics will become the most significant challenge to the success of business intelligence initiatives and implementations".


Critical Success Factors | Mean | Std. Dev
Committed management support and sponsorship | 4.16 | 0.99
Business user-oriented change management | 4.10 | 1.00
Clear business vision and well-established case | 4.09 | 0.90
Business-driven methodology and project management | 4.08 | 0.88
Business-centric championship and balanced project team composition | 3.94 | 0.89
Strategic and extensible technical framework | 3.90 | 0.89
Sustainable data quality and governance framework | 3.82 | 0.91

Table 4.2 Ratings of critical success factors by Delphi participants

In fact, the effort of implementing BI systems is regarded by the Delphi participants as a business-driven program as opposed to a technological one. The fulcrum of BI program success thus rests with the business personnel, whereas technical people are expected to support the analytical requirements via technologies and tools. The definition of the strategic BI framework, project scoping and data quality initiatives were considered to lie within the realm of business personnel. That is, this new understanding emphasises the priority of business aspects, not technological ones, in implementing BI systems.

While the specific CSFs may seem to vary only slightly between BI systems and general IS studies, the actual contextual elements of these CSFs are substantially different from the implementation effort required for conventional operational systems. Unlike those transactional systems, business stakeholders need to be involved interactively in order to meet their dynamic reporting and ever-changing analytical needs. Owing to the evolutionary information requirements, the BI team has to provide continual support not only on tool application, but also on broader data modelling and system scalability issues.

This is in line with the adoption of an incremental delivery approach for implementing an adaptive decision support system, such as a BI system. Moreover, organisational and business commitment to a BI system implementation is critical to solve the complex organisational issues, especially in the democratisation process of data ownership, selection of funding model, change of business process, definition of the scoping study, data stewardship and quality control, and the provision of domain expertise and championship. The following section presents the CSFs framework consolidated from these CSFs findings.


4.21 Development of a Critical Success Factors Framework

Based on the research findings, these seven critical factors were integrated with the implementation success measures to provide a comprehensive CSFs framework for implementing BI systems. As illustrated in the figure below, this CSFs framework outlines how a set of factors contributes to the success of a BI system implementation. It postulates that there is a set of CSFs influencing implementation success, taking into account two key measures: infrastructure performance and process performance. Infrastructure performance parallels the three major IS success variables described by Delone and McLean (1992, 2003), namely system quality, information quality, and system use, whereas process performance can be assessed in terms of time-schedule and budgetary considerations.

Specifically, system quality concerns the performance characteristics of the information processing system itself, including ease of use, functionality, reliability, flexibility, integration, and response time (Delone & McLean, 1992; Rai et al., 2002). Information quality refers to the accuracy, timeliness, completeness, relevance, consistency, and usefulness of the information generated by the system (Delone & McLean, 1992; Fisher et al., 2006). System use is defined as "recipient consumption of the output of an information system" (Delone & McLean, 1992). These success criteria serve as the operationalisations of this study's dependent variable, that is, implementation success.

In brief, this framework treats the identified CSFs as necessary factors for implementation success, such that the absence of these CSFs would lead to failure of the system (Rockart, 1979). Within the framework, each of the CSFs identified by the Delphi study is described as follows.

Fig. 4.3 A critical success factors framework for the implementation of business intelligence systems (Source: http://im1.im.tku.edu.tw/~cjou/bi2009/1.pdf)


4.22 Committed Management Support and Sponsorship

Committed management support and sponsorship has been widely acknowledged as the most important factor for BI system implementation. All Delphi participants agreed that consistent support and sponsorship from business executives make it easier to secure the necessary operating resources, such as funding, human skills, and other requirements, throughout the implementation process.

This observation is reasonable and expected because the whole BI system implementation effort is a costly, time-consuming, and resource-intensive process. Moreover, the Delphi experts further argued that BI system implementation is a continual information improvement program for leveraging decision support. They believed that the typical application-based funding used for implementing transactional systems does not apply to BI systems, which are adaptive in nature. That is, a BI system evolves through an iterative process of systems development in accordance with dynamic business requirements (Arnott & Pervan, 2005).

Therefore the BI initiative, especially at the enterprise-wide scale, requires consistent resource allocation and top-management support to overcome organisational issues. These organisational challenges arise during the course of cross-functional implementation, which often uncovers issues in such areas as business processes, data ownership, data quality and stewardship, and organisational structure. Many functional units tend to focus on tactical gains, ignoring the rippling effects imposed on other business units. As one expert observed: "The whole BI effort cuts across many areas in the organisation, and that makes it very difficult; it hits a lot of political barriers. For instance, systems owners are only interested in delivering the day-to-day transactions; as long as all that is done… that's what they care about."

Also, without dedicated support from top management, the BI project may not receive the recognition, and hence the support, it needs to be successful. This is simply because users tend to conform to the expectations of top management and so are more likely to accept a system backed by their superiors (Lambert, 1995).

4.23 Business User-Oriented Change Management

Having an adequate user-oriented change management effort was deemed critical by the Delphi participants. The experts perceived that better user participation in the change effort leads to better communication of user needs, which in turn helps ensure the system's successful implementation. This is particularly important when the requirements for a system are initially unclear, as is the case with many of the decision-support applications that a BI system is designed to sustain (Wixom & Watson, 2001).

A significant number of Delphi participants shared the view that formal user participation can help meet the demands and expectations of the various end users. The user groups undoubtedly know what they need better than a secluded architect or developer without day-to-day user experience. Hence, key users must be involved throughout the implementation cycle because they can provide valuable input that the BI team may overlook.

The data dimensions, business rules, metadata, and data context needed by business users should be considered and incorporated into the system (Wixom & Watson, 2001). Users can also provide input through review and testing to ensure that the system meets the goals they think it should. Furthermore, when users are actively involved in the effort, they gain a better understanding of the potential benefits, which makes them more likely to accept the system on completion (Hwang et al., 2004). This 'implicit' education approach thus creates a sense of ownership among the users.

4.24 Clear Business Vision and Well-Established Case

As a BI initiative is business-driven, a strategic business vision is needed to direct the implementation effort. The Delphi participants indicated that a long-term vision, primarily in strategic and organisational terms, is needed to enable the establishment of a BI business case. The business case must be aligned with the corporate vision because it will eventually affect the adoption and outcome of the BI system; otherwise, the initiative will not receive the executive and organisational support required to make it successful. Consequently, the return on investment of a BI system implementation should be assessed as part of that of the business process as a whole.


The majority of interviewees indicated that the mindset of 'set up an excellent system and people will come to use it' is totally inappropriate. In fact, one interviewee claimed: "A BI system that is not business-driven is a failed system! BI is a business-centric concept. Sending IT off to solve a problem rarely results in a positive outcome. There must be a business problem to solve."

Most participants stressed that a solid business case derived from a detailed analysis of business needs would increase the chances of winning support from top management. A substantial business case should therefore incorporate the proposed strategic benefits, resources, risks, costs, and timeline. Such a case provides justifiable motivation for adopting a BI system to change the existing reporting and analytical practices.

4.25 Business-Driven Methodology and Project Management

The next factor to be considered is business-driven methodology and project management. According to the Delphi experts, adequate project scoping and planning allows the BI team to concentrate on the best opportunity for improvement. To be specific, scoping helps to set clear parameters and develops a common understanding of what is in scope and what is excluded (Ang & Teo, 2000). For instance, one Delphi expert gave insight from his experience:

"The success of 90% of our projects is determined prior to the first day. This success is based on having a very clear and well-communicated scope, having realistic expectations and timelines, and having the appropriate budget set aside." Adequate scoping thus enables the project team to focus on crucial milestones and pertinent issues while shielding it from becoming trapped in unnecessary events. Many experts further indicated that it is advisable to start small and adopt an incremental delivery approach, since large-scale change efforts are always fraught with greater risks given the substantial number of variables to be managed simultaneously (Ang & Teo, 2000).

Moreover, business changes very fast and always looks for immediate impact; an incremental delivery approach provides the means to deliver needed requirements in a short time (Greer & Ruhe, 2004). It also allows for building a long-term solution as opposed to a short-term one, as is the case for evolutionary BI system development (Arnott & Pervan, 2005).

Besides that, some interviewees commented that a BI program that starts in a high-impact area is always valuable in providing tangible evidence for both executive sponsors and key users (Morris et al., 2002). According to them, adopting this so-called 'low-hanging fruit' approach, in which projects with the greatest visibility and monetary impact are tackled first, demonstrates to leadership that there is a payback (ROI) on their investment, and shows it within a short timeframe. This increases leadership support and helps the other associated initiatives to gain ready support. One interviewee elaborated:

"We cannot roll out the whole BI system at once, but people want to see some key areas. We need to do data marts for a couple of key areas and then maybe a small number of other key reports in an attempt to keep all stakeholders happy. Then, when the first release is done and we get some feedback, we can work on other data mart areas and enhance existing subject areas over time." A 'low-hanging fruit' approach therefore allows an organisation to concentrate on crucial issues, enabling teams to prove that the system implementation is feasible and productive for the enterprise.

4.26 Business-Centric Championship and Balanced Project Team Composition

The majority of Delphi experts believed that having the right champion from the business side of the organisation is critical for implementation success. According to them, a champion with excellent business acumen is always important, since he or she will be able to foresee the organisational challenges and change course accordingly. More importantly, a business-centric champion views the BI system primarily from strategic and organisational perspectives, as opposed to one who might over-focus on technical aspects. For example, as noted by one interviewee:


"The team needs a champion. A champion is someone who understands the business and the technology and is able to translate the business requirements into a (high-level) BI architecture for the system." All interviewees also agreed that the composition and skill sets of a BI team have a major influence on implementation success. The project team should be cross-functional, composed of personnel with technical expertise and personnel with a strong business background (Burton et al., 2006). As most interviewees stressed, a BI system is a business-driven project to provide enhanced managerial decision support; a suitable mix of IT expertise is needed to implement the technical aspects, whereas the reporting and analysis aspects must remain in the realm of business personnel.

Furthermore, most experts posited that the BI team must identify and include business domain experts, especially for such activities as data standardisation, requirements engineering, data quality analysis, and testing. Many respondents also agreed on the critical role played by external consultants, especially in the early phases. They believed that a lack of in-house experience and competencies can be complemented by external consultants who have spent the majority of their time working on similar projects. As well as being subject matter experts, the interviewees indicated, external consultants can provide an unbiased view of the solution to a problem. This is because the organisational structure of an engineering asset management enterprise is traditionally function-oriented and culturally fragmented with respect to information systems design.

There may even be situations where the client possesses the expertise to solve a particular problem but is constrained on organisational grounds. An external consultant can then evaluate and propose an unbiased course of action without fear of political repercussions (Kaarst-Brown, 1999).

4.27 Strategic and Extensible Technical Framework

In terms of a strategic and extensible technical framework, most experts asserted that stable source/back-end systems are crucial in implementing a BI system. A reliable back-end system is critical to ensure that the updating of data works well for the extraction, transformation and loading (ETL) processes in the staging area (Ponniah, 2001). The data can then be transformed to provide a consistent view of quality information for improved decision support. It is therefore crucial for the BI team to assess the stability and consistency of source systems before embarking on a BI effort; otherwise, after the system implementation, the cost of changes in terms of time and money can be significant.

A BI expert explained the importance of this factor in detail: "In the case of a mining company, they don't have consistent back-end systems; in some departments they have just a large number of spreadsheets, which pull production data into their spreadsheets. It is scary. It's a major impediment to a BI system, and we've got multiple bits all over the place."

Another prime concern of the respondents was that the technical framework of a BI system must be able to accommodate scalability and extensibility requirements. With a strategic view embedded in the system design, this scalable framework can include additional data sources, attributes, and dimensional areas for fact-based analysis, and it can incorporate external data from suppliers, contractors, regulatory bodies, and industry benchmarks (Watson et al., 2004). It thereby allows for building a long-term solution that meets the incremental needs of the business.

The majority of interviewees also agreed that a prototype is always valuable as a proof of concept, that is, constructing a fairly small BI application for a key area in order to provide tangible evidence for both executive sponsors and general users (Watson et al., 2001). They perceived that a prototype offering clear forms of communication and better understanding in an important business area would convince organisational stakeholders of the usefulness of a BI system implementation. Following a successful prototype, senior management and key users are more likely, and more motivated, to support larger-scale BI efforts.

4.28 Sustainable Data Quality and Governance Framework

The Delphi findings indicate that the quality of data, particularly at the source systems, is crucial if a BI system is to be implemented successfully. According to the interviewees, a primary purpose of the BI system is to integrate 'silos' of data sources within the enterprise for advanced analysis so as to improve the decision-making process.


Often, many data-related issues within the back-end systems are not discovered until the data is populated and queried against in the BI system (Watson et al., 2004). Thus corporate data can only be fully integrated and exploited for greater business value once its quality and integrity are assured.

Management is also urged to initiate data governance and stewardship efforts to improve the quality of the data in back-end systems, because unreliable data sources have a ripple effect on the BI applications and subsequently on the decision outcomes (Chengalur-Smith et al., 1999). For instance, one expert expressed his concern: "This is the most underrated and underestimated part of nearly every BI development effort. Much effort is put into getting the data right the first time, but not nearly enough time is spent putting in place the data governance processes to ensure the data quality is maintained."

Some interviewees further argued that a sound data governance initiative is more than a set of ad-hoc data quality projects. Indeed, it should include a governing committee, a set of procedures, and an execution plan. More specifically, the roles of data owners or custodians and data stewards must be clearly defined (Watson et al., 2004). Frontline and field workers should be made responsible for their data sources and hence for data quality assurance. Meanwhile, a set of policies and audit procedures must be put in place to ensure ongoing compliance with regulatory requirements, as most EAMOs, such as utilities, are publicly owned companies.

Apart from that, the Delphi participants believed that common measures and definitions address the data quality dimension of representational consistency; they let all stakeholders know that a term carries the same definition no matter where it is used across the source systems. It is typical for an EAMO to have hundreds of varying terms with slightly different meanings, because different business units tend to define terms in ways that best serve their purposes. Often, accurate data may have been captured at the source level, yet the record cannot be linked with other data sources because of inconsistent data identifiers; data values that should uniquely describe entities vary between business units. Once an organisation has accumulated a large number of reports, it becomes harder to re-architect these areas.

As a result, a cross-system analysis is important to help profile a uniform 'master data set' that complies with the business rules. The development of a master data set on which to base the logical data warehouse construction for the BI system will ease terminology problems (Watson et al., 2004). In order to have consistent measures and classifications across subject areas, most interviewees asserted that business-led commitment is pivotal in establishing consensus on data measurement and definition. Indeed, a BI system implementation is a business-driven initiative to support the reporting and analytical requirements of the business, and the BI team will use these common definitions to develop an enterprise-wide dimensional model that is business-oriented. Many participants asserted that a correct dimensional data model is the absolute cornerstone of every BI project.

A faulty model will surely lead to failure of the project, as it will fail to deliver the right information. A sustainable metadata model on which to base the logical and physical data warehouse construction for a BI system was also deemed critical by many experts. The metadata model should therefore be flexible enough to enable the scalability of the BI system while consistently providing the integrity on which OLAP and data mining depend (Watson & Haley, 1997).

4.29 Concluding Remarks and Future Research

This theory-building research presents a CSFs framework derived from a Delphi study with BI systems experts within the engineering asset management domain. An analysis of the findings demonstrated that there are a number of CSFs peculiar to successful BI system implementation. More importantly, this study revealed a clear trend towards multi-dimensional factors in implementing BI systems. Organisational factors were perceived to be more important than technological ones because the BI teams considered them to be outside their direct control. Furthermore, the contextual elements of these CSFs appear to be substantially different from those of the implementation effort for conventional operational systems.


The research makes both theoretical and practical contributions to the field of BI systems implementation. First, this study fills a research gap by building a theory of CSFs, addressing issues of concern to practitioners, and supplementing the currently limited understanding of BI system implementation issues. Moreover, this research provides thought-provoking insights into the multi-dimensional CSFs that influence BI systems implementation. The contextual elements identified alongside each of the critical factors, together with the consolidated CSFs framework, provide a comprehensive and meaningful understanding of CSFs.

Not only does this research contribute to the academic literature, but it also benefits organisations in several ways. Essentially, BI practitioners (both current and potential) will be better able to identify the critical factors for successfully implementing BI systems. The findings will enable them to manage their implementations better once they understand that such an effort involves multiple dimensions of success factors occurring simultaneously, not merely the technical aspects of the system.

The CSFs framework enables BI stakeholders to identify the necessary factors and to gain a comprehensive understanding of those CSFs. Such outcomes will help them to improve the effectiveness and efficiency of their implementation activities by giving them a better understanding of the possible antecedents of successful BI system implementation. For senior management, the findings can assist in optimising scarce resources by focusing on those critical factors that are most likely to affect the BI systems implementation; management can then concentrate its commitment to monitoring, controlling, and supporting those key areas of implementation.


Summary

• Any part of a business can benefit from more systematic creation, gathering, and interpretation of information, leading to better decisions and more informed actions.

• Most analytical competitors are leaders in their industries and quite successful in financial terms.

• Many corporations compete on the basis of their ability to initiate, expand, and maintain relationships with customers.

• "Predictive modelling" uses a variety of analytical techniques to make estimates about the future based on current and historical data.

• Competitive pressures are driving companies to personalise the way they manage customer relationships.

• Many businesses have to account for unique customer risk, and then price the product or service based on the cost of covering that risk.

• Too many businesses try to retain customers only after the customer attempts to terminate the relationship.

• Fraud ranges from inaccurate credit applications, fraudulent transactions, and false insurance claims, to identity theft.

• Assuming that critical data reside in transactional systems such as an enterprise resource planning (ERP) system, they must be extracted with the use of "extract, transform, and load" software.

• The labour force growth rate is slowing, and labour markets are projected to continue tightening.

• ERP revolutionised the enterprise applications market in the 90s, but no one could envisage the subsequent development.

• Thus, major ERP vendors have been increasingly embracing OLAP (On-line Analytical Processing) tools that provide a high-level aggregated view of data.

• Business intelligence and analytics provide an environment in which business users receive information that is reliable, consistent, understandable and easily manipulated.

• CPM crosses traditional department boundaries to manage the full life cycle of business decision-making.

• Howard Dresner, father of the Business Intelligence concept, explains why enterprises must make business intelligence an imperative.

• An enterprise should go through two phases before it is ready for BI.

• Engineering asset management organisations (EAMOs), such as utilities and transportation enterprises, store vast amounts of asset-oriented data.

• Committed management support and sponsorship has been widely acknowledged as the most important factor for BI system implementation.


References

• Roebuck, K., 2011. Business Analytics: High-Impact Strategies - What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors, Lightning Source Incorporated Publication.

• Loshin, D., 2003. Business Intelligence: The Savvy Manager's Guide, Getting Onboard with Emerging IT, Morgan Kaufmann Publication.

• Yeoh, W., Koronios, A. & Gao, J., 2008. Managing the Implementation of Business Intelligence Systems: A Critical Success Factors Framework, IGI Publishing. [Online] Available at: <http://im1.im.tku.edu.tw/~cjou/bi2009/1.pdf> [Accessed 24 April 2012].

• Hurbean, L., Business Intelligence: Applications, Trends, and Strategies. [Online] Available at: <http://anale.feaa.uaic.ro/anale/resurse/46_Hurbean_L_-_Business_intelligence-applications,_trends_and_strategies.pdf> [Accessed 24 April 2012].

• IBM, 23 Dec 2011. Business Analytics - Turning Data Into Insight. [Video Online] Available at: <http://www.youtube.com/watch?v=6jDjeNJrN14> [Accessed 24 April 2012].

• IBMsystems, 27 June 2011. Trends in Business Analytics. [Video Online] Available at: <http://www.youtube.com/watch?v=nfMnILQVZXo> [Accessed 24 April 2012].

Recommended Reading

• Jank, W., 2011. Business Analytics for Managers, Springer Publication.

• Wiley, G. & Thorlund, J., 2010. Business Analytics for Managers: Taking Business Intelligence Beyond Reporting, John Wiley & Sons Publication.

• Evans, J., 2012. Business Analytics, Pearson College Division Publication.


Self Assessment

1. Which of the following is not an application category of predictive modelling?
a. Valuation
b. Pricing
c. Retention
d. Pre-emption

2. _________ undermines the profitability of companies and drives up the costs of goods and services for customers.
a. Fraud detection
b. Valuation
c. Pricing
d. Customisation

3. Modelling and analysis can be done using ___________.
a. Dashboards
b. Spreadsheets
c. Reports
d. Ad-hoc query

4. _______ is treated as a corporate resource, and transformed from quantity to quality.
a. Report
b. File
c. Data
d. Memory

5. Which of the following is a data management technology?
a. OLATP
b. OLEDB
c. OLAP-Server
d. OLAP

6. _________ uses a variety of analytical techniques to make estimates about the future based on current and historical data.
a. Business analytics
b. Customer relationship
c. Predictive modelling
d. Model analytics

7. ___________ is a mature predictive modelling application.
a. Credit scoring
b. Customer relationship
c. Dashboard
d. Spreadsheet


8. Which of the following key roles does a CEO or business unit head play?
a. Analysis
b. Execution
c. Testing
d. Integration

9. __________ decides where analytics should be leveraged in the business.
a. Strategy
b. Trend
c. Application
d. Pricing

10. ________ drives with passion and commitment the organisational changes needed by an analytical competitor.
a. Capability
b. Strategy
c. Execution
d. Strategy


Chapter V

Data Warehousing and Data Mart

Aim

The aim of this chapter is to:

• introduce data warehousing to solve Business Intelligence problems
• elucidate data mart development with various models
• explain corporate information architecture

Objectives

The objectives of this chapter are to:

• explain data mart and data warehouse
• explicate information input and output in the corporate information factory
• elucidate steps in implementing a data mart

Learning outcome

At the end of this chapter, you will be able to:

• understand business intelligence's and business management's capabilities
• identify data management
• describe development models and their feedback


5.1 Introduction

The systems that contain operational data (the data that runs the daily transactions of our business) contain information that is useful to business analysts. For example, analysts can use information about which products were sold in which regions at which time of year to look for anomalies or to project future sales. However, there are several problems if analysts access the operational data directly:

• They might not have the expertise to query the operational database. For example, querying IMS databases requires an application program that uses a specialised type of data manipulation language. In general, those programmers who have the expertise to query the operational database have a full-time job in maintaining the database and its applications.

• Performance is critical for many operational databases, such as databases for a bank. The system cannot handle users making ad-hoc queries.

• The operational data generally is not in the best format for use by business analysts. For example, sales data that is summarised by product, region, and season is much more useful to analysts than the raw data.

Data warehousing solves these problems. In data warehousing, we create stores of informational data that are extracted from the operational data and then transformed for end-user decision making. For example, a data warehousing tool might copy all the sales data from the operational database, perform calculations to summarise the data, and write the summarised data to a database separate from the operational data. End users can then query the separate database (the warehouse) without affecting the operational databases.
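To make this flow concrete, here is a minimal Python sketch of the idea using SQLite from the standard library; the `sales` table, its columns, the sample rows, and the file names are illustrative assumptions, not a prescribed design.

```python
import sqlite3

# Hypothetical operational database holding raw sales transactions.
operational = sqlite3.connect(":memory:")
operational.executescript("""
    CREATE TABLE sales (product TEXT, region TEXT, sale_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('P1', 'East', '2013-01-15', 70.0),
        ('P1', 'East', '2013-02-03', 50.0),
        ('P2', 'West', '2013-02-20', 90.0);
""")

# Separate warehouse database: analysts query this, not the live system.
warehouse = sqlite3.connect("warehouse.db")
warehouse.execute("""
    CREATE TABLE IF NOT EXISTS sales_summary (
        product TEXT, region TEXT, quarter TEXT, total_amount REAL)""")

# Extract and summarise: total sales by product, region and quarter.
rows = operational.execute("""
    SELECT product, region,
           'Q' || ((CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3),
           SUM(amount)
    FROM sales GROUP BY 1, 2, 3""").fetchall()

# Load the summary into the warehouse, leaving the source untouched.
warehouse.executemany("INSERT INTO sales_summary VALUES (?, ?, ?, ?)", rows)
warehouse.commit()
```

Analysts would then run their ad-hoc queries against `sales_summary`, so the operational system never feels the load.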

5.2 The Corporate Information Factory

The Corporate Information Factory (CIF) is a logical architecture whose purpose is to deliver business intelligence and business management capabilities driven by data provided from business operations. The CIF has proven to be a stable and enduring technical architecture for any size enterprise desiring to build strategic and tactical decision support systems (DSSs). The CIF consists of producers of data and consumers of information. The figure below shows all the components found within the Corporate Information Factory architecture.

The producers in the Corporate Information Factory capture the data (integration and transformation) from the operational systems and assemble it (data management) into a usable format (data warehouse or operational data store) for consumption by the business consumers. The CIF consumers acquire the information produced (data delivery), manipulate it (data marts) and assimilate it into their own environments (decision support interface or transaction interface).


Fig. 5.1 The corporate information factory architecture (Source: http://ssdi.di.fct.unl.pt/dw/ressources/lectures/files/DW-02-Basics.pdf)

We use the simple model of separating these two fundamental processes into “getting data in” versus “getting information out.” The next figure demonstrates the relationship between these two processes as well as the distinct components of the CIF involved in each.


Fig. 5.2 Getting data in vs. getting information out (Source: http://www.information-management.com/issues/19991201/1667-1.html)

5.3 Getting Data In

Producers are the first link in the information food chain. They synthesise data into raw information and make it available for consumption across the enterprise. Each producer is defined and explained in further detail below.

5.3.1 Operational Systems

Operational systems are the family of systems (operational, reporting, etc.) from which the Corporate Information Factory inherits its characteristics. These are the core systems that run the day-to-day business operations and are accessed through application program interfaces (APIs). The operational environment represents a major source of data for the CIF. Other sources may include external data and informal data such as contract notes, e-mails, spreadsheets, etc.

The ability or inability to capture the appropriate data in operational systems sets the stage for the value of the Corporate Information Factory itself. The success or failure of the CIF depends heavily on these operational systems to supply the richness of data needed to understand the business and to provide the history needed to judge the health of the business.

Let's examine some of the problems surrounding the operational environment; unfortunately, these problems often find their way into the systems and processes of the CIF as well. Operational systems are usually built around the product they support. Corporations are moving their organisational focus from product to customer in an effort to differentiate their offerings and ultimately survive in the business environment. Operational systems have inherently focused on product and thus lack the ability to recognise or manage data about customers. This raises such fundamental questions as:

• What products belong to a customer (for integrated billing)?
• What offerings are relevant to a particular customer and/or household (for smart targeting)?

The CIF must provide facilities to define how corporate data relates to a customer, rules for integration (data modelling), and the means to document these relationships and rules (Meta data).

Operational systems, by and large, were not designed with integration of data in mind. They were built to perform a specific function, regardless of what data other operational systems contain. Again, this is handed down to the Corporate Information Factory, where it is somewhat rectified through the integration and transformation process.


A related problem with operational systems concerns the weak linkages between the systems. At best, the systems pass a very limited amount of data between them, the bare minimum to satisfy the receiving system. The Corporate Information Factory uses massive amounts of data from all of the operational systems and must synthesise, clean and integrate the data before it is usable.

Operational systems are also not able to handle large amounts of history. By their nature, they should be responsible for only the most current state of affairs. That is what they were designed to do, and they do it quite well. However, there is much to be learned from historical data; therefore, the Corporate Information Factory must act as the historian for the entire enterprise.

5.3.2 Integration and Transformation

Integration and transformation consists of the processes to capture, integrate, transform, cleanse, reengineer and load source data into the data warehouse or operational data store.

Fig. 5.3 Integration and transformation process (Source: http://www.information-management.com/issues/19991201/1667-1.html)

Integration and transformation is one of the most important processes in the Corporate Information Factory producer role. It has the critical job of converting the chaos of the operational world into the ordered world of information. This process assimilates the data from an operational environment of heterogeneous technologies into the integrated, consistent world of the CIF, making it suitable for consumption by the decision support processes, that is, by the consumers.

It is the responsibility of this process to prepare and load data into the data warehouse and/or operational data store. In doing so, we should consider the following:

• Where possible, leverage the horsepower of the mainframe to pre-process the operational data. If this pre-processing can be performed throughout the day rather than waiting for the batch window, it can greatly speed up the integration and loading process.

• An integration and transformation process that "pulls" data from the operational systems, rather than having data "pushed" from them, provides better control.

The greatest challenge for integration and transformation arises when data is received from sources that have organised data around different keys. For example, one system manages data by a demographic code, another by account number, and yet another by invoice ID. In the end, all three sources need to be integrated to provide a complete view of a customer. This process can involve sophisticated matching (fuzzy logic) rules and name/address normalisation and standardisation to determine what data belongs to which customer, as sketched below.
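The following Python sketch illustrates the matching idea; the standard library's difflib stands in for a real fuzzy-matching engine, and the records, the normalisation rules, and the similarity threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

def normalise(name: str, address: str) -> str:
    """Crude name/address standardisation: lower-case, strip punctuation."""
    raw = f"{name} {address}".lower()
    kept = "".join(c for c in raw if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def same_customer(a, b, threshold: float = 0.8) -> bool:
    """Treat two records as one customer if their normalised keys are close."""
    return SequenceMatcher(None, normalise(*a), normalise(*b)).ratio() >= threshold

billing = ("J. Smith", "12 High St.")    # keyed by account number
crm = ("John Smith", "12 High Street")   # keyed by demographic code
print(same_customer(billing, crm))       # True: likely the same customer
```

A production process would add many more rules (nicknames, address parsing, household linking), but the shape is the same: normalise, compare, decide.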

Before we begin the process of producing data to be loaded into the data warehouse, we should develop protocols for configuration management and the scheduling of these processes.


The level of effort needed for integration and transformation is greatly affected by the level of understanding we have of the source data. The more familiar we are with the operational data and its creation, the easier the integration and transformation process will be.

Once the integration and transformation piece is delivered, the good news is that it is relatively stable and predictable. There will always be changes to these processes and programs, simply because of the fluidity of the operational environment and the need for new information as end users begin to explore new DSS possibilities; however, these should be handled by standard change request procedures.

A wise producer develops an audit strategy up front rather than after the integration and transformation process is in place. We must be able to confirm that the conversions, integrations, transformations, etc., are performing as expected and planned.

One of the functions this producer can perform is data preparation for the consumers. It is reasonable and prudent to create summarisations, derivations and even star schema tables that can then be used easily by data delivery. Otherwise, the burden of calculating, deriving and setting up dimensions falls on the hardy data delivery piece.

5.3.3 Data Warehouse

The data warehouse (DW) is a subject-oriented, integrated, time-variant (temporal) and non-volatile collection of data used to support the strategic decision-making process for the enterprise, or business intelligence. The data warehouse acts as the central point of data integration, the first step toward turning data into information. It serves the following purposes:

• The data warehouse delivers a common view of enterprise data, regardless of how it may later be used by the consumers.

• Since it is the generic "foodstuff" for the consumers, it supports flexibility in how the data is later interpreted (consumed). The data warehouse produces a stable source of historical information that is constant, consistent and reliable for any consumer.

• Because the enterprise as a whole has an enormous appetite for information, the data warehouse can grow to huge proportions (one to 20 terabytes or more!).

• The data warehouse is set up to serve the many rather than the few in terms of consuming information. That is, many data marts can be created from the data contained in the data warehouse, rather than each data mart serving as its own producer and consumer.

Because of this central role for the data warehouse, there are several considerations that developers should remember:

• Usage of the data warehouse by the ultimate consumers (the business community) may be restricted. We may find that access to this producer should be limited to the data delivery process rather than opened up to all consumers. This allows us to maintain our focus on data loading and management.

• Because this data warehouse producer must focus its energy on holding the corporation's history and producing information to be consumed later, little or no transaction processing occurs within its database. These activities are far better suited to other producers such as the operational systems or the operational data store.

• Due to the lack of transactional processing and the large volume of data that these databases contain, we may want to limit refreshes to a minimum, for example a weekly or even monthly refresh.

• Finally, due to the size of these constructs, they are generally found on relational and high-performance technologies such as MPP or SMP platforms.


5.3.4 Operational Data Store

The operational data store (ODS) is a subject-oriented, integrated, current and volatile collection of data used to support the tactical decision-making process for the enterprise, or business management. Just as the data warehouse is the central point of integration for business intelligence, the operational data store is the central point of data integration for business management. It is a perfect complement to the strategic decision-making processes provided through the data warehouse/data mart constructs.

The operational data store has the following roles:

• It delivers the common view of enterprise data for operational processing. By being the point of integration for operational data, the operational data store produces the "foodstuffs" for the tactical decision-makers of the corporation.

• The operational data store supports the actions resulting from business intelligence activities by supplying current, integrated, enterprise-oriented data. The ability to act upon the result sets generated from data marts is critical in balancing the ecosystem to support the "planning" and "action" activities of the business.

• The operational data store is relatively straightforward to deploy. However, deployment becomes increasingly difficult as the demands for currency of data grow.

5.3.5 Data Management

Data management is responsible for the ongoing management of data within and across the data warehouse and operational data store. This includes archival/restoration, partitioning, movement of data between the DW and ODS, event triggering, aggregation of data, backups and recoveries, etc.

Data management can be thought of as an extension to the data warehouse database management system in that it:

• Is responsible for the application-level partitioning and segmentation of the data warehouse.

• Performs the data archival and retrieval functions from near-line storage media. This can be a particularly difficult problem as the archived data ages.

• Is responsible for disaster recovery, backups and recoveries.

• Monitors and measures the quality of the data in the data warehouse and operational data store.

• Creates standard summarisations and aggregations.

Unfortunately, data management is a process that is usually not planned for at the beginning of most projects. However, soon after the data warehouse is up and running, data management quickly becomes a primary concern of the development team. A secondary challenge is that the availability of tools in the marketplace is limited; unfortunately, this forces corporations into the position of building these capabilities themselves.

5.4 Getting Information Out

Consumers gain their energy from the output of producers and manipulate it for their own purposes. In the CIF, these consumers constitute the decision support mechanisms for the corporation. The ultimate consumers in the Corporate Information Factory are members of the business community and have been classified as farmers, explorers, miners, operators and tourists.


5.5 Data Delivery

Data delivery is a workgroup environment designed to allow end users (or their supporting IS group) to build and manage views of the data warehouse within their data mart. Data delivery provides the mechanism for requesting, prioritising and monitoring data mart creation and refinement. There are three steps in the process of creating the data mart (a small sketch follows the list):

• Filter: The information consumed by the data delivery process is obtained from the data warehouse. A filtering mechanism removes all information that is not needed by the data mart process.

• Format: The filtered information is then assimilated into a schema that is suitable for the secondary consumer, that is, the DSS. Usually this is in the form of a star schema or snowflake schema, a set of flat files, or perhaps a normalised subset of data from the warehouse.

• Deliver: The last step in the process is to ensure that the correct information is delivered to the appropriate data mart technology in a timely manner, with the appropriate notifications to the ultimate consumers, the business community.
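A minimal Python sketch of the three steps, using in-memory rows; the row shape, the region filter, and the list standing in for a data mart are illustrative assumptions.

```python
warehouse_rows = [
    {"product": "P1", "region": "East", "quarter": "Q1", "total": 120.0},
    {"product": "P2", "region": "West", "quarter": "Q1", "total": 90.0},
]

def filter_step(rows, region):
    """Filter: remove everything the mart does not need (here, other regions)."""
    return [r for r in rows if r["region"] == region]

def format_step(rows):
    """Format: reshape into simple fact tuples suitable for the mart."""
    return [(r["product"], r["quarter"], r["total"]) for r in rows]

def deliver_step(facts, mart):
    """Deliver: load the facts into the mart and notify the consumers."""
    mart.extend(facts)
    print(f"Delivered {len(facts)} fact rows to the mart")

east_mart = []
deliver_step(format_step(filter_step(warehouse_rows, "East")), east_mart)
```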

In creating the data delivery process, we should:

• Try to keep the process simple until the dynamics of the environment are understood and all other infrastructure components are in place.

• Build a system to manage the data mart requests first. This should include a process to prioritise and consolidate the requests. This becomes a very useful process in managing requests and promoting communications with the end users.

• Try to develop and use templates for formatting request results wherever possible. These will be invaluable as our data mart population grows, and they will be used to assist in automating the format process.

5.5.1 Data Mart

A data mart contains data from the data warehouse tailored to support the specific analytical requirements of a given business unit or business function. The data mart is the recipient of the information assimilated by the data delivery process. Data marts may have either a business-unit or a functional view of the data warehouse data; thus data marts utilise the common corporate view of strategic data established in the data warehouse by the integration and transformation process. Some points about the data mart:

• The data mart may or may not be located on the same machine as the data warehouse. This allows consumers to select the best technology to support their particular style of decision making.

• Data marts should be conservatively implemented as an extension of the data warehouse, not as an alternative. This does not mean that we should not implement a data mart as a proof of concept; indeed, this is perhaps one of the best ways to demonstrate the viability of the DSS environment. However, the long-term strategy dictates that the full Corporate Information Factory infrastructure is necessary for a healthy DSS environment.

• Data marts are the ideal constructs for classical decision support, including data mining and data visualisation processes. However, we should keep in mind the trade-off between the simplicity of design and the cost of administering many data marts.


5.5.2 Decision Support Interface (DSI)

The decision support interface provides the end user with easy-to-use, intuitively simple tools to distil information from data. The DSI consists of the secondary consumers in the Corporate Information Factory; it is from these systems that analysis activities are enabled. There is much flexibility in terms of tool and technology choices, allowing the end user to match the tool to the task at hand. Some of the considerations in this environment are:

• The data mart is the source of information for the DSI, while the data warehouse itself may be somewhat restricted in access.

• The types of tools may be categorised as query, reporting, multidimensional or online analytical processing, data mining, data exploration or data visualisation tools.

• It is recommended to prototype extensively before purchasing a DSI tool. Also, don't try to do too much at first; it helps to understand how the end users will use the tools and the information by starting small and growing.

• We should plan on supporting several tools in each category. This can become a very resource-intensive situation that may prohibit further construction of our CIF.

5.5.3 Transaction Interface (TrI)

The transaction interface provides the end user with an easy-to-use, intuitively simple interface to request and employ business management capabilities. It uses the operational data store as its source of data. The TrI is the catalyst (or messaging infrastructure) that provides the delivery and management of requests; it provides the presentation and functionality to prepare, submit and process requests for information. A good example of a consumer is CTI (computer telephone integration). CTI provides a very sophisticated environment for managing customer calls but lacks the information (and consequently the knowledge) for driving the interaction with the customers. By integrating CTI (the application) with the operational data store (via the TrI), customers can be routed to the appropriate business professional who has the critical information needed to deliver premier customer care.

5.5.4 Meta Data

Meta data provides the necessary details to promote data legibility, use and administration. Its contents are described in terms of data about data, activities and knowledge.

Fig. 5.4 Data management (Source: http://www.information-management.com/issues/19991201/1667-1.html)


Meta data is a formal component of the Corporate Information Factory and should not be given short shrift. It is Meta data that provides comprehension to the end users and information concerning the management of the environment to the administrators. Some of the considerations for Meta data are:

• Start gathering and managing Meta data from the very start of Corporate Information Factory creation. Meta data becomes of primary interest to the end users almost immediately.

• Develop a rational versioning scheme for Meta data. Determine what events or conditions constitute a new version.

• Integrate the business and technical Meta data and provide views that are appropriate for each group. Incorporate robust search capabilities such as browsers. We should consider using the Internet or an intranet for Meta data delivery.

• Integrate the various sources of Meta data and maintain the accuracy of this information. Make sure that we can easily accommodate new requirements as our construction progresses.

Fig. 5.5 Data delivery (Source: http://www.information-management.com/issues/19991201/1667-1.html)

5.5.5 What is a Data Mart?

A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales, Finance, or Marketing. Data marts are often built and controlled by a single department within an organisation. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could be internal operational systems, a central data warehouse, or external data.

5.6 How is a Data Mart Different from a Data Warehouse?

A data warehouse, unlike a data mart, deals with multiple subject areas and is typically implemented and controlled by a central organisational unit such as the corporate Information Technology (IT) group. Often, it is called a central or enterprise data warehouse. Typically, a data warehouse assembles data from multiple source systems.

Nothing in these basic definitions limits the size of a data mart or the complexity of the decision-support data that it contains. Nevertheless, data marts are typically smaller and less complex than data warehouses; hence, they are typically easier to build and maintain. The table below summarises the basic differences between a data warehouse and a data mart.


Category                Data Warehouse        Data Mart
Scope                   Corporate             Line of Business (LOB)
Subject                 Multiple              Single subject
Data Sources            Many                  Few
Size (typical)          100 GB-TB+            < 100 GB
Implementation Time     Months to years       Months

Table 5.1 Basic differences between a data warehouse and a data mart

5.7 Dependent and Independent Data Marts

There are two basic types of data marts: dependent and independent. The categorisation is based primarily on the data source that feeds the data mart. Dependent data marts draw data from a central data warehouse that has already been created. Independent data marts, in contrast, are standalone systems built by drawing data directly from operational or external sources of data, or both.

The main difference between independent and dependent data marts is how the data mart is populated, that is, how data is moved out of the sources and into the data mart. This step, called the Extraction, Transformation and Loading (ETL) process, involves moving data from operational systems, filtering it, and loading it into the data mart.

With dependent data marts, this process is somewhat simplified because formatted and summarised (clean) data has already been loaded into the central data warehouse. The ETL process for dependent data marts is mostly a matter of identifying the right subset of data relevant to the chosen data mart subject and moving a copy of it, perhaps in summarised form; a sketch follows below. With independent data marts, however, we must deal with all aspects of the ETL process, much as we do with a central data warehouse. The number of sources is likely to be smaller, and the amount of data associated with the data mart is less than in the warehouse, given the focus on a single subject.
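A minimal sketch of this subset-and-copy pattern, assuming a SQLite warehouse like the one in the earlier sketch already holds a clean `sales_summary` table; the file, table, and region names are illustrative.

```python
import sqlite3

warehouse = sqlite3.connect("warehouse.db")
warehouse.execute("""
    CREATE TABLE IF NOT EXISTS sales_summary (
        product TEXT, region TEXT, quarter TEXT, total_amount REAL)""")

# ATTACH lets the same connection write the subject subset straight
# into the mart's own database file.
warehouse.execute("ATTACH DATABASE 'sales_mart.db' AS mart")

# Dependent-mart ETL: select the single-subject slice and copy it;
# no cleansing is needed because the warehouse data is already clean.
warehouse.execute("""
    CREATE TABLE IF NOT EXISTS mart.east_sales AS
    SELECT product, quarter, total_amount
    FROM sales_summary
    WHERE region = 'East'
""")
warehouse.commit()
```

An independent mart would replace the single SELECT with the full extract-filter-cleanse-load pipeline run directly against the operational sources.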

The motivations behind the creation of these two types of data marts are also typically different. Dependent data marts are usually built to achieve improved performance and availability, better control, and lower telecommunication costs resulting from local access to data relevant to a specific department. The creation of independent data marts is often driven by the need for a solution within a shorter time.

5.8 What are the Steps in Implementing a Data Mart?

Simply stated, the major steps in implementing a data mart are to design the schema, construct the physical storage, populate the data mart with data from source systems, access it to make informed decisions, and manage it over time. The following subsections describe each of these steps.

5.8.1 Designing
The design step is the first step in the data mart process. This step covers all of the tasks from initiating the request for a data mart through gathering information about the requirements, and developing the logical and physical design of the data mart. The design step involves the following tasks:

• Gathering the business and technical requirements
• Identifying data sources
• Selecting the appropriate subset of data
• Designing the logical and physical structure of the data mart

5.8.2 Constructing
This step includes creating the physical database and the logical structures associated with the data mart to provide fast and efficient access to the data. This step involves the following tasks (a small sketch follows the list):

• Creating the physical database and storage structures, such as table spaces, associated with the data mart
• Creating the schema objects, such as tables and indexes, defined in the design step
• Determining how best to set up the tables and the access structures
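As a rough illustration of the constructing step, the sketch below creates a small star schema (one fact table, two dimension tables) plus indexes as access structures. SQLite and all object names are assumptions chosen for portability, not prescribed by the text.

```python
# Constructing the physical database objects for a small sales data mart.
import sqlite3

db = sqlite3.connect("sales_mart.db")
db.executescript("""
    CREATE TABLE IF NOT EXISTS dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE IF NOT EXISTS dim_store (
        store_id   INTEGER PRIMARY KEY,
        store_name TEXT
    );
    CREATE TABLE IF NOT EXISTS fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        volume     INTEGER
    );
    -- Access structures: index the foreign keys so slice queries run quickly.
    CREATE INDEX IF NOT EXISTS idx_sales_product ON fact_sales(product_id);
    CREATE INDEX IF NOT EXISTS idx_sales_store   ON fact_sales(store_id);
""")
db.commit()
```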

5.8.3 Populating
The populating step covers all of the tasks related to getting the data from the source, cleaning it up, modifying it to the right format and level of detail, and moving it into the data mart. More formally stated, the populating step involves the following tasks (sketched in code after the list):

• Mapping data sources to target data structures
• Extracting data
• Cleansing and transforming the data
• Loading data into the data mart
• Creating and storing metadata
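A minimal sketch of these tasks follows, assuming an invented raw feed and cleansing rules; in practice the source-to-target mapping and the rules come out of the design step.

```python
# Populating sketch: extract, cleanse/transform, load, and record metadata.
import sqlite3
from datetime import datetime

raw_rows = [("bulbs ", "Uptown", "40"), ("Fuses", "midtown", None)]  # extract

def cleanse(row):
    """Standardise names and drop rows with missing measures."""
    product, store, volume = row
    if volume is None:
        return None                       # reject incomplete records
    return (product.strip().title(), store.strip().title(), int(volume))

clean_rows = [r for r in map(cleanse, raw_rows) if r is not None]

mart = sqlite3.connect("sales_mart.db")
mart.execute("CREATE TABLE IF NOT EXISTS sales (product, store, volume)")
mart.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)   # load

# Metadata: log each load so the mart's contents can be audited later.
mart.execute("CREATE TABLE IF NOT EXISTS load_log (loaded_at, rows_in, rows_out)")
mart.execute("INSERT INTO load_log VALUES (?, ?, ?)",
             (datetime.now().isoformat(), len(raw_rows), len(clean_rows)))
mart.commit()
```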

5.8.4 Accessing
The accessing step involves putting the data to use: querying the data, analysing it, creating reports, charts and graphs, and publishing these. Typically, the end user uses a graphical front-end tool to submit queries to the database and display the results of the queries. The accessing step requires that you perform the following tasks:

• Set up an intermediate layer for the front-end tool to use. This layer, the meta layer, translates database structures and object names into business terms, so that the end user can interact with the data mart using terms that relate to the business function (see the sketch after this list).
• Maintain and manage these business interfaces.
• Set up and manage database structures, such as summarised tables, that help queries submitted through the front-end tool execute quickly and efficiently.
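One way to picture the meta layer is as a simple dictionary from business vocabulary to physical names; the mapping and all names below are purely illustrative.

```python
# A toy meta layer: business terms on the left, physical (table, column)
# pairs on the right, so queries can be phrased in the user's vocabulary.
BUSINESS_TO_PHYSICAL = {
    "Monthly Sales": ("monthly_sales", "total"),
    "Region":        ("monthly_sales", "region"),
}

def build_query(measure: str, group_by: str) -> str:
    """Translate business terms into SQL against the underlying mart."""
    table, value_col = BUSINESS_TO_PHYSICAL[measure]
    _, group_col = BUSINESS_TO_PHYSICAL[group_by]
    return f"SELECT {group_col}, SUM({value_col}) FROM {table} GROUP BY {group_col}"

print(build_query("Monthly Sales", "Region"))
# SELECT region, SUM(total) FROM monthly_sales GROUP BY region
```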

5.8.5 Managing
This step involves managing the data mart over its lifetime. In this step, we perform management tasks such as the following:

• Providing secure access to the data
• Managing the growth of the data
• Optimising the system for better performance
• Ensuring the availability of data even with system failures

5.9 Patterns of Data Mart Development
In the beginning, there were only the islands of information: the operational data stores and legacy systems that needed enterprise-wide integration; and the data warehouse: the solution to the problem of integrating diverse and often redundant corporate information assets. Data marts were not a part of the vision. Soon, though, it was clear that the vision was too sweeping. Directly implementing a data warehouse is too difficult, too costly, too impolitic, and requires too long a development period for many organisations.

A data mart, on the other hand, is a decision support system incorporating a subset of the enterprise's data, focused on specific functions or activities of the enterprise. Data marts have specific business-related purposes, such as measuring the impact of marketing promotions, measuring and forecasting sales performance, measuring the impact of new product introductions on company profits, or measuring and forecasting the performance of a new company division. In short, data marts are specific, business-related software applications.

Data marts may incorporate substantial data, even hundreds of gigabytes, but they contain much less data than a data warehouse developed for the same company would. Also, since data marts are focused on relatively specific business purposes, system planning and requirements analysis are much more manageable processes; consequently, design, implementation, testing and installation are all much less costly than for data warehouses. In brief, data marts can be delivered in a matter of months, and for hundreds of thousands, rather than millions, of dollars. That places them within the range of divisional or departmental budgets, rather than making them projects needing enterprise-level funding. And that brings up politics, or project justification.

Data marts are easier to get through politically for at least three reasons. First, because they cost less, and often don’t require digging into organisation-level budgets, they are less likely to lead to interdepartmental conflicts. Second, because they are completed quickly, they can quickly produce models of success and corporate constituencies that will look favourably on data mart applications in general. Third, because they perform specific functions for a division or department that are part of that unit’s generally recognised corporate or organisational responsibility, political justification of a data mart is relatively clean.

After all, it is self-evident that managers should have the best decision support they can get provided costs are affordable for their business unit, and the technology appears up to the job. Perhaps for the first time in computing history those conditions may exist for DSS applications. So, data marts have become a popular alternative to data warehouses. As this alternative has gained in popularity, however, at least three different patterns or informal models of data mart development have appeared. The first response to the call for data mart development was the view that data marts are best characterised as subsets (often somewhat or highly aggregated) of the data warehouse, sited on relatively inexpensive computing platforms that are closer to the user, and are periodically updated from the central data warehouse. In this view, the data warehouse is the parent of the data mart.

The second pattern of development denies the data warehouse its place of primacy and sees the data mart as independently derived from the islands of information that predate both data warehouses and data marts. The data mart uses data warehousing techniques of organisation and tools. The data mart is structurally a data warehouse. It is just a smaller data warehouse with a specific business function. Moreover, its relation to the data warehouse turns the first pattern of development on its head. Here multiple data marts are parents to the data warehouse, which evolves from them organically.

The third pattern of development attempts to synthesise and remove the conflict inherent in the first two. Here data marts are seen as developing in parallel with the data warehouse. Both develop from islands of information, but data marts don’t have to wait for the data warehouse to be implemented. It is enough that each data mart is guided by the enterprise data model developed for the data warehouse, and is developed in a manner consistent with this data model. Then the data marts can be finished quickly, and can be modified later when the enterprise data warehouse is finished.

These three patterns of data mart development have in common a viewpoint that does not explicitly consider the role of user feedback in the development process. Each view assumes that the relationship between data warehouses and data marts is relatively static. The data mart is a subset of the data warehouse, or the data warehouse is an outgrowth of the data marts, or there is parallel development, with the data marts guided by the data warehouse data model and ultimately superseded by the data warehouse, which provides a final answer to the islands of information problem. Whatever view is taken, the role of users in the dynamics of the data warehouse/data mart relationship is not considered. These dynamics are the main subject of the remainder of this chapter.

To develop this subject, the original three models are first described in a little more detail. This is followed by a presentation of three alternative models that consider the role of feedback from users in the development of data warehouses and data marts. Lastly, an analysis of the usefulness of the six patterns of development is given in light of a particular viewpoint on organisational reality.

5.10 Development Models without Explicit User Feedback
The development models without explicit user feedback are the top-down, bottom-up and parallel development models; they are explained below.

5.10.1 The Top Down Model
The top down model is shown graphically in the figure below. The data warehouse is developed from the islands of information through application of the Extraction, Transformation and Transportation (ETT) process. The data warehouse integrates all data in a common format and a common software environment.

Fig. 5.6 Top-down flow from the data warehouse to data marts (Source: http://www.dkms.com/papers/dwdmdv.pdf)

In theory all of an organisation’s data resources are consolidated in the data warehouse construct. All data necessary for decision support are resident in the data warehouse. After the data warehouse is implemented, there is no further need for consolidation. It only remains to distribute the data to information consumers and to present it so that it does constitute information for them.

The role of the data marts is to present convenient subsets of the data warehouse to consumers having specific functional needs, to help with structuring of the data so that it becomes information, and to provide an interface to front-end reporting and analysis tools that, in turn, can provide the business intelligence that is the precursor to information. The relation of the data marts to the data warehouse is strictly one-way. The data marts are derived from the data warehouse. What they contain is limited to what the data warehouse contains. The need for information they fulfil is limited to what the data warehouse can fulfil.

The data warehouse therefore is required to contain all the data that the enterprise or any part of it might need to perform decision support. And if users discover any need the data warehouse does not meet, the only way to fix the situation is for the users to get the enterprise level managers of the data warehouse to change the warehouse structure and to add or modify the data warehouse as necessary to meet user needs. The model contains no description or explanation of this process of recognition and fulfilment of changing user needs or requirements. But it is inconsistent with the model to assume that data marts would serve as a means of fulfilling changing user needs without changes to the data warehouse occurring first.

5.10.2 The Bottom Up Model
The figure below depicts the bottom-up pattern of development. In the left-hand portion of the figure, data marts are constructed from pre-existing islands of information, and the data warehouse from the data marts. In this model the data marts are independently designed and implemented, and therefore unrelated to one another, at least by design.

Fig. 5.7 The bottom-up flow from data marts to the data warehouse (Source: http://www.dkms.com/papers/dwdmdv.pdf)

Growth of this kind is likely to produce both redundancy and important information gaps from an enterprise point of view. While each data mart achieves an integration of islands of information in the service of the data mart's function, the integration exists only from the narrow point of view of the business function sponsoring the data mart. From the enterprise point of view, new legacy systems are created by such a process, and these constitute new islands of information. The only progress made is that the new islands employ updated technology. But they are no more integrated and coherent than the old islands were, and they are no more capable of supporting enterprise-wide functions.

The right-hand side of the figure shows the data mart islands of information being used as the foundation of an integrated data warehouse. A second ETT process supports this integration. It will be needed to remove the redundancy in the data marts, to identify the gaps that the process of isolated data mart creation will leave, and to integrate the old islands of information into the new data warehouse in order to fill these gaps. The possibility of using older islands of information in this way is not envisioned in this model, which tacitly and incorrectly assumes that the flow from data marts to data warehouse will be adequate to produce a data warehouse with comprehensive coverage of enterprise data needs.

The second model is vague on what happens after the data warehouse is built. Will the data warehouse suddenly become the parent of the data marts, with development proceeding according to the top-down pattern? Or will the data warehouse continue to be the "child" of the data marts, which will continue to evolve and lead periodically to an adjustment of the enterprise data warehouse to make it consistent with the changed data marts? The second model doesn't answer such questions, but instead ends its story with the creation of the data warehouse.

5.10.3 Parallel Development
The most popular pattern of development of the first three is the parallel development model. The parallel model sees the independence of the data marts as limited in two ways. First, the data marts must be guided during their development by a data warehouse data model expressing the enterprise point of view. This same data model will be used as the foundation for continuing development of the data warehouse, ensuring that the data marts and the data warehouse will be commensurable, and that information gaps and redundancies will be planned and catalogued as data mart construction goes forward. Data marts will have a good deal of independence during this process. In fact, as data marts evolve, lessons may be learned that lead to changes in the enterprise data warehouse model, changes that may benefit other data marts being created as well as the data warehouse itself.

Fig. 5.8 Data mart creation guided by a data model of the data warehouse (Source: http://www.dkms.com/papers/dwdmdv.pdf)

Second, the independence of data marts is treated as a necessary and temporary expedient on the road to construction of a data warehouse. Once the goal is achieved, the warehouse will supersede the data marts, which will become true subsets of the fully integrated warehouse. From that point on, the data warehouse will feed established data marts, create subsets for new data marts, and, in general, determine the course of data mart creation and evolution.

The third pattern begins to treat some of the complexities of the relationship between the data warehouse and data marts. Unlike the first pattern, it recognises that organisational departments and divisions need decision support in the short-term and will not wait for data warehouse development projects to bear fruit. Thus data marts are necessary and desirable applications for organisations to pursue. Also unlike the first pattern, it sees the data marts as contributing to the data warehouse through evolution in the enterprise data model stimulated by the data marts.

Unlike the second pattern, the parallel view does not provide for uncontrolled growth in data marts. The contents of data marts are to be determined by the enterprise wide data model. Redundancies and information gaps are to be carefully tracked. The enterprise data model will track the activities and accomplishments of data mart projects and be adjusted accordingly. In the parallel development view, data mart activities will contribute to integration of islands of information within the data warehouse, by constituting islands of integrated information within the overall plan provided by the enterprise data model. These islands, in turn, will eventually enter the comprehensive integration of the data warehouse when it is completed.

The third view still has difficulties. First, it hinges on the rapid development of the enterprise data warehouse model. Decision support consumers will not wait, not when they have budgets and can support the creation of data marts. Though waiting for a data model is a considerably shorter wait than waiting for a full-blown data warehouse, in large organisations the Joint Application Development (JAD) sessions and requirements analyses preceding data model development can take many months.

And the job must be done carefully. If the enterprise data model is to guide data mart development, it must be comprehensive in its coverage of data needs. Each time the enterprise model fails to identify a table or attribute necessary for a data mart, a little legitimacy is lost, and the feeling grows that it was not worth waiting for the enterprise data warehouse model, and that it will not be worth waiting for it to be adjusted.

Second, the parallel view, like the other two, also assumes that once the data warehouse is constructed, the data marts will become subsets of the data warehouse, rather than autonomous entities. Parallel development will end, and the data warehouse will fulfil everyone’s needs. This assumption is flawed, and envisions a degree of centralisation of large enterprises that no longer exists.

The first three patterns of development all fail to explicitly consider continuous user feedback in response to data mart and data warehouse activities. In each view, user requirements are taken into account in constructing data marts or data warehouses, but user requirements are not static and tend to evolve on exposure to new applications and new technologies. Changes in requirements, further, are not limited only to faster hardware, or better techniques for data mining, or improved database software, or graphical user interfaces. They may also encompass changes in information and in data requirements that necessitate adding new attributes and tables to data warehouses and data marts, and reorganising old ones. New requirements may therefore impact data models at both the data mart and data warehouse levels. How will this be handled?

Will all new requirements be processed through the data warehouse management? Or will local management implement most new requirements in local data marts first? Whatever happens will be largely dependent on the nature and amount of feedback from users. The implications of user feedback for the three patterns of development produce three alternative patterns of data warehouse/data mart development. We now turn to these.

5.11 Development Models with Feedback
We now revisit each of the above development models, taking user feedback into account.

5.11.1 Top Down with Feedback
Suppose our organisation is one of the pioneers that implemented a data warehouse before developing any data marts. Suppose the requirements analysis process was done carefully, and the enterprise data warehouse now contains all of the data and conceptual domains suggested or implied by that process. We are now given the assignment to develop an application to carefully measure the performance of our department during the last three years, and to forecast it three months into the future. What does the enterprise data warehouse have to contain to allow us to complete that assignment?

Certainly it has to contain indicators that will track the outcome of performance. Changes in sales, profits, and costs are obvious facts that need to be tracked. But what about causal variables; will they be among the attributes of the data warehouse? The answer is that some will be. But unless the effort of creating the data warehouse identified all of the domains within the database, and all of the attributes within those domains, by referring to a conceptual framework broad enough to encompass the concepts and attributes contained in all of the causal models possibly relevant to our department's performance, it is a good bet that the data warehouse will not provide all of the attributes we need.

The data warehouse, further, will not be constructed from the causal modelling point of view, unless we or some other representative of our department were considering a data mart at the time the requirements analysis was done for the data warehouse. There would have been no reason for the department’s representative to either think in those terms, or to undergo the preparation necessary to think in those terms, in time for the data warehousing JAD sessions, or other requirements gathering tasks.

Our representative would almost certainly have specified essential facts to the data warehouse team, and obvious analytical hierarchies such as: company organisation, geography, time, product hierarchies, and so on. But the full makeup of causal dimensions essential for measuring performance, distinguishing it from accident, separating it from either good or bad breaks is likely to be absent.

But we now have an assignment requiring at least some causal modelling, so what do we do? The natural first step is to take a subset of the data warehouse for a data mart. But data gathering does not end there. Either we gather data ourselves, if our department will support that, or we go to external services that sell data relevant to our problem. If we can do either of these two things, we will then supplement the data warehouse subset with the new data, go through a new, if limited, ETT process, and constitute a data mart that will work for our analysis problem.

The assignment given to us has made our requirements change, and that has made the data mart change, and more specifically go beyond the bounds of the corporate data warehouse. We don’t want to exceed these bounds, but if we don’t, our departmental function suffers, and our job and our boss’s job depend on performing that function, not on maintaining the integrity of the enterprise data warehouse.

Fig. 5.9 The top-down model with end-user feedback (Source: http://www.dkms.com/papers/dwdmdv.pdf)

So this is the first stage of user feedback, shown in the figure above. The second stage occurs when the changes made in our data mart are integrated with the enterprise data warehouse. This process can come sooner or later. It will be sooner if our company is wise enough to allow continuous feedback from departmental data marts to the data warehouse, and continuous integration of changes that seem necessary at the departmental level. Alternatively, our company can refuse to recognise the changes being introduced into the data marts at the departmental level. If that's the pattern, the changes could accumulate for years, until there is a new islands of information problem in the company. Then the changes will all come at once, with both sides pointing fingers at the other for allowing the data warehouse to get so out of balance with reality.

Whichever pattern applies, the top-down model will be subject to departmental user feedback, or to adaptation of departmental data marts to the top-down data warehouse. If the continuous pattern of adjustment to departmental changes is adopted, a pattern of gradual evolution of the data warehouse and data marts will occur. The pattern will involve continuous feedback from the periphery to the center, and continual adjustment of both the periphery and the center to each other. The enterprise data warehouse will not bring a once-and-for-all decision support nirvana, but a much healthier process of continuous conflict and growth in business intelligence.

5.11.2 The Bottom-Up Model with Feedback
The three user feedback models are similar in the possibilities of adjustment to user feedback in the long run. Once the data warehouse is implemented, in each pattern there is the choice of building in a continuous adjustment process between the data warehouse and the data marts, or centralising further DSS development in the data warehouse (migrating to the top down model).

In the short run though, there is considerable difference between the three patterns. In the top down pattern, user feedback before implementation of the data warehouse is through involvement in the system planning, requirements analysis, system design, prototyping, and system acceptance activities of the software development process. For reasons stated earlier, this involvement is likely to leave gaps in the coverage of domains and attributes that are causal in character, or for that matter that involve unanticipated side effects of departmental performance activities.

In the bottom-up pattern, in contrast, the effect of beginning development with data marts is to ensure much more complete coverage of causal and side effect dimensions. This also means that once the data warehouse is implemented, the bottom-up model with feedback will have little initial gap between user data mart requirements, and what is in the data warehouse. Paradoxically, this small gap could result in an enterprise level decision to migrate to the top down model for long-term development, once the data warehouse is in place.

But if this danger is avoided, and the continuous adjustment path to development is followed, then the initial small gap between the data warehouse and data mart requirements will result in a much less painful adjustment process than will be experienced by organisations starting from the top down model. The future should be one of smooth continuous adjustments in the relationships between local data marts and the enterprise wide data warehouse.

Fig. 5.10 The bottom-up flow from data marts to the data warehouse with feedback (Source: http://www.dkms.com/papers/dwdmdv.pdf)

Don’t conclude, though, that the bottom-up model with feedback shown in the figure above is idyllic. It may not imply much pain once the data warehouse is in place, but if too many data marts are developed for too long following the bottom-up model, the result is a set of new islands of information, and a painful process of handling redundancies and information gaps in data warehouse construction. So, if the top down model with user feedback means excessive pain in adjusting to data marts following construction of the data warehouse, the bottom-up model can mean excessive pain in integrating information and data from data marts during data warehouse construction.

5.11.3 The Parallel Model with Feedback

Fig. 5.11 Data mart creation guided by a data model of the data warehouse, with feedback and an eventual data warehouse (Source: http://www.dkms.com/papers/dwdmdv.pdf)

Again, of the three alternative patterns, the parallel model offers the most promise. Development begins with a period of mutual adjustment between the enterprise data model and the data marts. As long as the center is open to data mart feedback and adjusts itself to the departmental perspectives on causal and side effect dimensions and attributes, the period of data warehouse development can be relatively smooth. While the data marts should be guided by the enterprise data warehouse model, in a very real sense, the enterprise level model should be guided by the individual and collective input from the data marts. Though the enterprise data warehouse data model is more than the aggregate of collected data mart models, it must certainly encompass those if it is to perform its long-term coordinative/integrative functions.

The danger in implementing the parallel model lies at the beginning of development. The model assumes completion of the data warehouse data model before data mart development begins; it therefore requires rapid development of the enterprise level model, and also requires the data marts to wait until this development is complete. This assumption is not necessary for the parallel model. It is probably enough for the data warehouse data model to be in development at the same time as the first data marts, and for the data warehouse team to adopt a coordinative and gentle guidance role in common efforts with the data mart development staffs.

A complete enterprise level data warehouse model is not necessary to monitor and evaluate interdepartmental redundancies and to track information gaps. Nor is it necessary for coordinating data mart back-ends to ensure eventual compatibility. On the other hand, if data marts are coordinated by a central modelling team and encouraged to proceed to completion with all deliberate speed, their results will inform the enterprise level team of data warehouse requirements much more effectively than even the most carefully conducted JAD or requirements gathering sessions are likely to do.

5.12 The Dynamics of Data Mart Development
The three initial patterns of data mart development are unrealistic in their failure to take account of user feedback to data marts and data warehouses. By introducing explicit consideration of user feedback, one can see that the issue of centralised versus decentralised DSS development is one of long-term as well as short-term significance. All three patterns of development face the key decision of what to do once the data warehouse is developed.

Will data marts then be handed down from on high, or will departments and divisions of enterprises have autonomy in evolving their data marts? It is clear that autonomy with central coordination is the most practical course for enterprises in the long run. But the three patterns of development are still distinct choices even if the same long-term policy of mutual adjustment of data marts and data warehouses is followed after data warehouse development.

The top down pattern will require a period of substantial adjustment to data mart needs after the data warehouse is constructed, to moderate centripetal forces and to adjust to the inevitable development of partly autonomous data marts. The bottom-up model will require an extra stage of significant ETT processing to accommodate development of the data warehouse from the data marts. The parallel development model will require rapid development of an enterprise level data warehouse data model unless it is moderated to require only simultaneous development of data marts and the data warehouse, along with coordination from the enterprise team.

The parallel development model with feedback and less or no emphasis on a completed data warehouse data model prior to development, seems the indicated “rational” choice for a normative developmental pattern. But the “rational” choice for development is frequently not a choice that organisations can make. So, an important question is what will be the distribution of the different patterns of data mart/data warehouse development in organisations? First, none of the first three models will be represented. In neglecting user feedback, they ignore an essential empirical factor in the development process.

Second, of the alternative patterns, the top down pattern will apply to only a small percentage of enterprises, since it runs counter to the decentralising forces pervading organisations today. The bottom-up pattern will be popular, especially if it is supplemented with some coordination from an enterprise-level, CIO-sponsored data modelling group. Then the worst effects of uncoordinated bottom-up development would be avoided, and the eventual data warehouse would faithfully incorporate the requirements of the data marts.

Finally, the parallel model will also be popular, because it provides for both coordination and autonomy. It will be still more popular, if it is moderated to require coordination of developing data models, rather than guidance from a completed enterprise level data model. If the bottom-up development pattern is supplemented with coordination from an enterprise level data modelling group, and the parallel model is moderated to abandon the requirement that the enterprise-level data model be completed before beginning data mart development, then the distinction between these two models will blur, and real world cases will only have minor differences in the degree of central coordination of data marts they require.

Summary
• The systems that contain operational data, the data that runs the daily transactions of our business, contain information that is useful to business analysts.
• In data warehousing, we create stores of informational data that is extracted from the operational data and then transformed for end-user decision making.
• The Corporate Information Factory (CIF) is a logical architecture whose purpose is to deliver business intelligence and business management capabilities driven by data provided from business operations.
• The CIF has proven to be a stable and enduring technical architecture for any size of enterprise desiring to build strategic and tactical decision support systems (DSSs).
• Producers are the first link in the information food chain.
• The ability or inability to capture the appropriate data in operational systems sets the stage for the value of the Corporate Information Factory itself.
• The CIF must provide facilities to define how corporate data relates to a customer, rules for integration, and the means to document these relationships and rules.
• Integration and transformation consists of the processes to capture, integrate, transform, cleanse, reengineer and load source data into the data warehouse or operational data store.
• The greatest challenge for integration and transformation arises when data is received from sources that have organised data around different keys.
• Before we begin the process of producing data to be loaded into the data warehouse, we should develop protocols for configuration management and the scheduling of these processes.
• The level of effort needed for integration and transformation is greatly affected by the level of understanding we have of the source data.
• The data warehouse (DW) is a subject-oriented, integrated, time-variant (temporal) and non-volatile collection of data used to support the strategic decision-making process for the enterprise, or business intelligence.
• The operational data store (ODS) is a subject-oriented, integrated, current and volatile collection of data used to support the tactical decision-making process for the enterprise, or business management.
• Data management is responsible for the ongoing management of data within and across the data warehouse and operational data store.
• Data delivery is a work group environment designed to allow end users (or their supporting IS group) to build and manage views of the data warehouse within their data mart.

References
• Corporate Information Factory, [Online] Available at: <people.stfx.ca/nfoshay/.../Corporate%20Information%20Factory.ppt> [Accessed 3 May 2012].
• The Corporate Information Factory, [Online] Available at: <http://www.information-management.com/issues/19991201/1667-1.html> [Accessed 3 May 2012].
• Inmon, W., Imhoff, C. & Sousa, R., 2001. Corporate Information Factory, 2nd ed. John Wiley & Sons Publication.
• Simon, A. & Steven, S., 2001. Data Warehousing and Business Intelligence For e-Commerce, Morgan Kaufmann Publication.
• Datawarehouse DW BI Introduction, 2006. [Video Online] Available at: <http://www.youtube.com/watch?v=zDk4yPL6Adc> [Accessed 3 May 2012].
• DATA WAREHOUSING Basics, 2009. [Video Online] Available at: <http://www.youtube.com/watch?v=eiRhRxPuEU8> [Accessed 3 May 2012].

Recommended Reading
• Ponniah, P., 2011. Data Warehousing Fundamentals for IT Professionals, 2nd ed. John Wiley & Sons Publication.
• Prabhu, C. S. R., 2004. Data Warehousing: Concepts, Techniques, Products and Applications, 2nd ed. PHI Learning Pvt. Ltd.
• Agnew, P. & Silverstone, L., 2009. The Data Model Resource Book: Universal Patterns For Data Modeling, John Wiley & Sons Publication.

Self Assessment
1. The Corporate Information Factory is a logical architecture whose purpose is to deliver __________ and __________ capabilities.
a. business intelligence, business management
b. business analytics, customer relationship
c. business intelligence, business operation
d. business management, customer relation

2. The __________ consists of producers of data and consumers of information.
a. DSSs
b. CIF
c. Data mart
d. API

3. __________ and __________ consist of the processes to capture, integrate, transform, cleanse, reengineer and load source data into the data warehouse or operational data store.
a. Encapsulation, integration
b. Data manipulation, transformation
c. Evaluation, data storage
d. Integration, transformation

4. The __________ is a subject-oriented, integrated, time-variant (temporal) and non-volatile collection of data.
a. data mart
b. data warehouse
c. memory storage
d. data system

5. The __________ data store is relatively straightforward to deploy.
a. relational
b. functional
c. expandable
d. operational

6. __________ is responsible for the application-level partitioning and segmentation of the data warehouse.
a. Data warehouse
b. Data mart
c. Data management
d. Data manipulation

7. Which of the following statements is false?
a. A filtering mechanism removes all information that is not needed by the data mart process.
b. Data delivery is a work group environment designed to allow end users to build and manage views of the data warehouse within their data mart.
c. Data marts have both the business units and functional views of the data warehouse.
d. The data mart may or may not be located on the same machine as the data warehouse.

8. __________ is the catalyst (or messaging infrastructure) that provides the delivery and management of requests.
a. CTI
b. TrI
c. CIF
d. DSI

9. __________ provides the necessary details to promote data legibility, use and administration.
a. Meta data
b. Data mart
c. Data warehouse
d. Data system

10. Which of the following are types of data mart?
a. Rational and irrational
b. Meta data and data warehousing
c. Top down and bottom up
d. Dependent and independent

Chapter VI

An Introduction to OLAP

Aim

The aim of this chapter is to:

• introduce what OLAP is
• elucidate OLAP system components
• explain OLAP fundamentals

Objectives

The objectives of this chapter are to:

• explain the types of OLAP and multidimensional OLAP
• explicate OLAP as a component of business intelligence
• elucidate strategies for increasing data storage

Learning outcome

At the end of this chapter, you will be able to:

• understand different ways of physically storing data
• differentiate between Essbase and Oracle OLAP
• distinguish between HOLAP, MOLAP and ROLAP

6.1 Introduction
OLAP means many different things to different people, but the definitions usually involve the terms "cubes", "multidimensional", "slicing & dicing" and "speedy response". OLAP is all of these things and more, but it is also a misused & misunderstood term, in part because it covers such a broad range of subjects. We will discuss the above terms in later sections; to begin with, we explain the definition & origin of OLAP. OLAP is an acronym standing for "On-Line Analytical Processing". This, in itself, does not provide a very accurate description of OLAP, but it does distinguish it from OLTP, or "On-Line Transactional Processing". The term OLTP covers, as its name suggests, applications that work with transactional or "atomic" data: the individual records contained within a database.

OLTP applications usually just retrieve groups of records and present them to the end-user, for example, the list of computer software sold at a particular store during one day. These applications typically use relational databases, with a fact or data table containing individual transactions linked to Meta tables that store data about customers & product details. OLAP applications present the end user with information rather than just data. They make it easy for users to identify patterns or trends in the data very quickly, without the need for them to search through mountains of “raw” data. Typically this analysis is driven by the need to answer business questions such as “How are our sales doing this month in North America?” From these foundations, OLAP applications move into areas such as forecasting and data mining, allowing users to answer questions such as “What are our predicted costs for next year?” and “Show me our most successful salesman”.

OLAP applications differ from OLTP applications in the way that they store data, the way that they analyze data and the way that they present data to the end-user. It is these fundamental differences (described in the following sections) that allow OLAP applications to answer more sophisticated business questions.

6.2 Why do we need OLAP?
When first investigating OLAP, it is easy to question the need for it. If an end user requires high-level information about their company, then that information can always be derived from the underlying transactional data; hence, we could achieve every requirement with an OLTP application. Were this true, OLAP would not have become the important topic that it is today. OLAP exists & continues to expand in usage because there are limitations with the OLTP approach. The limits of OLTP applications are seen in three areas.

6.2.1 Increasing Data Storage
The trend towards companies storing more & more data about their business shows no sign of stopping. Retrieving many thousands of records for immediate analysis is a time- and resource-consuming process, particularly when many users are using an application at the same time. Database engines that can quickly retrieve a few thousand records for half-a-dozen users struggle when forced to return the results of large queries to a thousand concurrent users.

Caching frequently requested data in temporary tables & data stores can relieve some of the symptoms, but only goes part of the way to solving the problem, particularly if each user requires a slightly different set of data. In a modern data warehouse where the required data might be spread across multiple tables, the complexity of the query may also cause time delays & require more system resources, which means more money must be spent on database servers in order to keep up with user demands.

6.2.2 Data versus Information
Business users need both data and information. Users who make business decisions based on events that are happening need the information contained within their company's data. A stock controller in a superstore might want the full list of all goods sold in order to check up on stock levels, but the manager might only want to know the amount of fruit & frozen goods being sold. Even more useful would be the trend of frozen goods sales over the last three months.

In order to answer the question "How many frozen goods did we sell today?" an OLTP application must retrieve all of the frozen goods sales for the day and then count them, presenting only the summarised information to the end-user. To make a comparison over three months, this procedure must be repeated for multiple days. Multiply the problem by several hundred stores, so that the managing director can see how the whole company is performing, and it is easy to see that the problem requires considerable amounts of processing power to provide answers within the few seconds that a business user would be prepared to wait.

Database engines were not primarily designed to retrieve groups of records and then sum them together mathematically, and they tend not to perform well when asked to do so. An OLTP application would always be able to provide the answers, but not in the typical few-seconds response times demanded by users. Caching results doesn't help here either, because in order to be effective, every possible aggregation must be cached, or the benefit won't always be realised. Caching on this scale would require enormous sets of temporary tables and enormous amounts of disk space to store them. The sketch below makes the aggregation burden concrete.
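A small, self-contained illustration of on-demand aggregation over transactional rows; the schema and the figures are invented for the example.

```python
# Answering "how many frozen goods did we sell today?" from raw OLTP rows:
# the engine must scan and sum the detail records on demand, whereas an
# OLAP store would hold such summaries pre-computed.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (category TEXT, sale_date TEXT, qty INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("frozen", "2013-01-07", 3), ("fruit",  "2013-01-07", 5),
    ("frozen", "2013-01-07", 2), ("frozen", "2013-01-06", 9),
])

total, = db.execute("""
    SELECT SUM(qty) FROM sales
    WHERE category = 'frozen' AND sale_date = '2013-01-07'
""").fetchone()
print(total)  # 5
```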

6.2.3 Data Layout
The relational database model was designed for transactional processing and is not always the best way to store data when attempting to answer business questions such as "Sales of computers by region" or "Volume of credit-card transactions by month". These types of queries require vast amounts of data to be retrieved & aggregated on demand, something that will require time & system resources to achieve. More significantly, related queries such as "Product sales broken down by region" and "Regions broken down by product sales" require separate queries to be performed on the same data set.

The answer to the limitations of OLTP is not to spend more & more money on bigger & faster databases, but to use a different approach altogether to the problem and that approach is OLAP. OLAP applications store data in a different way from the traditional relational model, allowing them to work with data sets designed to serve greater numbers of users in parallel. Unlike databases, OLAP data stores are designed to work with aggregated data, allowing them to quickly answer high-level questions about a company’s data whilst still allowing users to access the original transactional data when required.

6.3 OLAP Fundamentals
As discussed, OLAP applications are used to solve everyday business questions such as "How many cars did we sell in Europe last month?" or "Are our North American stores throwing away more damaged goods year-on-year?" To answer these questions, large amounts of transactional or base data must be retrieved and then summed together. More subtly, they require a different approach to storing & retrieving the data.

Although different OLAP tools use different underlying technologies, they all attempt to present data using the same high-level concept of the multidimensional cube. Cubes are easy to understand, but there are fundamental differences between cubes and databases that can make them appear more complicated than they really are. This section sets out what cubes are, how they differ from databases and why they provide OLAP applications with more power than the relational database. Storing data in cubes introduces other new terms & concepts and these are all explained in this section.

6.4 What is a Cube?
The cube is the conceptual design for the data store at the center of all OLAP applications. Although the underlying data might be stored using a number of different methods, the cube is the logical design by which the data is referenced. The easiest way to explain a cube is to compare storing data in a cube with storing it in a database table.

Products    Store      Volume
Bulbs       Uptown     40
Bulbs       Midtown    52
Bulbs       Downtown   36
Batteries   Uptown     104
Batteries   Midtown    22
Batteries   Downtown   78
Fuses       Uptown     56
Fuses       Midtown    31
Fuses       Downtown   58

Table 6.1 A relational table containing sales records.

The above table shows a set of sales records from three electrical stores displayed in a transactional database table. There are two field columns “Products” and “Store” that contain textual information about each data record and a third value column “Volume”. This type of table layout is often called a “fact table”. The columns in a table define the data stored in the table. The rows of textual information and numeric values are simply instances of data; each row is a single data point. A larger data set would appear as a table with a greater number of rows.

Fig. 6.1 Two-dimensional cube

The above figure shows the same data now arranged in a “cube”. The term “cube” is used somewhat loosely, as this is in fact a two-dimensional layout, often referred to as “a spreadsheet view” as it resembles a typical spreadsheet. The axes of the cube contain the identifiers from the field columns in the database table. Each axis in a cube is referred to as a “dimension”. In this cube, the horizontal dimension contains the product names and is referred to as the “Products dimension”. The vertical dimension contains the store names and is referred to as the “Store dimension”.

In the database table, a single row represents a single data point. In the cube, it is the intersection between fields that defines a data point. In this cube, the cell at the intersection of Fuses and Midtown contains the number of fuses sold at the midtown store (in this case, 31 boxes). There is no need to mention “Volume” as the whole cube contains volume data. This co-ordinate driven concept of finding data is the reason why we can’t just ignore one of the dimensions in a cube. For example, the question “How many bulbs did we sell?” has no direct meaning with this cube unless it is qualified by asking for data from a particular store.

The term “field” is used to refer to individual members of a dimension; for example, Uptown is a field in the Store dimension. Notice that the two dimensions contain apparently unrelated fields. Dimensions are usually comprised of the same class of objects; in this example, all of the products are in one dimension and all of the stores are in another. Attempting to mix fields between the two dimensions would not work: it would not make sense, it would not be possible to create a unique cell for each data point, and any attempt to display the data would also not be possible.

Note that we have avoided using the terms row & column dimension. Although a cube appears to have rows & columns just like a table, they are very different from the rows & columns in a database. In a database, row & column refer to specific components of the data store; in a cube, they simply describe the way the cube is presenting the data. For example, the cube in the figure above can also be displayed as in the figure below, with the dimensions reversed.

Fig. 6.2 The two-dimensional cube reoriented

Both of the above figures are valid layouts; the important point is that the first diagram shows “Products by Store” and the second shows “Stores by Product”. This is one of the advantages of the cube as a data storage object: data can be quickly rearranged to answer multiple business questions without the need to perform any new calculations. A second advantage is that the data can be sorted either vertically or horizontally, allowing the data to be sorted by store or product regardless of the cube’s orientation. The sketch below illustrates both points. From this simple two-dimensional cube, we can now explain some further concepts.
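One way to see the table-to-cube step and the free reorientation is with pandas, one possible tool; the text itself assumes no particular implementation. The values are those of Table 6.1.

```python
# Pivot the fact table of Table 6.1 into the two-dimensional cube of
# Fig. 6.1: each cell is the intersection of one Product and one Store.
import pandas as pd

fact = pd.DataFrame({
    "Products": ["Bulbs"] * 3 + ["Batteries"] * 3 + ["Fuses"] * 3,
    "Store":    ["Uptown", "Midtown", "Downtown"] * 3,
    "Volume":   [40, 52, 36, 104, 22, 78, 56, 31, 58],
})

cube = fact.pivot(index="Store", columns="Products", values="Volume")
print(cube.loc["Midtown", "Fuses"])   # 31

# Reorienting to "Stores by Product" (Fig. 6.2) is just a transpose; no new
# calculation is performed.
print(cube.T)
```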

6.5 Multidimensionality
In the previous section, we looked at a simple two-dimensional cube. Although useful, this cube is only slightly more sophisticated than a standard database table. The capabilities of a cube become more apparent when we extend the design into more dimensions. Multidimensionality is perhaps the most “feared” element of cube design, as it is sometimes difficult to envisage. It is best explained by beginning with a three-dimensional example. Staying with the data set used in the previous section, we now bring in more data, in the form of revenue & cost figures. The two layouts below show the different ways that the new data could be stored in a table.

Measures spread out across multiple columns:

Products    Store      Revenue   Cost   Volume
Bulbs       Uptown     $250      $115   40
Bulbs       Midtown    $325      $145   52
Bulbs       Downtown   $225      $155   36
Batteries   Uptown     $416      $255   104
Batteries   Midtown    $88       $55    22
Batteries   Downtown   $312      $180   78
Fuses       Uptown     $160      $95    60

The “degenerate” table layout

All values in a single column:

Products    Store      Measures   Value
Bulbs       Uptown     Revenue    $250
Bulbs       Uptown     Cost       $115
Bulbs       Uptown     Volume     40
Bulbs       Midtown    Revenue    $325
Bulbs       Midtown    Cost       $145
Bulbs       Midtown    Volume     52
Bulbs       Downtown   Revenue    $225

The “canonical” table layout

As can be seen, the degenerate layout results in a wider table with fewer rows while the canonical model results in a narrower table with more rows. Neither layout is particularly easy to read when viewed directly. The simplest OLAP layout is to create three separate two-dimensional cubes for each part of the data, one for the revenue figures, one for costs and one for volumes. While useful, this layout misses out on the advantages gained by combining the data into a three-dimensional cube. The three-dimensional cube is built very simply by laying the three separate two-dimensional “sheets” (the Volume, Cost & Revenue figures) on top of each other.

Fig. 6.3 The three-dimensional cube

The three-dimensional layout becomes apparent as soon as the three layers are placed on top of each other. The third dimension, “Measures” is visible as the third axis of the cube, with each sheet corresponding to the relevant field (Volume, Cost or Revenue). The actual data points are located by using a co-ordinate method as before. In this example, each cell is a value for the revenue, cost or volume of a particular product sold in a particular store.

As before, the data can be reoriented & rearranged, but this time more sophisticated data rearrangements can be made. For example, the view from the right-hand face of the cube in the figure above shows the revenue, cost & volume figures for all products sold in the Downtown store. The view from the topmost face shows the revenue, cost & volume figures for bulbs across all three stores. This ability to view different faces of a cube allows business questions such as “Best performing product in all stores” to be answered quickly by altering the layout of the data rather than performing any new calculations, thus resulting in a considerable performance improvement over the traditional relational database table method. The sketch below shows one way to expose such faces.
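A rough sketch of the three-dimensional cube using a pandas Series indexed by three dimensions; .xs() exposes the cube “faces” discussed above. Only a handful of cells from the sample tables are filled in.

```python
# A 3-D cube as a Series keyed by (Products, Store, Measures) co-ordinates.
import pandas as pd

idx = pd.MultiIndex.from_tuples([
    ("Bulbs", "Uptown", "Revenue"),   ("Bulbs", "Uptown", "Cost"),
    ("Bulbs", "Uptown", "Volume"),    ("Bulbs", "Downtown", "Revenue"),
    ("Batteries", "Downtown", "Revenue"),
], names=["Products", "Store", "Measures"])
cube = pd.Series([250, 115, 40, 225, 312], index=idx)

# The right-hand face: every measure for every product in the Downtown store.
print(cube.xs("Downtown", level="Store"))

# A single cell is addressed by one field from each of the three dimensions.
print(cube[("Bulbs", "Uptown", "Volume")])   # 40
```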

6.5.1 Four Dimensions and Beyond
Although the word “cube” refers to a three-dimensional object, there is no reason why an OLAP cube should be restricted to three dimensions. Many OLAP applications use cube designs containing up to ten dimensions, but attempting to visualise a multidimensional cube can be very difficult. The first step is to understand why creating a cube with more than three dimensions is possible and what advantage it brings.

As we saw in the previous section, creating a three-dimensional cube was fairly straightforward, particularly as we had a data set that lent itself to a three-dimensional layout. Now imagine that we have several three-dimensional cubes, each one containing the same product, store & measures dimensions as before, but with each one holding data for a different day’s trading. How do we combine them? We could just add all of the matching numbers together to get a single three-dimensional cube, but then we could no longer refer to data for a particular day. We could extend one of the dimensions, for example the measures dimension could have the fields “Monday’s costs” and “Tuesday’s costs”, but this would not be an easy design to work with and would miss out on the advantages of a multidimensional layout.


The answer is simple: we create a fourth dimension, in this case the dimension “Days”, and add it to the cube. Although we can’t easily draw such a cube, it is easy to prove the integrity of the design. As stated before, each data point is stored in a single cell that can be referred to uniquely. In our four-dimensional design, we can still point to a specific value, for example the value for revenue from bulbs sold Uptown on Monday. This is a four-dimensional reference as it requires a field from each of four dimensions to locate it:

• The Revenue field from the Measures dimension.
• The Bulbs field from the Product dimension.
• The Uptown field from the Store dimension.
• The Monday field from the Days dimension.

Without actually having to draw or visualise the whole cube, it is quite easy to retrieve and work with a four-dimensional data set simply by thinking about the specific data cells being requested. The issue of visualising the data set leads onto the second step in picturing the cube. Although the cube might have four (or more) dimensions, most applications only present a two-dimensional view of their data. In order to view only two dimensions, the other dimensions must be “reduced”. This is a process similar to the concept of filtering when creating an SQL query.

Having designed a four-dimensional cube, a user might only want to see the original two-dimensional layout, Products by Store. In order to display this view, we have to do something with the remaining dimensions, Measures & Days. It makes no sense just to discard them, as they are used to locate the data. Instead, we pick a single measure & day field, allowing us to present a single two-dimensional view of the cube.

Fig. 6.4 Two-dimensional view of a four-dimensional structure

We have to pick a field from the remaining dimensions because we need to know from which Measures field and which day to retrieve the Product & Store information. The dimensions that don’t appear in the view are often referred to as “section” dimensions, as the required view is “sectioned” on a specific field from these dimensions. A sketch of this sectioning process follows. Although it is difficult to visualise at this point, it is the dimensions and the fields in those dimensions that define a cube, not the data stored in the cube. A table is often described by the number of columns & rows that it has, while a cube is defined by the number of dimensions and the number of fields in each dimension.
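As a minimal sketch of sectioning, the following Python fragment fixes one field from each of the Measures and Days dimensions to produce a two-dimensional Products-by-Store view. The cube contents are hypothetical sample cells, not a full data set.

```python
# A four-dimensional cube keyed by (measure, product, store, day);
# only a handful of illustrative cells are populated.
cube = {
    ("Revenue", "Bulbs", "Uptown", "Monday"): 250,
    ("Revenue", "Bulbs", "Midtown", "Monday"): 325,
    ("Revenue", "Fuses", "Uptown", "Monday"): 160,
    ("Revenue", "Bulbs", "Uptown", "Tuesday"): 270,
}

def section(cube, measure, day):
    """Section the cube on one Measures field and one Days field,
    returning the flat two-dimensional Products-by-Store view."""
    return {(product, store): value
            for (m, product, store, d), value in cube.items()
            if m == measure and d == day}

view = section(cube, "Revenue", "Monday")
print(view[("Bulbs", "Uptown")])  # -> 250; Tuesday's cell is sectioned out
```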


6.5.2 “Slicing & Dicing”
This is a phrase used to describe one part of the process used to retrieve & view data stored in an OLAP cube. Because of the size and complexity of most real-world OLAP cubes, the process of locating the right set of data is often referred to under the heading “Navigation”. As data can only effectively be displayed in a two-dimensional format, the multidimensional cube must be restricted into flat “slices” of data. When picking a particular orientation of data in a cube, the user is literally “slicing & dicing” the data in order to view a simple flat layout.

6.5.3 Nested Dimensions
Although data can only be viewed in a two-dimensional or flat layout, it doesn’t mean that only two dimensions can be fully displayed at one time. It is perfectly possible to display more than one dimension on each axis. For example, the user might want to see revenue & cost figures for all products sold in each store on Monday. Rather than displaying revenue & cost separately, the Measures dimension can be “nested” inside the Store dimension, displaying both revenue & cost data simultaneously and allowing direct comparisons to be made between them. This layout can be seen in the figure below.

Fig. 6.5 Measures dimension “nested” inside the Store dimension
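A rough Python sketch of the nesting idea follows: each column of the flat view is labelled by a (store, measure) pair, so revenue and cost appear side by side under each store. The handful of values is illustrative only.

```python
# Sample cells keyed by (measure, product, store).
cube = {
    ("Revenue", "Bulbs", "Uptown"): 250, ("Cost", "Bulbs", "Uptown"): 115,
    ("Revenue", "Bulbs", "Midtown"): 325, ("Cost", "Bulbs", "Midtown"): 145,
}
stores = ["Uptown", "Midtown"]
measures = ["Revenue", "Cost"]   # nested inside the Store dimension
products = ["Bulbs"]

# Column axis: every (store, measure) combination, stores outermost.
header = ["Product"] + [f"{s}/{m}" for s in stores for m in measures]
print("  ".join(header))
for p in products:
    row = [p] + [str(cube[(m, p, s)]) for s in stores for m in measures]
    print("  ".join(row))
# Output:
# Product  Uptown/Revenue  Uptown/Cost  Midtown/Revenue  Midtown/Cost
# Bulbs  250  115  325  145
```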

6.6 Hierarchies & Groupings
So far, we have only looked at simple dimensions, each containing only a few fields. Real-world data sets can create dimensions with many thousands of fields, and this introduces two problems. Firstly, dimensions with large numbers of fields can become difficult to manage, so an application requires a way of dividing the fields into groups. Secondly, many users don’t want the individual values for each field; they want the total value for all of the fields, or for a particular group. Fortunately, grouping fields to create “hierarchical” dimensions can solve both of these problems.

A hierarchical dimension derives its name from the concept of fields being at different levels, with some fields being at a higher level in a dimension than others. For example, a store might sell three types of bulbs: 60 Watt, 100 Watt and 150 Watt. Rather than always having to list data for each bulb type, a user might want to group the bulbs together, both to simplify the data view and to answer questions such as “How many bulbs did we sell today?”

The method used by all OLAP applications is to create a “parent field” that groups together other fields. For example, the following figure shows the parent field “Bulbs” and the child fields that comprise it. This parent field is at a higher level in the dimension and is often described as being at a higher hierarchical level.


Fig. 6.6 The bulbs hierarchy within the products dimension

Once the parent field has been defined, this automatically makes any fields below it into child fields. If the children of the parent field don’t have any child fields of their own, then they are referred to as the base fields in the dimension as there are no further levels below. This grouping together allows both problems to be solved. An application can now sum together data for the three bulb types and store it against the new field “Bulbs”. Without the new field, a cube has nowhere to store the values that are created by summing together the values for each individual bulb type. It also allows an application to simplify its layout by only presenting the parent fields to the user, typically a much smaller list than the full list of base fields.

Note that hierarchies are not restricted to single levels. For example, a global computer retailer might begin by grouping stores together by cities; these cities would then sum into countries and the countries sum into continents. A top-level “World” field would provide the final global picture.
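The grouping and roll-up behaviour can be sketched in a few lines of Python. The hierarchy mirrors the bulbs example above; the base sales figures are invented purely for illustration.

```python
# Each parent field lists its child fields; fields absent from the
# mapping are base fields with no level below them.
hierarchy = {
    "Products": ["Bulbs", "Fuses"],
    "Bulbs": ["60 Watt", "100 Watt", "150 Watt"],
}
base_values = {"60 Watt": 120, "100 Watt": 80, "150 Watt": 50, "Fuses": 60}

def rollup(field):
    """Sum a field's children recursively; a base field returns its own value."""
    children = hierarchy.get(field)
    if not children:
        return base_values.get(field, 0)
    return sum(rollup(child) for child in children)

print(rollup("Bulbs"))     # -> 250, answering "How many bulbs did we sell?"
print(rollup("Products"))  # -> 310, the top of the hierarchy
```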

6.6.1 “Drill-down”, “Drill-up” & “Drill-across”
These are common phrases used when dealing with OLAP applications and are very simple to explain. If an application only displayed the parent fields in a cube, such as data for the whole of the US, and the user wanted to view data for a particular state, then breaking down the US parent field into its children is called “drilling-down”. Conversely, “drilling-up” refers to the process of selecting a child field and displaying its parent field. “Drill-across” describes any changes to the sectioned fields. For example, a user might want to change the display of January’s budget figures for each department to February’s. If the original view was sectioned on January, then the change to February is called “drilling-across”.

6.7 Consolidated Data
Another of the key OLAP descriptions is the concept of “speedy response”, literally the time it takes for a user to receive the data that they require. As we have already discussed, although database engines can sum data, such as the total number of sales in a region, they struggle to do so efficiently. This is because they were not primarily designed to answer such questions and hence don’t store data in a way that makes it efficient to answer them.

OLAP applications, by their very nature, are required to provide aggregated or “consolidated” data. This is data derived by grouping together sets of related fields to provide higher-level information. As we have seen, OLAP applications create new fields to group together existing base fields into sensible, manageable groups (hierarchies). The high-level data points referenced by these hierarchical fields rarely exist in the original data sets, so the values must be evaluated by the application. The technical process of evaluating high-level values from base data is beyond the scope of this document, but the principle is as follows.


Fig. 6.7 A two-dimensional cube with consolidated data

Consider the two-dimensional cube in the figure above. The values in the grey boxes are the original base data values and the values in the white cells are the extra high-level values that are created by the OLAP application. These parent cells usually represent the sum of their child fields’ values, so they can be evaluated by adding together the data points from the level below. It is worth mentioning that the top-most cell in the structure (marked in bold) can be evaluated by summing the child fields from either the column or the row dimension if both dimension hierarchies are simple summations. Obviously, it is quicker to use the calculation that requires the fewest values to be added. This is another advantage of OLAP data storage; the cube design allows considerable performance gains to be made when generating consolidated data.

Hierarchies that are more complicated than simple additions can also be represented in a dimension, but these are beyond the scope of this document.

6.7.1 Pre-Consolidated versus On-Demand
Multidimensional data stores allow OLAP applications to evaluate large amounts of high-level data very quickly, typically several orders of magnitude faster than a database. However, because OLAP data stores can often be so large, evaluating the full set of values can still take minutes or hours to complete. Usually, it is the requirements of the users (in terms of report response times) that drive the decision between the two methods of consolidation: working out the values in advance and storing them, or working them out as required.

When application data sets were smaller and computers were less powerful, almost all OLAP applications took the pre-consolidated route. This meant that the OLAP cube would be built periodically (usually daily or weekly) to include recent changes in its figures and then stored somewhere for fast data retrieval.

The advantages of rapid retrieval of data & hence speedy reporting response times always outweighed the disadvantages of building a large multidimensional data cube periodically and storing it on disk. More recently, with faster processors and the emergence of ever larger data cubes, the pressures of disk storage have seen many applications return to the on-demand method, where high-level data is only evaluated as it is required by the users. This is a more complicated method of working, as it typically requires all relevant calculations to complete within seconds so that the users are not aware that the values that they requested did not exist a few seconds earlier.


Assuming that the performance issues can be kept under control, the on-demand method offers the simple advantage of massively reduced storage requirements, which translates directly into cost savings on disk purchases. More sophisticated OLAP tools allow the application to pre-store some of the consolidated data and leave other sections to be evaluated on demand. This allows the application to get the best from both techniques. Ultimately, the choice rests on several factors: how much data is being stored, the power of the server on which the application runs and the requirements of the users. Once these factors have been taken into account, a sensible decision can be made on the method used to evaluate the high-level data.
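The trade-off can be sketched as follows, reusing the toy hierarchy from earlier: a pre-consolidation pass computes and stores every parent value up front, while the on-demand version (here with a simple cache) evaluates a value only when first requested. This illustrates the principle, not any real OLAP engine.

```python
from functools import lru_cache

hierarchy = {"Products": ["Bulbs", "Fuses"],
             "Bulbs": ["60 Watt", "100 Watt", "150 Watt"]}
base_values = {"60 Watt": 120, "100 Watt": 80, "150 Watt": 50, "Fuses": 60}

def total(field):
    children = hierarchy.get(field)
    return base_values[field] if not children else sum(total(c) for c in children)

# Pre-consolidated: evaluate every parent in a periodic batch and store
# the results, trading disk space for instant retrieval later.
consolidated = dict(base_values)
for parent in hierarchy:
    consolidated[parent] = total(parent)

# On-demand: evaluate only when a user asks; the cache means repeated
# requests pay the calculation cost just once.
@lru_cache(maxsize=None)
def on_demand(field):
    children = hierarchy.get(field)
    return base_values[field] if not children else sum(on_demand(c) for c in children)

print(consolidated["Products"], on_demand("Products"))  # -> 310 310
```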

6.7.2 Sparse Data
While really beyond the scope of this document, it is worth briefly discussing the concepts behind sparse data. Sparse data is literally data with “holes” in it. Sparse cubes have gaps in their data where there are no values and are the most common type of cube. Most real-world cubes are very sparse, while the remainder are very dense (with very few holes). These “holes” occur because not all combinations of fields in the dimensions apply. Sparse data typically appears in applications that display information about products and stores. In a large retail organisation, not all of the company’s stores will sell every product in their range, so where a store name coincides with a product that it does not sell, the value simply remains empty (or null).

Relational databases are also sparse, but do not appear to be when viewed directly. This is because a database table only shows the records that exist; any data that doesn’t exist usually does not appear in the table, so the empty values are not readily apparent. The figure below shows our simple two-dimensional cube, but this time with some of the numbers removed. It is the presence of these empty cells that makes a sparse cube.

Fig. 6.8 A sparse two-dimensional cube

It is important to note that these empty cells are not zero values; they are effectively holes in the data. For example, in the figure above the empty value at the intersection of Bulbs and Downtown shows that the Downtown store does not sell bulbs. The ability to handle sparse data is an important part of the performance of an OLAP application. If the data cells don’t exist, then they should not occupy any space in the cube, although their location in the cube would still be valid. The concept that empty cells do not occupy any physical space is very important to modern OLAP applications because a large OLAP cube has the potential to be enormous! As an example, imagine a five-dimensional cube, where each dimension has ten thousand fields.

The maximum number of data points that could be stored in such a cube is derived by multiplying together the number of fields in each dimension, in this case resulting in a cube with 100,000,000,000,000,000,000 cells! Although an enormous number, this value is only the potential cell count, literally the maximum number of cells that could be stored in the cube. In reality, the actual cell count might be quite small, for example a few million, resulting in a manageable cube size because the unused data points require no physical storage space.
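The arithmetic in this paragraph, and the way a sparse cube avoids paying for empty cells, can be checked with a few lines of Python (a sketch only; real engines use far more compact structures):

```python
# Potential cell count for a five-dimensional cube with 10,000 fields
# per dimension: the product of the dimension sizes.
dimension_sizes = [10_000] * 5
potential_cells = 1
for size in dimension_sizes:
    potential_cells *= size
print(f"{potential_cells:,}")  # -> 100,000,000,000,000,000,000

# A dictionary stores only the cells that actually hold data: empty
# co-ordinates remain valid addresses but occupy no physical space.
sparse_cube = {("Bulbs", "Uptown"): 250}       # Downtown sells no bulbs
print(sparse_cube.get(("Bulbs", "Downtown")))  # -> None: a hole, not a zero
print(len(sparse_cube))                        # -> 1 cell actually stored
```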

The concept of a maximum size for a cube exists because in a cube, the fields are all fixed, so they can only reference a finite number of cells. In a database, more fields can be added in the columns as required, resulting in a gradual growth in size of the table. As an analogy, a database table is like a balloon. It can expand and contract as data is added and deleted. A cube is more like a box, its shape is rigidly fixed and data can be added up to the maximum amount that will fit in the box.


6.8 Storing the Data
ROLAP, MOLAP and HOLAP: these three terms refer to different ways of physically storing the data that is held within an OLAP cube. Each method still attempts to present data as a cube, but uses different underlying technology to achieve the results.

6.8.1 ROLAP
ROLAP stands for “Relational OLAP”. This term describes OLAP applications that store all of the cube data, both base and high-level, in relational tables. The application hides the presence of the tables by presenting the data in a cube layout. Vendors who only have a relational database engine available to them have to take this method of data storage. The multidimensional views are generated by combining base & aggregate data tables together with complicated (often multi-pass) SQL statements, often resulting in poor reporting performance, combined with the difficulty of maintaining tables of aggregate data in order to improve reporting response times.

6.8.2 MOLAP
MOLAP stands for “Multidimensional OLAP”. This term describes OLAP applications that store all of the cube data, both base and high-level, in proprietary multidimensional data files. The application copies the base data from the underlying table into a multidimensional data format (usually a binary data file) and then evaluates the consolidated values. The multidimensional data views are automatically present in this method and performance is often very quick, particularly if the cubes are small enough to fit into RAM. More typically, the data is stored in large disk files. The biggest drawback with this method is the duplication of base data that occurs when it is copied into the cube, requiring extra disk space and processing time.

6.8.3 HOLAP
HOLAP stands for “Hybrid OLAP”. This term describes OLAP applications that store high-level data in proprietary multidimensional data files, but leave the underlying base data in the original data tables. This method has the big advantage of not requiring duplication of the base data, resulting in time & disk space savings. The cube drives the multidimensional views, so the application requires a robust link between the multidimensional data file and the relational table that stores the base data beneath it.

6.9 OLAP as a Component of Business Intelligence
To explain how OLAP technology contributes to business intelligence (BI), we first need to define BI itself. BI means different things to different people. For some people, BI is only the data warehouse. Others see BI as the dashboards on their desktops. In this book, we define BI as all of the processes and technologies used to help businesses make better decisions. BI includes the following:

• Enterprise performance management
• Data warehousing
• Business reporting, including dashboards and scorecards
• Predictive analytics and data mining
• OLAP

Together, these technologies support an organisation’s ability to create, maintain, analyze, and report accurate information about the business, and use that information for forward-facing activities such as budgeting and forecasting. The next sections define each of the technologies so that you can understand exactly what OLAP contributes to BI.


6.10 Enterprise Performance Management
Enterprise performance management (EPM) is a set of processes and related software that supports management excellence. EPM organisations are smart, agile, and aligned. Smart organisations recognise that they must rationalise their analytical tools and data management systems to eliminate the noise and provide actionable insights to all the stakeholders of the enterprise.

Agile organisations are able to detect deviations between plans and execution quickly, find the root causes, and take fast corrective actions. They use best-of-breed technologies that offer advanced integration with operational systems, yet can be used easily with a company’s existing architecture and information technology (IT) investments. Aligned organisations address the needs of all stakeholders and share information through integrated systems and processes so that all stakeholders are working from the same set of facts, that is, the same data.

6.11 Data Warehousing
The objective of a data warehousing system is to provide business users with a time-based, integrated view of cross-functional data. To create a data warehouse, we start with data that may exist in different formats across several systems. We transform the data, cleanse it, and create an integrated view of it. Data warehousing provides historical data, as opposed to the current snapshot of data that can be found in an online transaction processing (OLTP) system. A data warehouse does not answer the question “What orders are shipping now?” but rather reporting questions such as “How many orders did we ship last month?” and analytical questions such as “When have we shipped orders the fastest?”

A data warehouse offers a central, reliable repository of historical business data that all stakeholders can use. End users can write queries to pull data from this single source of data, so that regardless of who asks the question, they will get consistent answers.

6.12 Business Reporting
Business reporting is about conveying information that is important to the organisation and using that data to manage the business. Business reports have been around since the first data management systems were implemented. The original medium of reports was paper documents. Today, many organisations implement business reports online through dashboards and scorecards. Business reports often require current data, and they can be widely distributed within an organisation.

6.13 Predictive Analytics and Data Mining
Predictive analytics is concerned with examining historical data using statistical tools and techniques, such as regression or data mining, to forecast or predict future events and to determine the factors that best predict an event. For example, using historical data, a company could forecast a customer’s price point for a certain product. By determining each customer’s profile, the company could manage its revenue stream better by charging different customers different prices. This would allow the company to increase revenue while maintaining customer satisfaction. After these models are developed, analysts can look for exceptions to the model for activities such as anomaly and fraud detection.

6.14 OLAP
OLAP is a technology that supports activities ranging from self-service reporting and analysis to purpose-built management applications such as planning and budgeting systems. What differentiates OLAP from regular business reporting is the analytics. In an OLAP application, metrics are often compared with a baseline, such as last year’s numbers or the performance of the whole United States. Over the course of this book, we describe OLAP technology in general and Oracle’s products for OLAP in particular. The next two sections provide a foundation upon which we can begin to build up our understanding of OLAP technology and OLAP products. We describe the benefits of OLAP, and then provide some basic information about OLAP systems and implementations.


6.14.1 Why OLAP?
An effective OLAP solution solves problems for both business users and IT departments. For business users, it enables fast and intuitive access to centralised data and related calculations for the purposes of analysis and reporting. For IT, an OLAP solution enhances a data warehouse or other relational database with aggregate data and business calculations. In addition, by enabling business users to do their own analyses and reporting, OLAP systems reduce demands on IT resources. OLAP offers five key benefits:

• Business-focused multidimensional data
• Business-focused calculations
• Trustworthy data and calculations
• Speed-of-thought analysis
• Flexible, self-service reporting

The next sections describe each of these benefits of OLAP.

6.15 Business-Focused Multidimensional Data
As mentioned in the first sentence of this chapter, OLAP uses a multidimensional approach to organise and analyze data. In a multidimensional approach, data is organised into dimensions, where a dimension reflects how business users typically think of the business. For example, business users may view their data by product, by market, and over time. Each of these is a dimension in an OLAP application. Note that business users instinctively refer to dimensions after prepositions such as by (by product/by market), over (over time), or across (across business units).

A dimension can be defined as a characteristic or an attribute of a data set. Each dimension contains members that share the common characteristic. The members are often organised hierarchically within the dimension. For example, the figure below contains a few dimensions and their members. The Time dimension, which represents a year, is divided into quarters, and each quarter into its respective months. The Products dimension contains product groupings and then the individual products within each grouping. The Markets dimension demonstrates a division into geographic regions divided further into states.

The hierarchical aspect of the dimension represents the first option for aggregation. For example, Quarter 1 summarises the data for its child members January, February, and March. Time summarises the data for all four quarters in the year. The aggregations are inherent in the hierarchy. The metadata in an OLAP system contains the aggregation rules, freeing the application from needing to define these aggregation rules and ensuring that these rules are applied consistently for each report or analysis.


Fig. 6.9 Sample dimensions with members

We describe the multidimensional approach more fully in the next chapter. For now, it is enough to understand that OLAP organises data in a multidimensional model that makes it easy for business users to understand the data and to use it in a business context, such as a budget.

6.16 Business-Focused Calculations
One reason OLAP systems are so fast is that they pre-aggregate values that would need to be computed on the fly in a traditional relational database system. The calculation engine handles aggregating data as well as business calculations. In an OLAP system, the analytic capabilities are independent of how the data is presented. The analytic calculations are centrally stored in the metadata for the system, not in each report. Here are some examples of calculations available within an OLAP system:

• Aggregations, which simply roll up values based upon levels organised in hierarchies. For example, the application may roll up sales by week, month, quarter, and year.
• Time-series calculations with time intelligence, such as percent difference from last year, moving averages, and period-to-date values.
• Matrix or simple intra-dimensional calculations, such as share of parent or total, variances, or indexes. For those readers used to spreadsheets, this type of calculation replaces embedded spreadsheet formulas.
• Cross-dimensional or complex inter-dimensional calculations, such as index of expenses for the current country to revenue for the total United States. Someone using only spreadsheets would need to link spreadsheets and create formulas with values from different sheets to accomplish this type of calculation.
• Procedural calculations, in which specific calculation rules are defined and executed in a specific order. For example, allocating a shared expense, like advertising, across products as a percent of revenue contribution per product is a procedural calculation, requiring procedural logic to model and execute sophisticated business rules that accurately reflect the business.
• OLAP-aware calculations, with specialised functions such as ranking and hierarchical relationships. These calculations can include time intelligence and financial intelligence. For example, an OLAP-aware calculation would calculate inventory balances in which quarter 1 ending inventory is understood to be the ending inventory of March, not the sum of January, February, and March inventories.

• User-defined expressions, allowing a user to combine previously defined calculations using any operators and multidimensional functions.
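Two of these calculation types can be sketched in a few lines of Python. The regions, years and sales figures are invented for illustration; in a real OLAP system these formulas would live centrally in the metadata rather than in a script.

```python
# Sales keyed by (region, year); invented sample figures.
sales = {("East", 2011): 400, ("East", 2012): 500,
         ("West", 2011): 350, ("West", 2012): 300}

def pct_diff_from_last_year(region, year):
    """A time-series calculation: percent difference from last year."""
    prior = sales[(region, year - 1)]
    return 100.0 * (sales[(region, year)] - prior) / prior

def share_of_parent(region, year):
    """A simple intra-dimensional calculation: a region's share of the total."""
    total = sum(v for (r, y), v in sales.items() if y == year)
    return 100.0 * sales[(region, year)] / total

print(pct_diff_from_last_year("East", 2012))  # -> 25.0
print(share_of_parent("West", 2012))          # -> 37.5
```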


6.17 Trustworthy Data and Calculations
When electronic spreadsheets, such as VisiCalc and Lotus 1-2-3, were released in the late 1970s and early 1980s, business analysts, who were already familiar with paper-based spreadsheets, embraced these new tools. Analysts would create spreadsheets starting from raw data and spend hours formatting and massaging the data into a form they could use. They would develop dozens to hundreds of these sheets. In turn, their organisations began to rely on an inordinate number of these manually produced spreadsheets for extremely important information.

Unfortunately, as soon as data starts living in spreadsheets, users start changing the data, entering new data, and creating calculations to augment what is already there. Soon, there are multiple definitions of something as basic as sales or profit. The resulting confusion gave rise to a phenomenon that came to be known colloquially as “spreadsheet hell.” To get a sense of the depth of the problem caused by spreadsheet hell, consider the following scenario: There are ten people in a room, each with his own spreadsheet containing his own metrics, formulas, and numbers. None of the spreadsheets contains exactly the same data. It becomes exceedingly difficult, if not impossible, for management to make sound business decisions when no one can agree on the underlying facts.

The problem is not limited to just spreadsheets. Many organisations have multiple reporting systems, each with its own database. When data proliferates, it is difficult to ensure that the data is trustworthy. OLAP systems centralise data and calculations, ensuring a single source of data for all end users. Some OLAP systems centralise all data in a multidimensional database. Others centralise some data in a multidimensional database and link to data stored relationally. Still other OLAP systems are embedded in a data warehouse, storing data multidimensionally within the database itself. Regardless of the implementation details, what is important is that OLAP systems ensure end users have access to consistently defined data and calculations to support BI.

6.18 Speed-of-Thought Analysis
Speed-of-thought analysis, also referred to as ad hoc analysis, means that analysts can pose queries and get immediate responses from the OLAP system. Not needing to wait for data means fewer interruptions in the analyst’s train of thought. The analyst can immediately pose another query based on the results of the first query, then another query, and so on, leading the analyst on a journey of discovery. Fast response times, together with intuitive, multidimensional organisation of data, enable an analyst to think of and explore relationships that otherwise might be missed.

For example, consider a company that experiences a sudden increase in the number of customer complaints concerning late product shipments. In investigating the issue, the analyst drills down into the financial cube and discovers that profits are at a record high. The analyst then drills down on the average age of the company’s payable invoices to discover that the average age is growing at a very high rate. Finally, the analyst drills down into inventories and discovers that raw materials are at low levels. From this analysis, the analyst can draw the conclusion that the finance officer started paying invoices late, which improved short-term cash flow and profits, but now the company’s vendors are upset and shipping late. Late shipments of raw materials translate into late products and an increasing number of related consumer complaints. Speed-of-thought analysis is a key component that enables this kind of drill-down investigative work across multiple functional areas.

OLAP systems respond much faster to end-user queries than do relational databases that do not capitalise on OLAP technology. Quick response times are possible because OLAP systems preaggregate data. Preaggregation means that there is no need for many time-consuming calculations when an end-user query is processed. In addition, OLAP systems are optimised for business calculations, so calculations take less time to execute. OLAP systems make the analysis process easy for analysts by supporting tools they already use. For example, many OLAP systems support commercial spreadsheet tools such as Microsoft Excel or offer their own spreadsheet interface.


6.19 Flexible, Self-Service Reporting
The best report designers and builders usually come from within the business community itself because they know what is needed. Enabling these people to create their own reports is a hallmark of an OLAP system. OLAP systems enable business users to query data and create reports using tools that are natural for them to use. Providing tools that are familiar to end users reduces the learning curve and makes it more likely that they will use the system.

In addition to commercial and custom spreadsheet applications, OLAP systems support other front-end reporting tools that are designed with business users in mind. For example, they include user-friendly tools that enable report designers to create and publish web-based dashboards and interactive reports using live OLAP data. The consumers of interactive reports are often able to customise their view of the data.

When business users can build their own reports, it reduces the reliance on IT resources for generating reports. Without an OLAP system, IT departments are often called upon to create a multitude of materialised views and specialised reports for business users on demand. As with any application geared to business users, the front-end tools must be intuitive and flexible enough to be employed by casual users. That said, as with any new tool, people need to be trained on how to use these reporting facilities effectively. If end users deem the system too hard to use, they will not adopt it.

6.20 OLAP System Components
In describing the benefits of OLAP, we used the term OLAP system. An OLAP system is made up of the following four primary components:

6.20.1 Server
The OLAP server hosts the multidimensional data storage and runs the calculation engine. An OLAP server can be a stand-alone server or embedded within a relational database. For example, Essbase can run on a stand-alone server. Oracle OLAP is contained within the Oracle Database. The latter part of this chapter describes similarities and differences between Essbase and Oracle OLAP.

6.20.2 Multidimensional Storage
OLAP data is stored multidimensionally in constructs often referred to as cubes. A cube is a useful concept for explaining multidimensionality. Dimensions (such as products, markets, and time) form the edges of the cube. Members from each dimension create intersections within the cube, each of which can potentially hold a data value. Depending on how an OLAP system is implemented, cubes can be stand-alone multidimensional databases or data objects within a relational database.

6.20.3 Calculation Engine
The OLAP engine handles aggregation of data and optimises business calculations. Calculations are centrally stored in the metadata for the system, rather than in specific reports or applications.

6.20.4 Front-End Analysis and Reporting Tools
Front-end analysis and reporting tools communicate with the OLAP server and present multidimensional data to the end user. As mentioned earlier in this chapter, OLAP systems support user-friendly tools for analysis and reporting, including commercial and custom spreadsheet applications and functions for creating web-based dashboards and interactive reports.

6.21 OLAP Types
Three main types of OLAP are available: multidimensional OLAP, relational OLAP, and hybrid OLAP. To help you understand where Oracle’s OLAP solutions fit into this spectrum, we will briefly describe each type.


6.21.1 Multidimensional OLAP
With multidimensional OLAP (MOLAP), the data is stored in a multidimensional data store. Both Essbase and Oracle OLAP use MOLAP technology. Essbase stores data in a multidimensional database. Oracle OLAP cubes are multidimensional objects stored in the Oracle Database.

MOLAP cubes are automatically indexed based on the dimensions. Data can be located using offset addressing. To find a given value in a multidimensional array, a MOLAP product needs to use only multiplication and addition, and computers do those operations very fast. MOLAP technology is the best option for dense arrays, where most of the data cells in a cube contain a value. That said, both Essbase and Oracle OLAP have capabilities to manage sparse MOLAP cubes effectively. Figure 6.10 summarises MOLAP cube advantages and challenges.
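The offset-addressing idea can be illustrated with a short Python sketch: a dense cube is laid out as one flat array, and a cell's position is computed from its dimension indexes using only multiplication and addition. The dimension sizes here are arbitrary.

```python
# A dense three-dimensional cube stored as a single flat array.
n_measures, n_products, n_stores = 3, 2, 3
cells = [0.0] * (n_measures * n_products * n_stores)

def offset(measure_i, product_i, store_i):
    """Row-major offset of a cell: multiplication and addition only,
    so no searching or index structures are needed."""
    return (measure_i * n_products + product_i) * n_stores + store_i

cells[offset(0, 1, 2)] = 312.0   # e.g. Revenue / Batteries / Downtown
print(cells[offset(0, 1, 2)])    # -> 312.0, located directly by arithmetic
```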

6.21.2 Relational OLAP
Relational OLAP (ROLAP) uses a traditional star/snowflake schema and relational data sources only. With ROLAP, data is neither aggregated nor manipulated. The data is stored in relational tables that can be queried by SQL. ROLAP is ideal for lower-density (sparse) cubes. ROLAP automatically provides all of the advantages of a relational database, such as high availability, replication, a read-consistent view of data, backup and recovery, parallel processing, and job scheduling.


MOLAP - MULTIDIMENSIONAL OLAP

Advantages: Very Fast Once Consolidated; Memory Access; Good For Dense Arrays

Challenges: Data Latency

Fig. 6.10 MOLAP advantages and challenges

135

Page 152: Business Intelligence Toolsjnujprdistance.com/assets/lms/LMS JNU/MBA/MBA - IT Management… · 3.8 On‐Line Analytical Processing Clients..... 56

ROLAP - RELATIONAL OLAP

Advantages: Good For Sparse Arrays; Adds Database Features

Challenges: Resources

Fig. 6.11 ROLAP advantages and challenges

The figure above summarises the advantages and challenges of ROLAP.

6.21.3 Hybrid OLAP
With hybrid OLAP (HOLAP), the data is stored both in an OLAP data store and in a relational database. For example, we may have summary-level data stored in the OLAP data store and detailed data stored in the relational database. We could then drill down from the OLAP data store to the detail stored in the relational database. Today, most OLAP products support the hybrid architecture. Both Essbase and Oracle OLAP can be implemented in this fashion. The figure below summarises the advantages and challenges of HOLAP.

HOLAP - HYBRID OLAP

Advantages: Summary Data In Cubes; Detail Remains In Relational Tables; No Duplication Of Base Data

Challenges: Requires A Robust Link Between The Cube And The Relational Tables

Fig. 6.12 HOLAP advantages and challenges

One new extension of HOLAP is called extended OLAP (XOLAP). With XOLAP, you can model metadata such as database outlines and hierarchies in the MOLAP product; however, the data comes from relational sources. Essbase supports XOLAP.


6.22 OLAP Products
There are many different types of OLAP products, each of which seeks to provide solutions to certain problems and to meet the needs of particular user communities. While all OLAP products share the ability to support business users with a highly interactive user experience, they can differ significantly in terms of that user experience, performance, analytic capabilities, target audiences, and architecture.

For example, some OLAP products provide a dimensional query model for data stored in relational tables in a way that makes it easier for users to define their own queries and navigate data interactively. Other OLAP products take a fundamentally different approach, tightly coupling the data with the dimensional model for fast access, often with additional performance benefits and rich analytical capabilities. Both kinds differ again from an OLAP product that is designed to support, for example, a planning and budgeting application.

6.22.1 OLAP with a Data Warehouse
If we already have a data warehouse in place, we can leverage that investment by implementing an OLAP system within or alongside the data warehouse to support BI and performance management activities. Often, a finer level of granularity exists in the data warehouse than in the OLAP system. For example, many of today’s implementations are HOLAP systems, where the data warehouse stores the detail data and the OLAP system stores summaries. The OLAP system has ways to allow a user to drill down to detailed data in the data warehouse.

When we implement a middle-tier OLAP system with a data warehouse, data flows from the data warehouse to the OLAP cubes. This is important because the data values in the cubes need to match those in the data warehouse. If we performed all of the data-integration steps for the OLAP system from the original data sources rather than the data warehouse, we would run the risk of the data warehouse and the OLAP environment having two slightly different versions of the data. This could lead to inaccurate analyses and errors.

When we implement a database-centric OLAP system, OLAP data is stored in cubes within the data warehouse itself. The cubes are data objects that can be treated like any other data objects. Connections between summary data and detailed data can be handled by joining a cube to a table. SQL statements that normally would access a large fact table can be automatically rewritten to access cube-organised views. This greatly increases the performance of the system. Often, a single cube-organised materialised view can replace many table-based materialised views, easing maintenance of the data warehouse.

6.23 Typical OLAP Applications
OLAP has been used successfully in a wide variety of applications, including the following:

• Analyzing financial data
• Budgeting and planning
• Forecasting
• Replacing manual spreadsheets
• Accelerating a data warehouse
• Enhancing an enterprise resource planning (ERP) system
• Replacing custom SQL reports

Why Two OLAP Products from Oracle?
With the acquisition of Hyperion Solutions Corporation in 2007, Oracle now owns the two most capable OLAP products on the market: Essbase and Oracle OLAP. While both products fall within the OLAP category and have some similar capabilities, they are different in significant ways. One purpose of this chapter is to show how the products are the same and how they differ, so that we can choose the solution that best suits our environment.


6.24 Similarities between Essbase and Oracle OLAP
Both Oracle OLAP and Essbase have the capability of storing data in OLAP cubes. As such, they share the following capabilities:

• Excellent performance for queries that require summary-level data.
• Fast, incremental update of data sets, which is required to facilitate frequent data updates.
• Rich calculation models that may be used to enrich analytic content.
• A dimensional model that presents data in a form that is easy for business users to query and define analytic content.

Because both Essbase and Oracle OLAP provide these core capabilities, it might seem like they are similar enough to be interchangeable. This is not the case. Each product focuses on delivering OLAP capabilities into different types of applications and for different classes of users.

6.25 Differences Between Essbase and Oracle OLAP
Essbase and Oracle OLAP are two of the leading OLAP solutions. However, the products have taken different paths based on the product strategies of Hyperion and Oracle and the roles that each product fulfills. From the mid-1990s to 2007, Hyperion focused on building solutions for the middle tier. Oracle spent the same period embedding an OLAP engine into its world-class database. Most of the differences between Essbase and Oracle OLAP derive from the fact that Essbase is a separate process, while Oracle OLAP is an option to the Oracle Database Enterprise Edition.

6.26 Essbase: Separate-Server OLAP
Essbase comes from a history of OLAP applications based in the middle tier. The strategy of Essbase centers on custom analytics and BI applications with a focus on EPM. This strategy addresses the what-if, modelling, and future-oriented questions that companies need answered today in order to see into the future. Typically, Essbase applications are started and maintained by business analysts. The buyer is usually in the line of business. The typical end users are line-of-business users, such as analysts in the finance, marketing, and sales departments, who query and create data with Essbase tools and Oracle Hyperion applications.

The line of business typically has a large degree of uncertainty and needs to understand a dynamic and changing environment. Essbase is the OLAP server that provides an environment for rapidly developing custom analytic and EPM applications. The data management strategy allows Essbase to easily combine data from a wide variety of data sources, including the Oracle Database. Essbase is part of the Oracle Fusion Middleware architecture.

6.27 Oracle OLAP: Database-Centric OLAP
Oracle OLAP is available as an option to the Oracle Database Enterprise Edition. As an embedded component of the Oracle Database, Oracle OLAP benefits from the scalability, high availability, job scheduling, parallel processing, and security features inherent in the Oracle Database. With Oracle OLAP, all of the data resides in an Oracle database, governed by centralised data security and calculation rules.

An SQL interface to OLAP cubes allows SQL-based applications to query cubes within an Oracle database, and benefit from the performance and analytic content of the OLAP option. The primary data-access language for Oracle OLAP is SQL, making Oracle OLAP a natural choice for enhancing the performance and calculation capabilities of an existing Oracle data warehouse.


Summary
• The term OLTP covers, as its name suggests, applications that work with transactional or “atomic” data, the individual records contained within a database.
• OLAP applications differ from OLTP applications in the way that they store data, the way that they analyze data and the way that they present data to the end-user.
• OLAP applications store data in a different way from the traditional relational model, allowing them to work with data sets designed to serve greater numbers of users in parallel.
• Although different OLAP tools use different underlying technologies, they all attempt to present data using the same high-level concept of the multidimensional cube.
• Multidimensionality is perhaps the most “feared” element of cube design as it is sometimes difficult to envisage.
• The simplest OLAP layout is to create a separate two-dimensional cube for each part of the data: one for the revenue figures, one for costs and one for volumes.
• As data can only effectively be displayed in a two-dimensional format, the multidimensional cube must be restricted into flat “slices” of data.
• It is perfectly possible to display more than one dimension on each axis.
• The method used by all OLAP applications is to create a “parent field” that groups together other fields.
• Multidimensional data stores allow OLAP applications to evaluate large amounts of high-level data very quickly, typically several orders of magnitude faster than a database.
• More sophisticated OLAP tools allow the application to pre-store some of the consolidated data and leave other sections to be evaluated on demand.
• Relational databases are also sparse, but do not appear to be when viewed directly.
• The ability to handle sparse data is an important part of the performance of an OLAP application.
• The concept of a maximum size for a cube exists because in a cube the fields are all fixed, so they can only reference a finite number of cells.
• ROLAP stands for “Relational OLAP”. This term describes OLAP applications that store all of the cube data, both base and high-level, in relational tables.
• MOLAP stands for “Multidimensional OLAP”. This term describes OLAP applications that store all of the cube data, both base and high-level, in proprietary multidimensional data files.
• HOLAP stands for “Hybrid OLAP”. This term describes OLAP applications that store high-level data in proprietary multidimensional data files, but leave the underlying base data in the original data tables.
• The objective of a data warehousing system is to provide business users with a time-based, integrated view of cross-functional data.

References
• Schrader, M., Vlamis, D., Nader, M., Collins, D., Claterbos, C., Campbell, M. & Conrad, F., 2009. Oracle Essbase & Oracle OLAP: The Guide to Oracle’s Multidimensional Solution, McGraw-Hill Prof Med/Tech Publication.
• Paredes, J., 2009. The Multidimensional Data Modeling Toolkit: Making Your Business Intelligence Application, John Paredes Publication.
• Introduction to OLAP (A beginner’s guide to OLAP & the concepts behind it), [Online] Available at: <http://resources.businessobjects.com/support/communitycs/TechnicalPapers/si_intro_olap.pdf> [Accessed 3 May 2012].


• CHAPTER 1 Introduction to OLAP, [pdf] Available at: <http://www.mhprofessional.com/downloads/products/0071621822/0071621822%20_Chap01.pdf> [Accessed 3 May 2012].
• 2008. Lecture - 30 Introduction to Data Warehousing and OLAP, [Video Online] Available at: <http://www.youtube.com/watch?v=m-aKj5ovDfg> [Accessed 3 May 2012].
• Jared, H., 2011. What is OLAP?, [Video Online] Available at: <http://www.youtube.com/watch?v=2ryG3Jy6eIY> [Accessed 3 May 2012].

Recommended Reading
• Berson, 2004. Data Warehousing, Data Mining, & Olap, Tata McGraw-Hill Education Publication.
• Thomson, E., 2002. Olap Solutions: Building Multidimensional Information Systems, 2nd ed. Wiley Publication.
• Pujari, A., 2001. Data Mining Techniques, 4th ed. Universities Press Publication.


Self Assessment

1. Which of the following statements is false?
a. An OLAP cube is not restricted to three dimensions.
b. Many OLAP applications use cube designs containing up to ten dimensions.
c. Uptown is a field in the Store dimension.
d. The method used by all OLAP applications is to create a parent field.

2. _________ describes OLAP applications that store all of the cube data, both base and high-level, in relational tables.
a. MOLAP
b. HOLAP
c. ROLAP
d. OLAP

3. _________ describes OLAP applications that store all of the cube data, both base and high-level, in proprietary multidimensional data files.
a. MOLAP
b. HOLAP
c. ROLAP
d. OLAP

4. ___________ is about conveying information that is important to the organisation and using that data to manage the business.
a. Business intelligence
b. Business reporting
c. Business management
d. Business handling

5. A _________ can be defined as a characteristic or an attribute of a data set.
a. multidimensional data
b. analysis
c. reporting
d. dimension

6. OLAP uses a __________ approach to organise and analyze data.
a. single dimensional
b. two-dimensional
c. three-dimensional
d. multidimensional

7. The ___________ hosts the multidimensional data storage and runs the calculation engine.
a. OLAP server
b. OLAP
c. Essbase
d. Oracle


8. Which of the following is not an OLAP type?
a. Multidimensional
b. Rational
c. Relational
d. Hybrid

9. _________ cubes are automatically indexed based on the dimensions.
a. HOLAP
b. ROLAP
c. MOLAP
d. OLAP

10. _________ OLAP uses a traditional star/snowflake schema and relational data sources only.
a. Relational
b. Multidimensional
c. Hybrid
d. Rational


Chapter VII

Decision Support Systems

Aim

The aim of this chapter is to:

• elucidate decision support system for business intelligence
• introduce model-base management system
• explain architecture of decision support systems

Objectives

The objectives of this chapter are to:

• explain user interfaces to decision support systems
• explicate knowledge-based decision support systems
• elucidate decision support system’s design

Learning outcome

At the end of this chapter, you will be able to:

• explain implementation of decision support system
• distinguish between types of decisions
• describe decision support system for organisation


7.1 Introduction
Making decisions concerning complex systems (for example, the management of organisational operations, industrial processes, or investment portfolios; the command and control of military units; or the control of nuclear power plants) often strains our cognitive capabilities. Even though individual interactions among a system’s variables may be well understood, predicting how the system will react to an external manipulation such as a policy decision is often difficult. What will be, for example, the effect of introducing a third shift on a factory floor? One might expect that this will increase the plant’s output by roughly 50 percent. Factors such as additional wages, machine wear, maintenance breaks, raw material usage, supply logistics, and future demand also need to be considered, however, as they will all impact the total financial outcome of this decision.

Many variables are involved in complex and often subtle interdependencies, and predicting the total outcome may be daunting. There is a substantial amount of empirical evidence that human intuitive judgment and decision making can be far from optimal, and it deteriorates even further with complexity and stress. Because in many situations the quality of decisions is important, aiding the deficiencies of human judgment and decision making has been a major focus of science throughout history. Disciplines such as statistics, economics, and operations research developed various methods for making rational choices.

More recently, these methods, often enhanced by a variety of techniques originating from information science, cognitive psychology, and artificial intelligence, have been implemented in the form of computer programs, either as stand-alone tools or as integrated computing environments for complex decision making. Such environments are often given the common name of decision support systems (DSSs). The concept of DSS is extremely broad, and its definitions vary, depending on the author’s point of view. To avoid exclusion of any of the existing types of DSSs, we will define them roughly as interactive computer-based systems that aid users in judgment and choice activities.

Another name sometimes used as a synonym for DSS is knowledge-based systems, which refers to their attempt to formalise domain knowledge so that it is amenable to mechanised reasoning. Decision support systems are gaining increased popularity in various domains, including business, engineering, the military, and medicine. They are especially valuable in situations in which the amount of available information is prohibitive for the intuition of an unaided human decision maker and in which precision and optimality are of importance.

Decision support systems can aid human cognitive deficiencies by integrating various sources of information, providing intelligent access to relevant knowledge, and aiding the process of structuring decisions. They can also support choice among well-defined alternatives and build on formal approaches, such as the methods of engineering economics, operations research, statistics, and decision theory. They can also employ artificial intelligence methods to heuristically address problems that are intractable by formal techniques. Proper application of decision-making tools increases productivity, efficiency, and effectiveness and gives many businesses a comparative advantage over their competitors, allowing them to make optimal choices for technological processes and their parameters, for planning business operations, and for logistics and investments.

While it is difficult to overestimate the importance of various computer-based tools that are relevant to decision making (for example, databases, planning software, and spreadsheets), this chapter focuses primarily on the core of a DSS: the part that directly supports modelling decision problems and identifying best alternatives. We will briefly discuss the characteristics of decision problems and how decision making can be supported by computer programs. We then cover various components of DSSs and the role that they play in decision support. We will also introduce an emergent class of normative systems, that is, DSSs based on sound theoretical principles, and in particular decision-analytic DSSs. Finally, we will review issues related to user interfaces to DSSs and stress the importance of user interfaces to the ultimate quality of decisions aided by computer programs.


7.2 Evolution of Decision Support Systems
Since the first electronic general-purpose computer was put into full operation in the early 1940s, data-processing techniques have been continuously advancing. It was in the late 1950s that many organisations began to utilise transaction processing systems (TPS) or electronic data processing (EDP) systems to automate routine clerical tasks such as payroll, inventory and billing. In the 1960s, we witnessed the emergence of management information systems (MIS) with the development of database management systems for collecting, organising, storing and retrieving data.

MIS were developed to extract valuable management information by aggregating and summarising massive amounts of transaction data and allowing user-interactive managerial queries. The inclusion of simple modelling and statistical methods as a component of MIS permits computer systems to make routine (structured) decisions. It was not until 1970 that scholars began to recognise the important roles computer-based information systems (CBIS) play in supporting managers in their semi-structured or unstructured decision-making activities.

Since the 1970s, the study of DSS has become an essential part of CBIS. In the 1980s, we witnessed another wave of information technologies: artificial intelligence-based expert systems (ES), also known as knowledge-based systems, which are intended to replace and mimic human decision makers in making repetitive decisions in a narrow domain. During the mid-1980s, executive information systems (EIS) emerged as an important tool to serve the information needs of executives. EIS provide timely and critical information which has been filtered and compressed for tracking and control purposes. The latest addition to CBIS is artificial neural networks (ANN).

Neural network computing involves building intelligent systems to mimic human brain functions. ANN attempt to achieve knowledge processing based on the parallel processing method of human brains, pattern recognition based on experience, and fast retrieval of massive amounts of data. Fuzzy logic, genetic algorithms, and intelligent agents are some of the other intelligent techniques that can be used along with neural networks to improve the effectiveness of personal, group, and organisational decision making.

Table 7.1 summarises the evolutionary pattern of CBIS and shows the shift in the focus of a CBIS from data and information to knowledge and wisdom. The critical information provided by EIS can be used to identify various symptoms of malfunctioning organisational activities in each functional department. These symptoms can be the basis for diagnosing managerial problems. Decision support systems (DSS) are human–computer decision-making systems that support managerial judgements and intuitions in solving managerial problems by providing necessary information and by generating, evaluating and suggesting decision alternatives. Most organisational problems need a combination of quantitative and qualitative data processing. EIS are to deal with those organisational problems that can be better solved by qualitative data processing. Other subsets of CBIS, such as TPS and MIS, provide data to the DSS to be processed by DSS models and managerial judgements.


|             | Technology   | Analogy                                                                     | Management                        | Metaphor     |
| Data        | EDP          | Elements: H2O, yeast, bacteria, starch molecules                            | Muddling through                  | KNOW-NOTHING |
| Information | MIS          | Ingredients: flour, sugar, spices, fixed recipe for bread only (OR/MS type) | Efficiency (measurement + search) | KNOW-HOW     |
| Knowledge   | DSS, ESS, AI | Choose among different recipes for bread                                    | Effectiveness (decision making)   | KNOW-WHAT    |
| Wisdom      | HSM, MSS     | Why bread and not croissant                                                 | Explicability (judgment)          | KNOW-WHY     |

Table 7.1 Taxonomy of knowledge

7.3 Definition of Decision Support Systems
Drawing on the various definitions that have been suggested, a DSS can be described as an interactive, computer-based human–computer decision-making system that:

• supports decision makers rather than replaces them

• utilises data and models

• solves problems with varying degrees of structure: (a) non-structured (unstructured or ill-structured) (Bonczek et al. 1981); (b) semi-structured (Keen and Scott-Morton 1978); (c) semi-structured and unstructured (Sprague and Carlson 1982)

• focuses on effectiveness rather than efficiency in decision processes (facilitating decision processes)

7.4 Architecture of Decision Support Systems
As shown in the figure below, a DSS consists of two major sub-systems: human decision makers and computer systems. Interpreting a DSS as only a computer hardware and software system is a common misconception. An unstructured (or semi-structured) decision by definition cannot be programmed because its precise nature and structure are elusive and complex (Simon 1960). The function of a human decision maker as a component of DSS is not to enter data to build a database, but to exercise judgment or intuition throughout the entire decision-making process.

Imagine a manager who has to make a five-year production planning decision. The first step of the decision making process begins with the creation of a decision support model, using an integrated DSS program (DSS generator) such as Microsoft Excel, Lotus 1-2-3, Interactive Financial Planning Systems (IFPS)/Personal or Express/PC. The user interface sub-system (or dialogue generation and management systems) is the gateway to both database management systems (DBMS) and model-based management systems (MBMS). DBMS are a set of computer programs that create and manage the database, as well as control access to the data stored within it.


The DBMS can be either an independent program or embedded within a DSS generator to allow users to create a database file that is to be used as an input to the DSS. MBMS is a set of computer programs embedded within a DSS generator that allows users to create, edit, update, and/or delete a model. Users create models and associated database files to make specific decisions. The created models and databases are stored in the model base and database on direct-access storage devices such as hard disks. From a user's viewpoint, the user interface sub-system is the only part of the DSS components with which they have to deal.

Therefore, providing an effective user interface must take several important issues into consideration, including the choice of input and output devices, screen design, use of colours, data and information presentation formats, and use of different interface styles. Today's decision support system generators provide the user with a wide variety of interface modes (styles): menu-based interaction, command language, questions and answers, form interaction, natural language dialogue, and graphical user interfaces (GUI). GUIs use icons, buttons, pull-down menus, bars, and boxes extensively and have become the most widely implemented and versatile type. The interface system allows users access to:

• The data sub-system
  - Database
  - Database management software

• The model sub-system
  - Model base
  - Model base management software

The figure below shows the components of a decision support system.

Fig. 7.1 Components of a decision support system (Source: http://cstl-hcb.semo.edu/eom/iebmdssrwweb.pdf)


DSS is distinguished from MIS in terms of focusing on effectiveness, rather than efficiency in decision processes (facilitating decision processes). An important performance objective of DSS is to support all phases of the decision-making process (Sprague and Carlson 1982). Simon’s model of decision making describes human decision making as having three major steps: intelligence, design and choice. The term ‘support’ implies many different activities and tasks in each stage of the decision-making process.

In the intelligence stage, human decision makers play an important role in defining problems to be solved, based on the raw data obtained and information processed by transaction processing systems (TPS)/management information systems (MIS). Alter (1980: 73) suggests seven different types of DSS, based on the 'degree of action implication of DSS outputs', that is, the degree to which the DSS's output could directly determine the decision. Among them, the following three DSS types are especially useful in the intelligence stage:

• File drawer systems, which allow online access only to particular data items.

• Data analysis systems, which permit user(s) to retrieve, manipulate and display current and historical data.

• Analysis information systems, which manipulate internal data from TPS and augment the internal data with external data, using statistical packages and other small models to generate management information.

The majority of DSS in use today are developed to generate and evaluate decision alternatives via 'what-if' analysis and 'goal-seeking' analysis in the design and choice stages. Accounting models facilitate planning by calculating the consequences of planned actions on estimated income statements, balance sheets and other financial statements. Representational models estimate the future consequences of actions on the basis of partially non-definitional models, including all simulation models. Optimisation models generate the optimal solutions. Suggestion models lead to a specific suggested decision for a fairly structured task. Such systems perform mechanical calculations and leave little role for managerial judgement.
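To make the two analysis styles concrete, here is a minimal sketch on a toy profit model. The model, its numbers, and the bisection-based goal seek are all assumptions invented for illustration, not a description of any particular DSS generator.

```python
# A minimal sketch of 'what-if' and 'goal-seeking' analysis on a toy
# profit model. The model and all numbers are illustrative assumptions.

def profit(units, price, unit_cost, overhead):
    return units * (price - unit_cost) - overhead

base = dict(units=10_000, price=12.0, unit_cost=7.0, overhead=30_000)

# What-if analysis: vary one input and observe the outcome.
for price in (11.0, 12.0, 13.0):
    print(f"price={price}: profit={profit(**{**base, 'price': price}):,.0f}")

# Goal-seeking analysis: search for the input that hits a target outcome.
def goal_seek(target, lo=7.0, hi=20.0, tol=1e-6):
    """Bisection on price (profit rises with price in this toy model)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if profit(**{**base, 'price': mid}) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"price needed for a 40,000 profit: {goal_seek(40_000):.2f}")  # 14.00
```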

7.5 Decision Support System Sub-Specialities
As shown in the figure below, the study of DSS consists of the following three important groups of research areas:

• Developing a specific DSS (labelled 'A' in the figure). Over the past three decades (1970-2000), about 500 specific functional DSS applications have been developed and published in English-language journals (labelled 'B').

• Developing DSS theory:
  - developing theory on decision makers, data, models and interfaces (dialogue) (labelled 'F'–'I')
  - developing theory on design, implementation and evaluation (labelled 'C', 'D' and 'E')

• Study of contributing disciplines (labelled 'J').

The first group of research areas, labelled 'F'–'I', is based on the architecture of DSS heavily influenced by Sprague and Carlson, while the second group, labelled 'C'–'E', is influenced by the organisational perspectives of Keen and Scott-Morton (1978). The third group of research is DSS application development, labelled 'A' and 'B'.


Fig. 7.2 Theory, applications and contributing disciplines of decision support systems (Source: http://cstl-hcb.semo.edu/eom/iebmdssrwweb.pdf)

7.6 Data/Model Management
Since model and data management in DSS are inseparable subjects, many DSS researchers continue to focus on both fields of data and model management. Data are facts which result from the observation of physical phenomena, such as daily production quantity, daily sales quantity and the inventory level of product A. A database is a collection of interrelated files. Database management systems are computer programs which are primarily concerned with managing a large amount of data in physical storage such as hard disks and with creating, updating and querying databases in an optimal way.

Data management in DSS is a necessary function primarily useful in the intelligence stage of the decision-making process, but it is not sufficient to support the design and choice stages of decision-making processes. To adequately support these stages, DSS should be able to include the following activities: projection, deduction, analysis, creation of alternatives, comparison of alternatives, optimisation and simulation (Sprague and Carlson 1982).

In performing these essential tasks, DSS utilise many types of management science/operations research (MS/OR) models. They include linear programming, integer programming, network models, goal programming, simulation, statistical models and spreadsheet modelling. All these models are stored in the model base. Model-based management systems are computer programs used as part of a DSS generator to build, restructure and update models. In association with model management, multiple criteria decision making (MCDM) model-embedded DSS and knowledge-based DSS have emerged recently as important DSS research sub-specialities (Eom and Min 1992; Eom 1996).
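As a concrete instance of one of the model types named above, the sketch below solves a tiny linear programme. The product mix, the coefficients, and the choice of scipy's linprog solver are assumptions made for illustration; any LP solver held in the model base would play the same role.

```python
# A toy linear-programming model of the kind an MBMS might store:
# maximise profit 3x + 5y subject to labour and material limits.
# All coefficients are invented for illustration.

from scipy.optimize import linprog

# linprog minimises, so negate the profit coefficients to maximise.
res = linprog(c=[-3, -5],
              A_ub=[[1, 2],      # labour hours used per unit of x and y
                    [3, 2]],     # material used per unit of x and y
              b_ub=[100, 180],   # available labour hours and material
              bounds=[(0, None), (0, None)])
print("produce (x, y):", res.x)      # about (40, 30)
print("maximum profit:", -res.fun)   # about 270
```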


7.7 User Interface Sub-Systems
The functions of the user interface (dialogue generation and management) sub-system are to:

• Allow the user to create, update and delete database files and decision models via database management systems and model-based management systems.

• Provide a variety of input and output formats; the formats include multi-dimensional colour graphics, tables and multiple windows in a screen.

• Provide different styles of dialogue (such as graphical user interfaces, menus, direct command languages, form interaction, natural language interaction, and questions and answers).

Research in user interface sub-systems has investigated several important issues in the designing, building, and implementing of a user interface. They include data/information display formats (for example, tabular versus graphics), cognitive and psychological factors, use of multimedia (multiple media combined in one application) and hypermedia (documents that contain several types of media linked by association), 3-dimensional user interfaces, virtual reality and its impact on decision making, geographical information systems, and natural language processing.

7.8 Knowledge-Based Decision Support Systems
Another important emerging DSS sub-speciality is the study of knowledge-based decision support systems (KBDSS), which are hybrid systems of DSS and ES that help solve a broad range of organisational problems. In integrating DSS and ES, two basic approaches are discernible, labelled expert support systems (ESS) and intelligent support systems (ISS) (King 1993). The key differences between these two systems are as follows.

ESS are intended to replace human expertise with machine expertise, while ISS are intended to amplify the memory and intelligence of humans and groups (King 1993). A broad range of real-world managerial problems can be better solved by using the analysis of both quantitative and qualitative data. Few would disagree with the notion that there are considerable benefits from integrating DSS and ES. The new integrated system (ESS or ISS) can support decision makers by harnessing the expertise of key organisational members. A bottleneck in the development of knowledge-based systems such as ESS is knowledge acquisition, which is a part of knowledge engineering, a process that also includes representation, validation, inference, explanation and maintenance.

7.9 Group DSS/Group Support Systems/Electronic Meeting Systems
Single-user DSS and group DSS can be distinguished in many different ways in terms of purpose and components (hardware, software, people, procedures). First, group DSS and single-user DSS have distinguishable purposes. DeSanctis and Gallupe (1985: 3) define a group DSS as 'an interactive computer-based system which facilitates solution of unstructured problems by a set of decision makers working together as a group'. A single-user DSS can be simply defined by replacing 'a set of decision makers working together as a group' with 'a decision maker'.

Second, to support a set of decision makers working together as a group, group DSS have special technological requirements of hardware, software, people and procedures. Each member of the group usually has a personal computer which is linked to the personal computers of other group members and to one or more large public viewing screens, so that each member can see the inputs of other members or let other members see their work. Group DSS software also needs special functional capabilities, in addition to the capabilities of single-user DSS software, such as anonymous input of the user's ideas, listing group members' ideas, voting and ranking decision alternatives. The people component of group DSS should include a group facilitator, who leads the session by serving as the interface between the group and the computer systems.


Computer-based information systems to support group activities have been studied under the titles of group decision support systems (GDSS), computer-supported cooperative work (CSCW), group support systems (GSS), collaboration support systems (CSS), and electronic meeting systems (EMS). GDSS have focused on decision making and problem solving, while CSCW primarily provide a means to communicate more efficiently.

However, these two types of systems, decision-making-focused systems and communication-focused systems, are becoming indistinguishable. There seems to be a consensus that GSS is a broad umbrella term referring to the collection of computer-assisted technologies used to aid group efforts directed at identifying and addressing problems, opportunities, and issues (Jessup and Valacich 1993).

7.10 Organisational Decision Support Systems
An organisational decision support system is defined as 'a DSS that is used by individuals or groups at several work stations in more than one organisational unit who make varied (interrelated but autonomous) decisions using a common set of tools' (Carter et al. 1992: 19). According to the same source, an important goal of organisational DSS is to provide 'the glue that holds a large organisation together and keeps its parts marching to the beat of the same drummer toward common goals'. The two key factors to achieving these outcomes are:

• Transmittal of consistent, timely information up and down the organisational hierarchy, in forms that are appropriate to each decision maker.

• A set of decision-aiding models that use this information and that are appropriate for the decisions being made by each decision maker.

7.11 Decision Support System Design
DSS design is the process of identifying the key decisions through decision analysis and then specifying the requirements of each DSS component needed to support those decisions. DSS are designed and implemented to support organisational as well as individual decision making. Without a detailed understanding of decision-making behaviour in organisations, 'decision support is close to meaningless as a concept'. Organisational scientists classify organisational decision making in terms of several schools of thought:

• The rational model, which focuses on the selection of the most efficient alternative, under the assumption of a rational, completely informed single decision maker

• The organisational process model, which stresses the compartmentalisation of the various units in any organisation

• The satisficing model, which reflects 'bounded rationality': finding an acceptable, good-enough solution

• Other models

7.12 Decision Support System Implementation
Use of some computer-based information systems, such as TPS and MIS, is in most cases mandatory. But decision support systems are voluntary systems. For voluntary systems, DSS implementation research has been important for ascertaining the influence of success factors on DSS implementations. DSS implementation researchers are investigating the relationship between user-related factors and implementation success. User factors include cognitive style (the characteristic ways individuals process and utilise information to solve problems), personality (the cognitive structures maintained by individuals to facilitate adjustment to events and situations), demographics (age, sex and education), and user-situation variables (training, experience and user involvement) (Alavi and Joachimsthaler 1992).


Future implementation research should be directed toward the development of causal models of user-related implementation factors. Furthermore, it is suggested that DSS researchers shift the research focus from user-related factors to the contextual variables. An important assumption on which the DSS implementation research is based is that DSS are voluntary systems.

A recent survey of DSS suggests that an increasing number of DSS have become a strategic tool for organisational survival (Eom et al. 1998). Thus, these systems are no longer voluntary ones. Future DSS implementation research must take into account this changing nature of DSS from voluntary systems to mandatory survival tools. Consequently, individual differences, cognitive styles, personality, demographics, and user-situational variables may become less critical success factors. Shifting the focus of implementation research from user-related factors to task-related, organisational, and external environmental factors may be necessary to reflect the changing decision environment in which organisations must survive and prosper.

7.13 Types of Decisions
A simple view of decision making is that it is a problem of choice among several alternatives. A somewhat more sophisticated view includes the process of constructing the alternatives, that is, given a problem statement, developing a list of choice options. A complete picture also includes a search for opportunities for decisions, that is, discovering that there is a decision to be made. A manager of a company may face a choice in which the options are clear, for example, the choice of a supplier from among all existing suppliers. A manager may also face a well-defined problem for which he designs creative decision options, for example, how to market a new product so that the profits are maximised.

Finally, he may work in a less reactive fashion and view decision problems as opportunities that have to be discovered by studying the operations of his company and its surrounding environment, for example, how he can make the production process more efficient. There is much anecdotal and some empirical evidence that structuring decision problems and identifying creative decision alternatives determine the ultimate quality of decisions. Decision support systems aim mainly at this broadest type of decision making, and in addition to supporting choice, they aid in modelling and analysing systems (such as complex organisations), identifying decision opportunities, and structuring decision problems.

7.14 Human Judgment and Decision Making
Theoretical studies on rational decision making, notably in the context of probability theory and decision theory, have been accompanied by empirical research on whether human behaviour complies with the theory. It has been rather convincingly demonstrated in numerous empirical studies that human judgment and decision making are based on intuitive strategies as opposed to theoretically sound reasoning rules. These intuitive strategies, referred to as judgmental heuristics in the context of decision making, help us in reducing the cognitive load, but alas at the expense of optimal decision making. Effectively, our unaided judgment and choice exhibit systematic violations of probability axioms (referred to as biases). A formal discussion of the most important research results, along with experimental data, can be found in an anthology edited by Kahneman, Slovic, and Tversky.

Dawes provides an accessible introduction to what is known about people's decision-making performance. One might hope that people who have achieved expertise in a domain will not be subject to judgmental biases and will approach optimality in decision making. While empirical evidence shows that experts indeed are more accurate than novices within their area of expertise, it also shows that they are liable to the same judgmental biases as novices and demonstrate apparent errors and inconsistencies in their judgment. Professionals such as practicing physicians use essentially the same judgmental heuristics and are prone to the same biases, although the degree of departure from the normatively prescribed judgment seems to decrease with experience.


7.15 Modelling Decisions
The superiority of even simple linear models over human intuitive judgment suggests that one way to improve the quality of decisions is to decompose a decision problem into simpler components that are well defined and well understood. Studying a complex system built out of such components can then be aided by a formal, theoretically sound technique. The process of decomposing and formalising a problem is often called modelling. Modelling amounts to finding an abstract representation of a real-world system that simplifies the system and, while retaining its essential relationships, omits unnecessary detail.

Building a model of a decision problem, as opposed to reasoning about a problem in a holistic way, allows for applying scientific knowledge that can be transferred across problems and often across domains. It allows for analysing, explaining, and arguing about a decision problem. The desire to improve human decision making provided motivation for the development of a variety of modelling tools in the disciplines of economics, operations research, decision theory, decision analysis, and statistics. In each of these modelling tools, knowledge about a system is represented by means of algebraic, logical, or statistical variables.

Interactions among these variables are expressed by equations or logical rules, possibly enhanced with an explicit representation of uncertainty. When the functional form of an interaction is unknown, it is sometimes described in purely probabilistic terms, for example, by a conditional probability distribution. Once a model has been formulated, a variety of mathematical methods can be used to analyse it.

Decision making under certainty has been addressed by economic and operations research methods, such as cash flow analysis, break-even analysis, scenario analysis, mathematical programming, inventory techniques, and a variety of optimisation algorithms for scheduling and logistics. Decision making under uncertainty enhances the above methods with statistical approaches, such as reliability analysis, simulation, and statistical decision making.

7.16 Decision Support Systems
Decision support systems are interactive, computer-based systems that aid users in judgment and choice activities. They provide data storage and retrieval but enhance the traditional information access and retrieval functions with support for model building and model-based reasoning. They support framing, modelling, and problem solving. Typical application areas of DSSs are management and planning in business, health care, the military, and any area in which management will encounter complex decision situations. Decision support systems are typically used for strategic and tactical decisions faced by upper-level management: decisions with a reasonably low frequency and high potential consequences, in which the time taken for thinking through and modelling the problem pays generously in the long run.

There are three fundamental components of DSSs:

7.16.1 Database Management System (DBMS)
A DBMS serves as a data bank for the DSS. It stores large quantities of data that are relevant to the class of problems for which the DSS has been designed and provides logical data structures (as opposed to the physical data structures) with which the users interact. A DBMS separates the users from the physical aspects of the database structure and processing. It should also be capable of informing the user of the types of data that are available and how to gain access to them.

7.16.2 Model-Base Management System (MBMS)
The role of an MBMS is analogous to that of a DBMS. Its primary function is providing independence of the specific models used in a DSS from the applications that use them. The purpose of an MBMS is to transform data from the DBMS into information that is useful in decision making. Since many problems that the user of a DSS will cope with may be unstructured, the MBMS should also be capable of assisting the user in model building.


7.16.3 Dialog Generation and Management System (DGMS)
The main product of an interaction with a DSS is insight. As their users are often managers who are not computer-trained, DSSs need to be equipped with intuitive and easy-to-use interfaces. These interfaces aid in model building, but also in interaction with the model, such as gaining insight and recommendations from it. The primary responsibility of a DGMS is to enhance the ability of the system user to utilise and benefit from the DSS. In the remainder of this chapter, we will use the broader term user interface rather than DGMS.

While a variety of DSSs exists, the above three components can be found in many DSS architectures and play a prominent role in their structure. Interaction among them is illustrated in the figure below. Essentially, the user interacts with the DSS through the DGMS, which communicates with the DBMS and MBMS; these in turn screen the user and the user interface from the physical details of the model base and database implementation.

Fig. 7.3 The architecture of a DSS (Source: http://www.pitt.edu/~druzdzel/psfiles/dss.pdf)
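The separation of concerns in this architecture can be sketched in a few lines of code. The classes and data below are illustrative assumptions, not an actual DSS product; the point is only that the user talks to the dialog layer, which delegates to the data and model layers.

```python
# A minimal structural sketch of the three DSS components above.
# All names, tables, and models are invented for illustration.

class DBMS:
    """Data bank: hides physical storage behind logical queries."""
    def __init__(self, tables):
        self._tables = tables
    def query(self, table):
        return self._tables[table]

class MBMS:
    """Model base: keeps models independent of the apps that use them."""
    def __init__(self):
        self._models = {}
    def register(self, name, fn):
        self._models[name] = fn
    def run(self, name, data):
        return self._models[name](data)

class DGMS:
    """Dialog layer: mediates every user interaction with DBMS and MBMS."""
    def __init__(self, dbms, mbms):
        self.dbms, self.mbms = dbms, mbms
    def ask(self, model, table):
        return self.mbms.run(model, self.dbms.query(table))

dbms = DBMS({"sales": [120, 135, 150, 160]})
mbms = MBMS()
mbms.register("trend", lambda xs: (xs[-1] - xs[0]) / (len(xs) - 1))
ui = DGMS(dbms, mbms)
print("average growth per period:", ui.ask("trend", "sales"))  # 13.33...
```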

7.17 Normative and Descriptive Approaches
Whether or not one trusts the quality of human intuitive reasoning strategies has a profound impact on one's view of the philosophical and technical foundations of DSSs. There are two distinct approaches to supporting decision making. The first aims at building support procedures or systems that imitate human experts. The most prominent members of this class of DSSs are expert systems: computer programs based on rules elicited from human domain experts that imitate the reasoning of a human expert in a given domain.

Expert systems are often capable of supporting decision making in that domain at a level comparable to human experts. While they are flexible and often able to address complex decision problems, they are based on intuitive human reasoning and lack soundness and formal guarantees with respect to the theoretical reliability of their results. The danger of the expert system approach, increasingly appreciated by DSS builders, is that along with imitating human thinking and its efficient heuristic principles, we may also imitate its undesirable flaws.

The second approach is based on the assumption that the most reliable method of dealing with complex decisions is through a small set of normatively sound principles of how decisions should be made. While heuristic methods and ad hoc reasoning schemes that imitate human cognition may in many domains perform well, most decision makers will be reluctant to rely on them whenever the cost of making an error is high. To give an extreme example, few people would choose to fly airplanes built using heuristic principles over airplanes built using the laws of aerodynamics enhanced with probabilistic reliability analysis.

Application of formal methods in DSSs makes these systems philosophically distinct from those based on ad hoc heuristic artificial intelligence methods, such as rule-based systems. The goal of a DSS, according to this view, is to support unaided human intuition, just as the goal of using a calculator is to aid a human's limited capacity for mental arithmetic.


7.18 Decision-Analytic Decision Support Systems
An emergent class of DSSs, known as decision-analytic DSSs, applies the principles of decision theory, probability theory, and decision analysis to its decision models. Decision theory is an axiomatic theory of decision making that is built on a small set of axioms of rational decision making. It expresses uncertainty in terms of probabilities and preferences in terms of utilities. These are combined using the operation of mathematical expectation.
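The core operation is easy to state in code. The sketch below ranks two hypothetical alternatives by expected utility; the alternatives, probabilities, and utilities are invented for illustration.

```python
# Uncertainty as probabilities, preferences as utilities, alternatives
# ranked by mathematical expectation. All numbers are invented.

alternatives = {
    # each outcome is a (probability, utility) pair
    "launch_product": [(0.6, 100), (0.4, -40)],
    "delay_one_year": [(0.9, 30), (0.1, -5)],
}

def expected_utility(outcomes):
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * u for p, u in outcomes)

for name, outcomes in alternatives.items():
    print(f"{name}: EU = {expected_utility(outcomes):.1f}")
# launch_product: EU = 44.0; delay_one_year: EU = 26.5 -> launch is preferred
```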

The attractiveness of probability theory, as a formalism for handling uncertainty in DSSs, lies in its soundness and its guarantees concerning long-term performance. Probability theory is often viewed as the gold standard for rationality in reasoning under uncertainty. Following its axioms offers protection from some elementary inconsistencies; their violation, on the other hand, can be demonstrated to lead to sure losses. Decision analysis is the art and science of applying decision theory to real-world problems. It includes a wealth of techniques for model construction, such as methods for elicitation of model structure and probability distributions that allow minimisation of human bias, methods for checking the sensitivity of a model to imprecision in the data, methods for computing the value of obtaining additional information, and methods for the presentation of results.

These methods have been under continuous scrutiny by psychologists working in the domain of behavioural decision theory and have proven to cope reasonably well with the dangers related to human judgmental biases. Normative systems are usually based on graphical probabilistic models, which are representations of the joint probability distribution over a model's variables in terms of directed graphs. Directed graphs, such as the graph shown in the figure below, are known as Bayesian networks (BNs) or causal networks.

Bayesian networks offer a compact representation of joint probability distributions and are capable of practical representation of large models, consisting of tens or hundreds of variables. Bayesian networks can be easily extended with decision and value variables for modelling decision problems. The former denote variables that are under the decision maker’s control and can be directly manipulated, and the latter encode users’ preferences over various outcomes of the decision process.

Such amended graphs are known as influence diagrams. Both the structure and the numerical probability distributions in a BN can be elicited from a human expert and are a reflection of the expert’s subjective view of a real-world system. If available, scientific knowledge about the system, both in terms of the structure and frequency data, can be easily incorporated in the model. Once a model has been created, it is optimised using formal decision-theoretic algorithms. Decision analysis is based on the empirically tested paradigm that people are able to reliably store and retrieve their personal beliefs about uncertainty and preferences for different outcomes, but are much less reliable in aggregating these fragments into a global inference.
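A minimal sketch of inference in such a network follows, assuming a two-node model invented for this example; real decision-analytic DSSs use dedicated inference engines over much larger networks.

```python
# A toy Bayesian network P(A)P(B|A) with inference by enumeration.
# Structure and numbers are invented for illustration only.

p_a = {True: 0.3, False: 0.7}                    # P(A): e.g., "demand is high"
p_b_given_a = {True: {True: 0.8, False: 0.2},    # P(B | A=true)
               False: {True: 0.1, False: 0.9}}   # P(B | A=false)

def posterior_a(b):
    """P(A=true | B=b) via Bayes' rule over the two cases of A."""
    joint = {a: p_a[a] * p_b_given_a[a][b] for a in (True, False)}
    return joint[True] / sum(joint.values())

print(f"P(A=true | B=true) = {posterior_a(True):.3f}")   # 0.774
```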

While human experts are excellent in structuring a problem, determining the components that are relevant to it and providing local estimates of probabilities and preferences, they are not reliable in combining many simple factors into an optimal decision. The role of a decision-analytic DSS is to support them in their weaknesses using the formal and theoretically sound principles of statistics.

The approach taken by decision analysis is compatible with that of DSSs. The goal of decision analysis is to provide insight into a decision. This insight, consisting of the analysis of all relevant factors, their uncertainty, and the critical nature of some assumptions, is even more important than the actual recommendation. Decision-analytic DSSs have been successfully applied to practical systems in medicine, business, and engineering. As these systems tend to naturally evolve into three not necessarily distinct classes, it may be interesting to compare their structure and architectural organisation.


Fig. 7.4 Example of a Bayesian network modelling teaching expenditures in university operation (Source: http://www.pitt.edu/~druzdzel/psfiles/dss.pdf)

7.19 Systems with Static Domain Models
In this class of systems, a probabilistic domain is represented by a large network encoding the domain's structure and its numerical parameters. The network comprising the domain model is normally built by decision analysts and domain experts. An example might be a medical diagnostic system covering a certain class of disorders. Queries in such a system are answered by assigning values to those nodes of the network that constitute the observations for a particular case and propagating the impact of the observations through the network in order to find the probability distribution of some selected nodes of interest, for example, nodes that represent diseases. Such a network can, on a case-by-case basis, be extended with decision nodes and value nodes to support decisions. Systems with static domain models are conceptually similar to rule-based expert systems covering an area of expertise.

7.19.1 Systems with Customised Decision Models
The main idea behind this approach is automatic generation of a graphical decision model on a per-case basis in an interactive effort between the DSS and the decision maker. The DSS has domain expertise in a certain area and plays the role of a decision analyst. During this interaction, the program creates a customised influence diagram, which is later used for generating advice. The main motivation for this approach is the premise that every decision is unique and needs to be looked at individually; an influence diagram needs to be tailored to individual needs.


7.20 Systems Capable of Learning a Model from Data
The third class of systems employs computer-intensive statistical methods for learning models from data. Whenever there are sufficient data available, the systems can literally learn a graphical model from these data. This model can be subsequently used to support decisions within the same domain. The first two approaches are suited for slightly different applications. The customised model generation approach is an attempt to automate the most laborious part of decision making, structuring a problem, so far done with significant assistance from trained decision analysts.

A session with the program that assists the decision maker in building an influence diagram is laborious. This makes the customised model generation approach particularly suitable for decision problems that are infrequent and serious enough to be treated individually. Because in the static domain model approach an existing domain model needs to be customised by the case data only, the decision-making cycle is rather short. This makes it particularly suitable for those decisions that are highly repetitive and need to be made under time constraints.

A practical system can combine the three approaches. A static domain model can be slightly customised for a case that needs individual treatment. Once completed, a customised model can be blended into the large static model. Learning systems can support both the static and the customised model approach. On the other hand, the learning process can be greatly enhanced by prior knowledge from domain experts or by a prior model.

7.21 Equation-Based and Mixed Systems
In many business and engineering problems, interactions among model variables can be described by equations which, when solved simultaneously, can be used to predict the effect of decisions on the system and hence support decision making. One special type of simultaneous equation model is known as the structural equation model (SEM), which has been a popular method of representing systems in econometrics. An equation is structural if it describes a unique, independent causal mechanism acting in the system. Structural equations are based on expert knowledge of the system combined with theoretical considerations. Structural equations allow for a natural, modular description of a system: each equation represents an individual component, a separable and independent mechanism acting in the system. The main advantage of having a structural model is, as explicated by Simon, that it includes causal information and aids predictions of the effects of external interventions.

In addition, the causal structure of a structural equation model can be represented graphically, which allows for combining it with decision-analytic graphical models in practical systems. Structural equation models offer significant advantages for policy making. Often a decision maker confronted with a complex system needs to decide not only the values of policy variables but also which variables should be manipulated. A change in the set of policy variables has a profound impact on the structure of the problem and on how their values will propagate through the system. The user determines which variables are policy variables and which are determined within the model. A change in the SEMs or the set of policy variables can be reflected by a rapid restructuring of the model and predictions involving this new structure.
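A toy sketch of this restructuring follows, built on an invented two-equation model of price and demand; the mechanisms and numbers are assumptions, not from the text. Switching the policy variable changes which unknown each mechanism is solved for, without rewriting the mechanisms themselves.

```python
# Two structural mechanisms: demand = base - sensitivity * price,
# revenue = price * demand. The policy variable can be chosen at will.

def solve(policy_var, policy_value, base_demand=100.0, sensitivity=2.0):
    if policy_var == "price":        # decision maker sets the price
        price = policy_value
        demand = base_demand - sensitivity * price
    elif policy_var == "demand":     # decision maker targets demand
        demand = policy_value
        price = (base_demand - demand) / sensitivity
    else:
        raise ValueError(policy_var)
    return {"price": price, "demand": demand, "revenue": price * demand}

print(solve("price", 20.0))    # manipulate price, predict demand and revenue
print(solve("demand", 70.0))   # manipulate demand target, back out the price
```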

Our long-term project, the Environment for Strategic Planning (ESP), is based on a hybrid graphical modelling tool that combines SEMs with decision-analytic principles. ESP is capable of representing both discrete and continuous variables involved in deterministic and probabilistic relationships. The powerful features of SEMs allow ESP to act as a graphical spreadsheet integrating numerical and symbolic methods and allowing the independent variables to be selected at will without having to reformulate the model each time. This provides an immense flexibility that is not afforded by ordinary spreadsheets in evaluating alternate policy options.

7.22 User Interfaces to Decision Support Systems
While the quality and reliability of modelling tools and the internal architectures of DSSs are important, the most crucial aspect of DSSs is, by far, their user interface. Systems with user interfaces that are cumbersome or unclear or that require unusual skills are rarely useful and accepted in practice. The most important result of a session with a DSS is insight into the decision problem.


In addition, when the system is based on normative principles, it can play a tutoring role; one might hope that users will learn the domain model and how to reason with it over time, and improve their own thinking. A good user interface to DSSs should support (i) model construction and model analysis, (ii) reasoning about the problem structure in addition to numerical calculations, and (iii) both choice and optimisation of decision variables. These are discussed in detail below.

7.22.1 Support for Model Construction and Model Analysis
The user interface is the vehicle both for model construction (or model choice) and for investigating the results. Even if a system is based on a theoretically sound reasoning scheme, its recommendations will only be as good as the model they are based on. Furthermore, even if the model is a very good approximation of reality and its recommendations are correct, they will not be followed if they are not understood. Without understanding, the users may accept or reject a system's advice for the wrong reasons, and the combined decision-making performance may deteriorate even below unaided performance. A good user interface should make the model on which the system's reasoning is based transparent to the user.

Modelling is rarely a one-shot process, and good models are usually refined and enhanced as their users gather practical experiences with the system recommendations. It is important to strike a careful balance between precision and modelling efforts; some parts of a model need to be very precise while others do not. A good user interface should include tools for examining the model and identifying its most sensitive parts, which can be subsequently elaborated on. Systems employed in practice will need their models refined, and a good user interface should make it easy to access, examine, and refine its models.
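One common aid for striking that balance is one-way sensitivity analysis: perturb each parameter over a plausible range and see which ones move the output most. The sketch below is a hedged illustration with a toy model and invented ranges.

```python
# One-way sensitivity analysis on a toy model. The parameter with the
# widest output swing is the part of the model worth refining first.
# All values are invented for illustration.

def model(p):
    return p["units"] * p["margin"] - p["fixed_cost"]

base = {"units": 5_000, "margin": 4.0, "fixed_cost": 12_000}
ranges = {"units": (4_000, 6_000),
          "margin": (3.0, 5.0),
          "fixed_cost": (10_000, 14_000)}

for name, (lo, hi) in ranges.items():
    swing = abs(model({**base, name: hi}) - model({**base, name: lo}))
    print(f"{name}: output swing = {swing:,.0f}")
# Here 'margin' swings the output most, so it deserves the most precision.
```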

7.22.2 Support for Reasoning about the Problem Structure in Addition to Numerical Calculations
While numerical calculations are important in decision support, reasoning about the problem structure is even more important. Often, when the system and its model are complex, it is insightful for the decision maker to realise how the system variables are interrelated. This is helpful in designing creative decision options, but also in understanding how a policy decision will impact the objective. Graphical models, such as those used in decision analysis or in equation-based and hybrid systems, are particularly suitable for reasoning about structure.

Under certain assumptions, a directed graphical model can be given a causal interpretation. This is especially convenient in situations where the DSS autonomously suggests decision options; given a causal interpretation of its model, it is capable of predicting the effects of interventions. A causal graph also facilitates building an effective user interface. The system can refer to causal interactions during its dialogue with the user, which is known to enhance user insight.

7.22.3 Support for Both Choice and Optimisation of Decision Variables
Many DSSs have an inflexible structure in the sense that the variables that will be manipulated are determined at the model-building stage. This is not very suitable for planning of the strategic type, when the object of the decision-making process is identifying both the objectives and the methods of achieving them. For example, changing policy variables in a spreadsheet-based model often requires that the entire spreadsheet be rebuilt. If there is no support for that, few users will consider it as an option. This closes the world of possibilities for flexible reframing of a decision problem in the exploratory process of searching for opportunities. Support for both choice and optimisation of decision variables should be an inherent part of DSSs.

7.23 Graphical Interface
Insight into a model can be increased greatly at the user interface level by a diagram representing the interactions among its components, for example, a drawing of the graph on which a model is based, such as the Bayesian network in Fig. 7.4. This graph is a qualitative, structural explanation of how information flows from the independent variables to the dependent variables of interest. As models may become very large, it is convenient to structure them into sub-models: groups of variables that form a subsystem of the modelled system. Such sub-models can again be shown graphically, with the interactions among them, increasing the simplicity and clarity of the interface. In the ESP project, for example, models are typically presented to the user at this sub-model level.


Summary
• Many variables are involved in complex and often subtle interdependencies, and predicting the total outcome may be daunting.

• Decision support systems are gaining increased popularity in various domains, including business, engineering, the military, and medicine.

• Since the first electronic general-purpose computer was put into full operation in the early 1940s, data-processing techniques have been continuously advancing.

• MIS were developed to extract valuable management information by aggregating and summarising massive amounts of transaction data and allowing user-interactive managerial queries.

• The inclusion of simple modelling and statistical methods as a component of MIS permits computer systems to make routine (structured) decisions.

• Neural network computing involves building intelligent systems to mimic human brain functions.

• Decision support systems (DSS) are human–computer decision-making systems that support managerial judgements and intuitions in solving managerial problems by providing necessary information and by generating, evaluating and suggesting decision alternatives.

• A DSS consists of two major sub-systems: human decision makers and computer systems.

• An unstructured (or semi-structured) decision by definition cannot be programmed because its precise nature and structure are elusive and complex.

• The DBMS can be either an independent program or embedded within a DSS generator to allow users to create a database file that is to be used as an input to the DSS.

• MBMS is a set of computer programs embedded within a DSS generator that allows users to create, edit, update, and/or delete a model.

• DSS is distinguished from MIS by its focus on effectiveness, rather than efficiency, in decision processes.

• An important performance objective of DSS is to support all phases of the decision-making process.

• Data are facts which result from the observation of physical phenomena, such as daily production quantity, daily sales quantity and the inventory level of a product.

• Research in user interface sub-systems has investigated several important issues in the designing, building, and implementing of a user interface.

• ESS are intended to replace human expertise with machine expertise, while ISS are intended to amplify the memory and intelligence of humans and groups.

• To support a set of decision makers working together as a group, group DSS have special technological requirements of hardware, software, people and procedures.


References
• Turban, E., 2008. Decision Support and Business Intelligence Systems, 8th ed., Pearson Education India.

• Sauter, V., 2010. Decision Support Systems for Business Intelligence, 2nd ed., John Wiley & Sons.

• 2010. Decision Support Systems - An Introduction. [Video Online] Available at: <http://www.youtube.com/watch?v=la7MdnrlLZc> [Accessed 3 May 2012].

• 2010. Decision Support Systems. [Video Online] Available at: <http://www.youtube.com/watch?v=aWc4H8cfGE8> [Accessed 3 May 2012].

• Decision support systems. [Online] Available at: <http://cstl-hcb.semo.edu/eom/iebmdssrwweb.pdf> [Accessed 3 May 2012].

• Flynn, R. & Druzdzel, M. Decision Support Systems. [Online] Available at: <http://www.pitt.edu/~druzdzel/psfiles/dss.pdf> [Accessed 3 May 2012].

Recommended Reading
• Thierauf, R., 2001. Effective Business Intelligence Systems, Greenwood Publishing Group.

• Vercellis, C., 2011. Business Intelligence: Data Mining and Optimisation for Decision Making, John Wiley & Sons.

• Turban, E., Sharda, R. & Delen, D., Decision Support and Business Intelligence Systems, Pearson Education India.


Self Assessment

1. Which of the following statements is true?
a. DSS is the same as MIS.
b. DSS supports only a few phases of the decision-making process.
c. A database is a collection of interrelated files.
d. Data management in DSS is not a necessary function.

2. _____ is to replace human expertise with machine expertise.
a. ISS
b. DSS
c. KBDSS
d. ESS

3. ________ is a hybrid system of DSS and ES.
a. ISS
b. DSS
c. KBDSS
d. ESS

4. A _______ serves as a data bank for the DSS.
a. DBMS
b. Oracle
c. SQL
d. Memory

5. The purpose of a _______ is to transform data from the DBMS into information that is useful in decision making.
a. DGMS
b. MBMS
c. DSS
d. OLAP

6. ______ are to deal with those organisational problems that can be better solved by qualitative data processing.
a. ISS
b. DSS
c. KBDSS
d. EIS

7. ______ are a set of computer programs that create and manage the database, as well as control access to the data stored within it.
a. DBMS
b. MBMS
c. KBDSS
d. Oracle


8. _______ is a set of computer programs embedded within a DSS generator that allows users to create, edit, update, and/or delete a model.
a. DBMS
b. MBMS
c. DGMS
d. DSS

9. _______ systems permit users to retrieve, manipulate and display current and historical data.
a. DBMS
b. Data storage
c. Data analysis
d. OLAP

10. A ________ is a collection of interrelated files.
a. database
b. DBMS
c. RDBMS
d. ROLAP


Chapter VIII

Types of Business Intelligence Tools

Aim

The aim of this chapter is to:

• elucidate various types of business intelligence tools

• explicate decision engineering framework

• explain key general categories of business intelligence tools

Objectives

The objectives of this chapter are to:

• explain data mining and data warehousing
• explicate the operational case for data warehousing
• elucidate increasing data storage

Learning outcome

At the end of this chapter, you will be able to:

• understand strategies for increased profitability
• describe analytics for strategic business information implementation
• identify the tools for business intelligence


8.1 Introduction
Whether a company subscribes to a specific management methodology or not, the benefits of monitoring and communicating performance results cannot be argued against. By clearly stating goals and translating them into measurable objectives, organisations can quantify and assess their progress. Without this, the status of the business can go unchecked for extended periods of time, which can be particularly detrimental if performance results are negative. The purpose of performance management is two-fold. First, it enables senior executives to collaborate on and agree to a corporate strategy. Second, it facilitates the sharing of those goals with middle management and frontline workers, so everyone is properly aligned and striving to reach the same objectives.

In Robert Kaplan and David Norton’s book The Balanced Scorecard, strategy maps and leading and lagging indicators are discussed. This concept is essential to truly effective performance management, even for companies that don’t follow the balanced scorecard approach. Kaplan and Norton claim that the overall goal of a company, or even one of its individual business units, can be described as a single end result. For example, the primary objective may be higher profitability, but having everyone in the company monitor only profit levels would serve no purpose in helping to achieve that goal. Why? Because profit is a lagging indicator, one that relies on other factors like increased sales or reduced overhead costs in order to be reached. These other factors are referred to as leading indicators.

Fig. 8.1 Reasons for increased profitability
(Source: http://www.ebsolute.com/web/doc/InformationManagementWP_BI.pdf)

But, there are many levels to leading indicators. Let’s take a look at reduced overhead costs. Expenses associated with travel, shipping, manufacturing, human resources, and other business functions all contribute to overhead. And increased sales can be impacted by sales to new customers or sales to existing customers.

In this section we’ll focus on sales to existing customers, because it’s common knowledge that it is far more profitable to generate repeat business than it is to solicit new clients. Many experts claim that with skyrocketing advertising and marketing expenses, the cost of acquiring a new customer can be as much as ten times more than the cost of selling additional products and services to current accounts. So, a company striving to improve profitability may seek to:

• Increase sales to existing customers
• Maintain revenue levels from new clients
• Reduce travel expenses by booking flights only on discount airlines

Each of these initiatives will be represented on the strategy map, and their importance will be communicated to line of business workers, such as sales reps, or the administrative staff that is responsible for making travel arrangements. The strategy map may look something like this:


Fig. 8.2 The strategy map for increased profitability
(Source: http://www.ebsolute.com/web/doc/InformationManagementWP_BI.pdf)

The initiatives that link up to the overall strategy can drill further down into more specific and detailed operational processes. For example, in order to sell more to existing customers, a company would need to boost satisfaction levels. In order to improve customer satisfaction, service and support staff, particularly agents in the contact center, will need to be more courteous and responsive.


Fig. 8.3 Improved strategies for increased sales
(Source: http://www.ebsolute.com/web/doc/InformationManagementWP_BI.pdf)

8.2 The Key General Categories of Business Intelligence Tools
The key general categories of business intelligence tools are as follows:

• Spreadsheets
• Reporting and querying software
• OLAP
• Digital dashboards
• Data mining
• Data warehousing
• Decision engineering
• Process mining
• Business performance management
• Local information systems


All the above terms are explained below.

8.2.1 Spreadsheets
A spreadsheet is a computer application with tools that increase the user’s productivity in capturing, analyzing, and sharing tabular data sets. It displays multiple cells usually in a two-dimensional matrix or grid consisting of rows and columns (in other words, a table, hence “tabular”). Each cell contains alphanumeric text, numeric values, or formulas. A formula defines how the content of that cell is to be calculated from the contents of any other cell (or combination of cells) each time any cell is updated. A pseudo third dimension to the matrix is sometimes applied as another layer, or layers/sheets, of two-dimensional data.

Spreadsheets developed as computerised simulations of paper accounting worksheets. They boost productivity because of their ability to re-calculate the entire sheet automatically after a change to a single cell is made (which was a manual process in the days of paper ledgers). Spreadsheets have now replaced paper-based systems throughout the business world, with any exceptions being rare, because of the much greater productivity that they make possible, and thus the competitive disadvantage of spreadsheet illiteracy. Although they were first developed for accounting or bookkeeping tasks, they now are used extensively in any context where tabular lists are built, sorted, and shared.
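The formula-driven recalculation described above can be modelled in a few lines. The following Python sketch is a toy illustration only (no real spreadsheet engine works this simply, and the cell names are invented): formula cells are stored as functions of the sheet, so reading one always reflects the latest values of the cells it depends on.

```python
# Toy spreadsheet: each cell holds either a value or a formula (a function
# of the sheet). Reading a formula cell recomputes it from current values,
# which mimics the automatic recalculation of a real spreadsheet.
class Sheet:
    def __init__(self):
        self.cells = {}            # e.g. {"A1": 100, "A3": <formula>}

    def set_value(self, ref, value):
        self.cells[ref] = value

    def set_formula(self, ref, func):
        self.cells[ref] = func     # func takes the sheet, returns a value

    def get(self, ref):
        content = self.cells.get(ref, 0)   # empty cells read as 0
        return content(self) if callable(content) else content

sheet = Sheet()
sheet.set_value("A1", 100)
sheet.set_value("A2", 250)
sheet.set_formula("A3", lambda s: s.get("A1") + s.get("A2"))  # =A1+A2
print(sheet.get("A3"))   # 350
sheet.set_value("A1", 500)                                    # change one cell...
print(sheet.get("A3"))   # 750 -- the dependent cell updates automatically
```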

A modern spreadsheet file consists of multiple worksheets (usually called by the shorter name sheets) that make up one workbook, with each file being one workbook. A cell on one sheet is capable of referencing cells on other, different sheets, whether within the same workbook or even, in some cases, in different workbooks.

Spreadsheets share many principles and traits of databases, but spreadsheets and databases are not the same thing. A spreadsheet is essentially just one table, whereas a database is a collection of many tables with machine-readable semantic relationships between them. Spreadsheets are often imported into databases to become tables within them. While it is true that a workbook that contains three sheets is indeed a file containing multiple tables that can interact with each other, it lacks the relational structure of a database.
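One hedged illustration of the spreadsheet-to-database point above: the sketch below loads a worksheet into a relational table with pandas and SQLite, after which it can be joined against other tables, which a lone spreadsheet cannot do. The file name, sheet name, and table name are hypothetical.

```python
import sqlite3
import pandas as pd

# Read one worksheet of a (hypothetical) workbook into a DataFrame.
# Reading .xlsx files additionally requires the openpyxl package.
orders = pd.read_excel("orders.xlsx", sheet_name="Sheet1")

# Persist the tabular data as a relational table.
conn = sqlite3.connect("bi_demo.db")
orders.to_sql("orders", conn, if_exists="replace", index=False)
print(pd.read_sql("SELECT COUNT(*) AS n FROM orders", conn))
```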

A spreadsheet app is one of four or five main component apps of an office productivity suite such as OpenOffice or Microsoft Office [MSO]. Such suites group a spreadsheet app such as OpenOffice Calc or Microsoft Office Excel with a word processor, a presentation program, and a database management system and, optionally, various other apps into a solution stack that aids the productivity of most office work, from administrative to managerial.

VisiCalc was the first electronic spreadsheet on a microcomputer, and it helped turn the Apple II computer into a success and greatly assisted the widespread adoption of microcomputers. Lotus 1-2-3 was the leading spreadsheet when DOS was the dominant operating system. Excel now has the largest market share on the Windows and Macintosh platforms. Since the advent of web apps, office suites now also exist in web-app form, with Google Docs and Microsoft Office Web Apps being the biggest competitors in the segment, and thus Google spreadsheets now share the market with Excel. As cloud computing gradually replaces desktop computing, spreadsheet apps continue to be important components for typical end users.


Fig. 8.4 An Excel spreadsheet

8.2.2 Reporting and Querying Software
The following is a list of notable reporting software. Reporting software is used to generate human-readable reports from various data sources.

Open source software: Eclipse, GNU Enterprise, JasperReports and Pentaho are notable open-source reporting packages used for BI.

Commercial software: ActiveReports, Actuate, Business Objects, Cognos BI, Crystal Reports and others are commercial reporting packages used for BI.

8.2.3 OLAP
OLAP means many different things to different people, but the definitions usually involve the terms “cubes”, “multidimensional”, “slicing & dicing” and “speedy-response”. OLAP is all of these things and more, but it is also a misused & misunderstood term, in part because it covers such a broad range of subjects. We will discuss the above terms in later sections; to begin with, we explain the definition & origin of OLAP.

OLAP is an acronym, standing for “On-Line Analytical Processing”. This, in itself, does not provide a very accurate description of OLAP, but it does distinguish it from OLTP or “On-Line Transactional Processing”. The term OLTP covers, as its name suggests, applications that work with transactional or “atomic” data, the individual records contained within a database. OLTP applications usually just retrieve groups of records and present them to the end-user, for example, the list of computer software sold at a particular store during one day. These applications typically use relational databases, with a fact or data table containing individual transactions linked to meta tables that store data about customer and product details.


OLAP applications present the end user with information rather than just data. They make it easy for users to identify patterns or trends in the data very quickly, without the need for them to search through mountains of “raw” data. Typically this analysis is driven by the need to answer business questions such as “How are our sales doing this month in North America?” From these foundations, OLAP applications move into areas such as forecasting and data mining, allowing users to answer questions such as “What are our predicted costs for next year?” and “Show me our most successful salesman”.

OLAP applications differ from OLTP applications in the way that they store data, the way that they analyze data and the way that they present data to the end-user. It is these fundamental differences that allow OLAP applications to answer more sophisticated business questions.
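A rough, hedged feel for “slicing & dicing” can be had with an ordinary pivot in Python; the data and column names below are invented, and a real OLAP server pre-aggregates and indexes such cubes for fast response:

```python
import pandas as pd

# Atomic OLTP-style transactions: one row per sale.
sales = pd.DataFrame({
    "region":  ["North America", "North America", "Europe", "Europe"],
    "month":   ["2012-01", "2012-02", "2012-01", "2012-02"],
    "product": ["Widget", "Widget", "Gadget", "Widget"],
    "revenue": [1200, 900, 700, 1100],
})

# "Dice" the data into a small cube: region x month, summed revenue.
cube = sales.pivot_table(index="region", columns="month",
                         values="revenue", aggfunc="sum", fill_value=0)
print(cube)

# "Slice": answer "How are our sales doing this month in North America?"
print(cube.loc["North America", "2012-02"])
```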

Why do we need OLAP?
When first investigating OLAP, it is easy to question the need for it. If an end user requires high-level information about their company, then that information can always be derived from the underlying transactional data; hence, we could achieve every requirement with an OLTP application. Were this true, OLAP would not have become the important topic that it is today. OLAP exists & continues to expand in usage because there are limitations with the OLTP approach. The limits of OLTP applications are seen in three areas.

8.2.4 Increasing Data Storage
The trend towards companies storing more & more data about their business shows no sign of stopping. Retrieving many thousands of records for immediate analysis is a time and resource consuming process, particularly when many users are using an application at the same time. Database engines that can quickly retrieve a few thousand records for half-a-dozen users struggle when forced to return the results of large queries to a thousand concurrent users.

Caching frequently requested data in temporary tables & data stores can relieve some of the symptoms, but only goes part of the way to solving the problem, particularly if each user requires a slightly different set of data. In a modern data warehouse where the required data might be spread across multiple tables, the complexity of the query may also cause time delays & require more system resources which means more money must be spent on database servers in order to keep up with user demands.
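The caching idea above is commonly realised as a pre-aggregated summary table rebuilt on a schedule. A minimal sketch, assuming a SQLite warehouse with a hypothetical sales table; concurrent readers then query a few summary rows instead of re-scanning millions of transactions:

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS sales (region TEXT, month TEXT, revenue REAL);

    -- Rebuild the summary: readers now hit a few hundred pre-aggregated
    -- rows instead of the full transaction table.
    DROP TABLE IF EXISTS sales_summary;
    CREATE TABLE sales_summary AS
        SELECT region, month, SUM(revenue) AS total_revenue
        FROM sales
        GROUP BY region, month;
""")
conn.commit()
```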

8.2.5 Dashboard
There are many different ideas of what a dashboard is. We will clearly define it here along with other presentation tools. There are typically four types of presentation media: dashboards, visual analysis tools, scorecards, and reports. These are all visual representations of data that help people identify correlations, trends, outliers (anomalies), patterns, and business conditions. However, they all have their own unique attributes.

Dashboard Insight uses Stephen Few’s definition of a dashboard:
“A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.”


Here are the key characteristics of a dashboard:

• All the visualisations fit on a single computer screen; scrolling to see more violates the definition of a dashboard.

• It shows the most important performance indicators/performance measures to be monitored.

• Interactivity such as filtering and drill-down can be used in a dashboard; however, those types of actions should not be required to see which performance indicators are underperforming.

• It is not designed exclusively for executives but rather should be used by the general workforce, as effective dashboards are easy to understand and use.

• The displayed data is updated automatically without any assistance from the user. The frequency of the update will vary by organisation and by purpose. The most effective dashboards have data updated at least on a daily basis.
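Behind the presentation, the “monitored at a glance” requirement has a simple computational core: compare each indicator to its target and surface the under-performers. A minimal Python sketch, in which the indicators, targets, and the lower-is-better rule are all invented for illustration:

```python
# A dashboard backend in miniature: five to eight key indicators,
# each compared to its target so under-performers stand out at a glance.
kpis = {
    # name: (actual, target)
    "Monthly revenue ($K)":        (840, 900),
    "New customers":               (42, 40),
    "Avg. support response (hrs)": (7.5, 4.0),
    "On-time delivery (%)":        (96, 95),
}

for name, (actual, target) in kpis.items():
    # For response time, lower is better; for the rest, higher is better.
    lower_is_better = "response" in name.lower()
    ok = actual <= target if lower_is_better else actual >= target
    status = "OK " if ok else "ALERT"
    print(f"[{status}] {name}: actual={actual}, target={target}")
```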

8.2.6 Data Mining
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, “Which clients are most likely to respond to my next promotional mailing, and why?” Examples of profitable applications in this chapter illustrate its relevance to today’s business environment as well as a basic description of how data warehouse architectures can evolve to deliver the value of data mining to end users.
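The promotional-mailing question quoted above is a classic propensity-scoring problem. The hedged sketch below shows the shape of such a model with scikit-learn; the features and training data are invented and far too small for real use:

```python
from sklearn.linear_model import LogisticRegression

# Invented historical data: [past_purchases, months_since_last_order]
# and whether the client responded to the last mailing (1 = yes).
X = [[5, 1], [1, 10], [7, 2], [0, 14], [4, 3], [2, 8]]
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Score current clients: probability of responding to the next mailing.
prospects = [[6, 1], [1, 12]]
for features, p in zip(prospects, model.predict_proba(prospects)[:, 1]):
    print(f"client {features}: response probability {p:.2f}")
```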

8.2.7 The Foundations of Data Mining
Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms


Commercial databases are growing at unprecedented rates. A META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by the second quarter of 1996. In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.

8.2.8 Data Warehousing
A data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse are uploaded from the operational systems. The data may pass through an operational data store for additional operations before they are used in the DW for reporting.

The typical ETL-based data warehouse uses staging, integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data is then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups often called dimensions and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data.
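The extract-transform-load flow into a star schema can be sketched end to end. The following Python example is a hedged miniature (the table names, columns, and data are invented): raw staged rows are transformed, dimension keys are resolved, and facts are loaded into a tiny star schema in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging layer: raw rows extracted from a source system.
staged = [("2012-01-05", "Widget", 3, 20.0),
          ("2012-01-06", "Gadget", 1, 35.0)]

conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE fact_sales (sale_date TEXT, product_id INTEGER,
                             quantity INTEGER, amount REAL);
""")

for sale_date, product, qty, price in staged:
    # Transform: resolve the dimension key, derive the sale amount.
    conn.execute("INSERT OR IGNORE INTO dim_product(name) VALUES (?)", (product,))
    (pid,) = conn.execute("SELECT product_id FROM dim_product WHERE name = ?",
                          (product,)).fetchone()
    # Load into the fact table, keyed on the dimension.
    conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 (sale_date, pid, qty, qty * price))

# Access layer: aggregate facts by dimension.
for row in conn.execute("""SELECT p.name, SUM(f.amount)
                           FROM fact_sales f JOIN dim_product p USING (product_id)
                           GROUP BY p.name"""):
    print(row)
```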

A data warehouse constructed from integrated data source systems does not require ETL, staging databases, or operational data store databases. The integrated data source systems may be considered to be a part of a distributed operational data store layer. Data federation methods or data virtualisation methods may be used to access the distributed integrated source data systems to consolidate and aggregate data directly into the data warehouse database tables.

Unlike the ETL-based data warehouse, the integrated source data systems and the data warehouse are all integrated since there is no transformation of dimensional or reference data. This integrated data warehouse architecture supports the drill down from the aggregate data of the data warehouse to the transactional data of the integrated source data systems. Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse.

This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

8.2.9 Decision Engineering
Decision Engineering is a framework that unifies a number of best practices for organisational decision making. It is based on the recognition that, in many organisations, decision making could be improved if a more structured approach were used. Decision engineering seeks to overcome a decision making “complexity ceiling”, which is characterised by a mismatch between the sophistication of organisational decision making practices and the complexity of situations in which those decisions must be made.

As such, it seeks to solve some of the issues identified around complexity theory and organisations. In this sense, Decision Engineering represents a practical application of the field of complex systems, which helps organisations to navigate the complex systems in which they find themselves. Despite the availability of advanced process, technical and organisational decision making tools, decision engineering proponents believe that many organisations continue to make poor decisions. In response, decision engineering seeks to unify a number of decision making best practices, creating a shared discipline and language for decision making that crosses multiple industries, both public and private organisations, and that is used worldwide.


To accomplish this ambitious goal, decision engineering applies an engineering approach, building on the insight that it is possible to design the decision itself using many principles previously used for designing more tangible objects like bridges and buildings. This insight was previously applied to the engineering of software, another kind of intangible engineered artefact, with significant benefits.

As in previous engineering disciplines, the use of a visual design language representing decisions is emerging as an important element of decision engineering, since it provides an intuitive common language readily understood by all decision participants. Furthermore, a visual metaphor improves the ability to reason about complex systems and enhances collaboration. In addition to visual decision design, there are two other aspects of engineering disciplines that aid mass adoption. These are:

• the creation of a shared language of design elements, and
• the use of a common methodology or process, as illustrated in the diagram below.

Fig. 8.5 Decision engineering framework
(Source: http://www.saylor.org/site/wp-content/uploads/2011/06/Decision-Engineering-.pdf)

8.2.10 Process Mining
Process mining provides an important bridge between data mining and business process modelling and analysis. Under the Business Intelligence (BI) umbrella many buzzwords have been introduced to refer to rather simple reporting and dashboard tools. Business Activity Monitoring (BAM) refers to technologies enabling the real-time monitoring of business processes.


Complex Event Processing (CEP) refers to technologies to process large amounts of events, utilising them to monitor, steer and optimise the business in real time. Corporate Performance Management (CPM) is another buzzword for measuring the performance of a process or organisation. Also related are management approaches such as Continuous Process Improvement (CPI), Business Process Improvement (BPI), Total Quality Management (TQM), and Six Sigma. These approaches have in common that processes are “put under a microscope” to see whether further improvements are possible. Process mining is an enabling technology for CPM, BPI, TQM, Six Sigma, and the like. Whereas BI tools and management approaches such as Six Sigma and TQM aim to improve operational performance, for example, reducing flow time and defects, organisations are also putting more emphasis on corporate governance, risks, and compliance.

Legislation such as the Sarbanes-Oxley Act (SOX) and the Basel II Accord illustrates the focus on compliance issues. Process mining techniques offer a means to more rigorously check compliance and ascertain the validity and reliability of information about an organisation’s core processes. Over the last decade, event data have become readily available and process mining techniques have matured. Moreover, as just mentioned, management trends related to process improvement (for example, Six Sigma, TQM, CPI, and CPM) and compliance (SOX, BAM, etc.) can benefit from process mining. Fortunately, process mining algorithms have been implemented in various academic and commercial systems.
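At the algorithmic core, most process-discovery techniques begin from a “directly-follows” relation over the event log: for each case, which activity immediately follows which. A minimal Python sketch of that first step (the event log is invented; tools such as ProM build full process models on top of counts like these):

```python
from collections import Counter

# Event log: one (case_id, activity) pair per event, already time-ordered.
log = [(1, "register"), (1, "check"), (1, "approve"),
       (2, "register"), (2, "check"), (2, "reject"),
       (3, "register"), (3, "approve")]

# Group activities by case, preserving order within each trace.
traces = {}
for case_id, activity in log:
    traces.setdefault(case_id, []).append(activity)

# Count directly-follows pairs across all cases.
follows = Counter()
for trace in traces.values():
    follows.update(zip(trace, trace[1:]))

for (a, b), n in follows.most_common():
    print(f"{a} -> {b}: {n}")
```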

Today, there is an active group of researchers working on process mining and it has become one of the “hot topics” in Business Process Management (BPM) research. Moreover, there is a huge interest from industry in process mining. More and more software vendors are adding process mining functionality to their tools. Examples of software products with process mining capabilities are: ARIS Process Performance Manager (Software AG), Comprehend (Open Connect), Discovery Analyst (StereoLOGIC), Flow (Fourspark), Futura Reflect (Futura Process Intelligence), Interstage Automated Process Discovery (Fujitsu), OKT Process Mining suite (Exeura), Process Discovery Focus (Iontas/Verint), ProcessAnalyzer (QPR), ProM (TU/e), Rbminer/Dbminer (UPC), and Reflect|one (Pallas Athena). The growing interest in log-based process analysis motivated the establishment of a Task Force on Process Mining.

The task force was established in 2009 in the context of the Data Mining Technical Committee (DMTC) of the Computational Intelligence Society (CIS) of the Institute of Electrical and Electronic Engineers (IEEE). The current task force has members representing software vendors (for example, Pallas Athena, Software AG, Futura Process Intelligence, HP, IBM, Infosys, Fluxicon, Businesscape, Iontas/Verint, Fujitsu, Fujitsu Laboratories, Business Process Mining, Stereologic), consultancy firms/end users (for example, ProcessGold, Business Process Trends, Gartner, Deloitte, Process Sphere, Siav SpA, BPM Chile, BWI Systeme GmbH, Excellentia BPM, Rabobank), and research institutes (for example, TU/e, University of Padua, Universitat Politècnica de Catalunya, New Mexico State University, IST - Technical University of Lisbon, University of Calabria, Penn State University, University of Bari, Humboldt-Universität zu Berlin, Queensland University of Technology, and Vienna University of Economics and Business).

8.2.11 Business Performance Management
Business performance management (BPM) is a key business initiative that enables companies to align strategic and operational objectives with business activities in order to fully manage performance through better informed decision making and action. BPM is generating a high level of interest and activity in the business and IT community because it provides management with their best opportunity to meet their business measurements and achieve their business goals. IBM uses the term BPM for business initiatives that emphasise aligning strategic and operational objectives in addition to monitoring and managing business processes and associated IT events.


8.2.12 The BPM Imperative
In today’s dynamic business environment, increased stakeholder value has become the primary means by which business executives are measured. The ability to improve business performance is therefore a critical requirement for organisations. Failure to improve business performance is not tolerated by stakeholders, who are quick to exercise their power. One result of this is the volatility of stock prices, which creates a tense roller coaster ride for executives. Bringing more pressure to bear is the fact that business performance measurement time frames are becoming ever shorter. Quarterly targets have replaced annual ones, and the expectation of growth and success is there at every quarter end.

To help smooth out the roller coaster ride, businesses need to react quickly to accommodate changing marketplace demands and needs. Flexibility and business agility are keys to remaining competitive, and, in some cases, viable. What is needed is a holistic approach that enables companies to align strategic and operational objectives in order to fully manage achievement of their business performance measurements.

To become more proactive and responsive, businesses need information to give them a single view of their enterprise. With this type of information, they can:
• make more informed and effective decisions
• manage business operations and minimise disruptions
• align strategic objectives and priorities both vertically and horizontally throughout the business
• establish a business environment that fosters continuous innovation and improvement

8.2.13 Local Information System
The local information system can be defined as ‘A system to support the administration of environment, resources and planning tasks by making information available to executive and public through co-ordinating existing systems and investments by a common architecture.’

This definition shows the system is:

• Not limited to one Information and Communication Technology (ICT) tool; nor do we prescribe which technologies should be used or which organisations should be involved.

• The approach is ‘User Led’ and is designed to be applicable to any scenario in the coastal zone.

• By ‘Local’ we mean that the system is not national or centrally organised, or simply providing information for strategic decision makers. It is built to support the efforts of local managers such as engineers, planners, tourism officers, developers, marine industries, harbour masters, environmental managers, conservationists, wardens, archaeologists and scientists.

• It could be based on the physiography of a natural system, such as an estuary.

• We do not mean a single data management tool that is used to manage the internal data holdings of one ‘office’, although many of the principles in these guidelines are relevant to that task.

Except for spreadsheets, these tools are sold as standalone tools, suites of tools, components of ERP systems, or as components of software targeted to a specific industry. The tools are sometimes packaged into data warehouse appliances.

8.3 Eight Strategies for Delivering Business Intelligence on the Web
These strategies will help companies ensure they are distributing the kind of high‐quality, actionable BI necessary to make real‐time business decisions. Businesses have mastered using the Web as a communication tool, and with good reason. The Internet has proven to speed connectivity between disparate organisations and enable a mobile workforce. Yet leading organisations are realising that using the Web channel to communicate business intelligence (BI) in near real‐time fashion supersedes its previous use as an information dissemination and collaboration tool.


Not so long ago, paper reporting was distributed on a monthly, or less‐frequent, basis. As business users became more and more technology‐savvy, reporting became more “downloadable” on demand. This was a step in the right direction in terms of flexibility, yet it left each user slicing and dicing data as he saw fit to find hidden treasure in the information. Most individual employees have neither a full view of corporate objectives, nor the expert knowledge or the time to dig for critical information in raw or even summary data.

When it comes to delivering BI, the goals should be accessibility and clarity—not flexibility. End users should not have to wonder how to find the information that they are looking for, and what the information they are looking at means. This focus is further heightened when new and less sophisticated delivery platforms are considered; for example, mobile devices are not equipped to handle large amounts of data manipulation, but are perfect to receive frequent updates of succinct business intelligence.

To succeed in today’s competitive, fast‐paced business environment, it is imperative that the right content is aggregated and delivered to the right people at the critical moment when a decision must be executed. The following eight strategies will help companies ensure they are distributing the kind of high‐quality, actionable BI necessary to make real‐time business decisions.

8.3.1 Pick the Best Delivery Vehicle for Your Audience and Your Data
Core metrics that users need to develop a quick and clear understanding of the organisation’s state of health should not exceed five to eight numbers. Typically, such information is not intended to be printed and taken to a meeting, and is therefore well suited for placement on a dashboard, mobile device or in a pagelet on a portal page. Detailed reports, on the other hand, are better looked at on paper and should be provided in document format, such as PDF or Excel. Rendered documents provide much more control over the appearance of a printed report versus HTML Web pages.

8.3.2 Integrate the Presentation Layer
A BI solution is often built as an add‐on component to the transactional system that generates much of the data. Users who work with the transactional system want access to reports from within the application, where they need them, and are much less likely to navigate to a different web site that would invariably come with a different user experience. Since the value of the reports depends directly upon a user’s ability to locate, analyze and use the information skilfully and appropriately, we recommend that the reports become part of the application’s user interface. Furthermore, providing direct links to important reports from key locations on the interface will give you tighter control over where the reports appear and to whom.

8.3.3 Integrate the Security Layer
In order to effectively integrate the presentation layers of our report delivery application and our transactional application, it is necessary to integrate security layers as well. One way to accomplish this is a true integration with shared user accounts. However, this approach can be complicated if either side uses proprietary security. A simulated integration, where the application authenticates with the reporting server through a service account, is comparatively simple and straightforward to implement. In this scenario, the application proxies all requests to the report server and streams the results back to the client. This has the added benefit that a user does not need to access the report server directly, so that it can be hidden behind a firewall. A reference implementation of a Java proxy for SQL Server Reporting Services has been published.
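A hedged sketch of the simulated integration: the application endpoint below authenticates the user itself, then fetches the report with a shared service account and streams it back, so the report server can stay behind the firewall. Flask and the requests library are assumed, and the server URL, credentials, and cookie check are all placeholders:

```python
import requests
from flask import Flask, Response, abort, request

app = Flask(__name__)
REPORT_SERVER = "http://reports.internal.example/render"   # behind the firewall
SERVICE_AUTH = ("svc_reports", "service-account-secret")   # placeholder creds

@app.route("/reports/<name>")
def proxy_report(name):
    # The application performs its own user authentication here.
    if "session" not in request.cookies:
        abort(401)
    # Proxy the request with the service account; the user never talks
    # to the report server directly.
    upstream = requests.get(REPORT_SERVER, params={"report": name},
                            auth=SERVICE_AUTH, timeout=30)
    return Response(upstream.content,
                    content_type=upstream.headers.get("Content-Type",
                                                      "application/pdf"))
```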

8.3.4 Customise the Presentation for Target Devices and User Roles
There is great appeal to the idea of making critical business information available to decision makers anywhere in the world, using current data and not depending on any additional infrastructure other than a cell phone or PDA. Clarity and accessibility become even more important for mobile device access. Full‐scale BI reports are too ungainly to be delivered to the small screens of handheld devices, and cell phones don’t come with sophisticated input devices. Therefore, BI reports for wireless devices should be limited to a set of key performance indicators or a dashboard. In addition, navigation must be cut down to what is absolutely necessary.


8.3.5 Target Reports to Users
If access to reports is integrated into a transactional application, parameters can be passed to the report server that reflect the user’s role selection, current navigational context, and other values that a report can use to create a targeted and customised view of the data. For instance, if the application maintains some sort of organisational tree and users can navigate from the corporate level to regional and branch office levels, the reports can show data for the currently selected node in the organisation. The user does not have to navigate down to the node again in a separate report application.
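As a small, hedged illustration of passing navigational context through to a report server (the URL scheme and parameter names below are invented):

```python
from urllib.parse import urlencode

def targeted_report_url(base_url, user_role, org_node):
    """Build a report request that reflects the user's role and the
    organisational node they are currently viewing, so the report
    opens pre-scoped instead of making them navigate again."""
    params = {"report": "sales_summary", "role": user_role, "org": org_node}
    return f"{base_url}?{urlencode(params)}"

# A regional manager viewing the "EMEA/DE/Berlin" branch:
print(targeted_report_url("http://reports.internal.example/render",
                          "regional_manager", "EMEA/DE/Berlin"))
```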

8.3.6 Use a Combined Push/Pull Model
In order to maximise the use of a BI solution, it can be helpful to push information to the users instead of depending on their ability to pull the information when needed. If our presentation layer is already integrated with a transactional application that the users rely on daily, this information push can be done on Web pages with ticklers, “advertisements” and links to reports. Alternatively, report updates can be pushed via email.

8.3.7 Keep Information Timely
The closer to real‐time our enterprise BI data is, the more costly the implementation, especially in large scale enterprises. Keep in mind that information is only useful if it pertains to decisions that need to be made. Therefore, firms should be judicious about what their information requirements truly are and plan accordingly.

8.3.8 Take Advantage of Enterprise Application Integration (EAI)
Though we recommend delivering key performance indicators (KPIs) within the proper context and application (that is, not forcing the user into a separate application), calculating and delivering KPIs often requires information integrated from other applications. This form of EAI can be accomplished in many ways, from the database level on up to Web Services, or with simple feeds from the system of record. Be aware of the EAI requirements, especially when we consider outsourcing any portion of our critical operation (data).

While the Web is a very effective distribution channel for business intelligence, Web features should only be used with a specific goal in mind or they will subtract value from a solution. Even a disciplined approach will only be successful if the potential pitfalls identified above are being managed properly. BI is most effective through the merger of EAI initiatives with relevant positioning of business intelligence. The power of EAI and BI together allows a concise focus on creating business value: disseminating easily understood information matched to each employee’s unique information requirements, without requiring slicing and dicing on their own, in near real‐time (NRT) fashion.

8.6 Use Analytics for Strategic Business Information Implementation
Strategic business information can revolutionise business analysis and performance monitoring. Find out what we need to do to implement it in our company. When we first became aware of the business information warehouse concept, it was probably put forth as a tool of expedience, which it certainly is. And then we probably came to appreciate it as a radical repositioning of the end user in the information processing food chain.

But we may not have gone so far as to classify BI as a strategic instrument of paradigm‐shifting importance. And this it can be, if we can bring our company’s management around to a new point of view. BI offers management a new strategic edge: a means of implementing business performance management, the dynamic measurement and pursuit of performance goals from non-traditional perspectives. The new breed of manager springing up in the corporate world touts this approach with increasing frequency. It’s up to us to provide the tools to make it a reality.

How is this achieved? It is achieved with timely, at‐their‐fingertips actionable intelligence, information that enables our management’s decision support system. The business information warehouse provides this, in the form of problem‐specific, department‐specific aggregations of data called data marts, along with a processing framework called analytics.


8.7 Life in the Fast Lane
The idea behind the business information warehouse is to tear down the traditional information processing bridge and build a new one. Where applications used to connect users to data, now the user is the connection between the data and the application. The idea is to create a storehouse of highly accurate and useful data that a user can access rapidly, flexibly, and easily, in order to facilitate a more responsive business environment based on ever more accurate forecasting.

Such a warehouse is obviously the culmination of decades of dreaming in the realm of business reporting. But why do we need such reports? We need them for analysis, of course, and for monitoring our business’s performance at the moment. And it is in this area that the warehouse becomes a true strategic advantage for our company. We must cultivate analytics, and we must model our system to accommodate them.

But even this is only a halfway measure. What is the analytical goal of a company that is doing performance management? It is to evaluate the performance of the company as a whole as a dynamic, producing system rather than as an aggregation of individual departments. To this end, we need to reconsider both our data warehouse and our approach to analytics. Many companies implementing a data warehouse and analytic applications that make use of it do so with no forethought to coordination of those applications. This is fine for analyzing and monitoring the performance of a department, but it doesn’t do much for the company as a whole. By all means, have department‐specific or business area‐specific data marts and local analytics, but optimise our investment by coordinating our analytics and data marts across our entire operation.

8.8 Modelling for Company Wide Coordination
What we need from our BI investment is a new paradigm for the use of information. We want to leverage the data our company has accumulated to optimise performance, companywide. While the department‐by‐department improvements realised by data mart/local analytic point solutions may be thought of as tactical activity, a companywide effort is, by definition, strategic: We want to use information in ways that change the manner in which we do business. Let’s carry the use of military terms a step further and stop using the word information; instead, let’s call it intelligence.

In a military operation, intelligence is useless if it isn’t shared. If we’re a general trying to take the beach, we must coordinate the activity of amphibious troop carriers, naval support, air support, and deployed troops. Each of these individual units has information available to it that is locally useful. But that usefulness is severely curtailed if it is never shared.

Our first principle in strategic optimisation of data warehouse intelligence, then, is:
Analytical intelligence gleaned from the data warehouse at the departmental level must be shared and available throughout the company.
What’s the hands‐on step that makes this happen? Turn our data marts and local analytics into a business knowledge network. There are many approaches to this, and they all hinge on the software we’ve used to implement BI and our in‐house network infrastructure. That’s detail work. Where we must put real planning is in the layer between the storage/communications technology and the data warehouse.


The information that should be passed between departments/business units and made available to the highest levels of management is that which meets any or all of these criteria:

• Is it “active” information? That is, is it current and reliable, and do decisions hinge upon it?

• Does it affect performance at the departmental level, or is it information that describes performance in such a way as to affect decisions to be made at any other level or for any other department/business unit?

• Does the information contribute to the performance measurement or real‐time monitoring of the company’s high‐level performance goals?

• Can the information be indicative of current or impending interruptions or degradations in either departmental or company performance?

• Does the information affect the timeliness of performance or the response by any department or the company in general?

Once we’ve determined, department‐by‐department, what information feeds decision making in this manner, we have essentially classified the core intelligence knowledge for our company’s decision support system. We must now formalise the regular and timely generation of this information, using data marts and analytics, at the departmental level. This we will leave in the hands of the people who own the data.

Our next step is to set up a notification system that will pass intelligence from department to department, making our on‐the‐fly intelligence metrics a de facto distributed system. For example, when sales determine that there’s increased demand for a product in the marketplace, the system will passively notify production, warehousing, logistics, and senior management. These notifications will almost always be to multiple recipients, because most business activity, viewed from a high level, is defined not by departmental activity but by interactions between departments.
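The passive, multi-recipient notification described above is essentially a publish/subscribe pattern. A minimal Python sketch (the topic and department names are invented; a production system would put a message queue behind the same interface):

```python
# Tiny publish/subscribe hub: departments subscribe to intelligence topics,
# and one departmental finding fans out to every interested recipient.
subscribers = {}

def subscribe(topic, department):
    subscribers.setdefault(topic, []).append(department)

def publish(topic, message):
    for department in subscribers.get(topic, []):
        print(f"notify {department}: {message}")

for dept in ["production", "warehousing", "logistics", "senior management"]:
    subscribe("demand.increase", dept)

# Sales detects rising demand; everyone downstream hears about it at once.
publish("demand.increase", "Widget demand up 18% month-over-month")
```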

Our second major initiative, then, is:
Make departmental performance metrics part of a distributed, companywide system by building in means of notification to other departments when data indicates an increase or decrease in performance.

8.9 Talk to the Boss
Next, we’ll need a set of high‐level analytics for intelligence passing from the department level to senior management. These will generate comparative data for performance metrics defined by the company’s performance as an integrated whole. In addition, we need to configure projected vs. actual metrics monitors, data displays easily understood by decision makers at high levels, to be fed by these integrated metrics.
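At bottom, a projected-versus-actual monitor is a variance calculation per metric. A hedged sketch, with the metrics and figures invented for illustration:

```python
# Projected vs. actual for a few integrated, company-level metrics.
metrics = [
    # (name, projected, actual)
    ("Quarterly revenue ($M)", 12.0, 11.1),
    ("Order-to-ship time (days)", 5.0, 4.2),
    ("Customer churn (%)", 3.0, 3.9),
]

for name, projected, actual in metrics:
    variance_pct = (actual - projected) / projected * 100
    print(f"{name}: projected={projected}, actual={actual}, "
          f"variance={variance_pct:+.1f}%")
```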

These high‐level analytics are keys to our success. There are several important principles to keep in mind while they are being generated:

• They are not defined by senior management. Rather, senior management tells us what must be measured in order for companywide performance to be effectively optimised, and we will seek out the functional parameters at the departmental level; to this end, we need an expert from each area helping to define these high‐level analytics at every step.

• No one person can tell us every factor that positively or negatively impacts company performance, yet that’s what these analytics propose to capture; we must effectively bring together our mid‐level and low‐level experts and let them interact extensively in order to meaningfully define these analytics.

• They must be extensively tested; since they are part of a system intended for highly‐specific forecasting, it will be important to define performance evaluation for comparative purposes at the departmental and senior management levels and to assign the appropriate parties to oversee this, for the period during which the efficacy of these high‐level analytics is being determined.


Our final objective, then, is:
Create high‐level analytics that will present senior management (or anyone who cares to keep track) with accurate, timely metrics for evaluating overall business performance.

Of course, this is all much easier said than done. We’ll probably put in as much meeting time and haggling over details as we would for a major conversion or implementation. The good news is that this isn’t a major conversion or implementation; we’re putting in lots of people time, to be sure, but it’s time that we’ll recover in business performance and we’ll realise those performance gains rapidly. The happy ending to the story is that it’s not going to cost us millions in new software or an overhaul of old systems: We can build such a structure on top of what we already have, using tools we already possess.

We don’t need to reinvent the wheel or rebuild what we’ve already built. Everything described above is an add‐on to what we already have in place, if we’ve designed a non‐centralised, highly‐granular data warehouse (as most data warehouses should be). If we’ve already invested in the data warehouse, we’re more than halfway home. And if we haven’t considered taking that investment up to this new level, consider how we’ll appear to senior management when we offer a magic wand like this, and tell them they’ve already paid for it; it’s just a matter of putting it to use.

8.10 Making the Operational Case for Data Warehousing
A data warehouse will be a bargain and a powerful strategic tool that will give our company a competitive edge, but first we need to convince our organisation’s decision makers. Here’s how to make our case.

We know we need a data warehouse. There’s no doubt in our mind. Our competition is out‐forecasting us in anticipating market trends. Other companies are faster to respond than ours in an increasingly Web‐driven marketplace, and we aren’t getting the efficiency gains we need to catch up. Our various departments are pinging our production system to death with queries and report requests. We need a data warehouse. As we march down the hall to meet with our fellow decision‐makers, do we know what we’re going to say? How do we sell this idea? How do we convince them to spring for this shapeless, faceless thing that has no explicit price tag and no explicit benefit? Here are some pointers.

8.11 Return on Investment?
The executive mindset of ROI‐as‐decision‐driver isn’t going to work when we present the plan for implementing a data warehouse. Why not? Because the production efficiency and the increased market savvy that a data warehouse delivers are not things we’ve ever measured. We can’t offer a projected ROI.

We can’t quantify the benefits because we’ve never experienced them in such a context. Indeed, the very reason we want a data warehouse in the first place is to develop the metrics to create a business intelligence capacity that would teach us how to explicitly measure “increased market savvy.” We’re asking for the time and money to develop an instrument that will measure itself.

We can’t even describe what we’re ultimately going to measure, once we’ve developed the analytics and performance‐monitoring capability we’re seeking. Why is this? Because our high‐level business performance metrics can be based on finely‐detailed, lower‐level metrics, and these are the province of analysts we’re going to grow in‐house, at the departmental level. Only they know what their drivers will be, and even they won’t really know what the magic numbers are until they create them. This all sounds very mystical at face value, and it is, but it’s magic that works. Here’s what we can say with confidence, and how to say it. While there are no numbers attached, it is a logical case that has no reasonable refutation.


8.12 Many‐for‐One, Among Others
Here’s what we get for our money, whatever the cost turns out to be:

• Multilevel trend analysis. Our financial people and sales and marketing forces will acquire the capacity to define and analyze trends at every level, from the entire market down to age‐groups‐by‐region or any other fine level that matters. And they will ultimately control the level of precision of their forecasting, because they will control the quality of the data going in and the resolution of the measurements.

• Company‐wide performance monitoring. This same style of analysis can be applied at the department level, business unit level, and company‐wide. We can develop, and continually refine, metrics that will allow us to continuously evaluate our company’s performance.

• User‐defined, user‐controlled reporting. This one is highlighted to make sure we don’t jump over it, because it sounds mundane. But there is no overstating the incredible value of this capability and, moreover, it’s the justification we’ve been looking for.

8.13 Making Our Case
Consider the operational reporting systems we have in place. We need look no further than Orders for an example. A mass of reports issues from this system, going to many different individuals in a number of departments. It’s often the case that any one of these users is grabbing transactional orders, data from Order History, as well as data from different databases altogether (Customer tables, etc.) in order to assemble the information required. What’s wrong with this? Well, in the first place, since the reports are largely static, and since the information is often in different databases requiring multiple queries, it’s expensive to operate this way. And this is before we even factor in the cost of developing new reports.

Our choice, then, is one‐for‐one application investment vs. many‐for‐one. That is, we need to make clear to our executive decision‐makers that the ultimate take‐home point of data warehouse implementation is that we’re giving our user community a single application that yields the results of many applications. The money we’d spend implementing one major new application can be spent on a data warehouse, and a great many powerful applications will spring up. Isn’t this the kind of bargain we all shop for?

This, incidentally, leads us to a fourth benefit, tailor‐made for our peers: A data warehouse enables an Executive Information System. An EIS is an application that delivers, in digest form, any information an executive needs for decision support. The philosophy behind such systems is that typical executives do not fall among the users described above: There’s no way they’re going to go to four or five sources to cobble together the data they need to do their jobs. They want whatever they can get, information‐wise, by going to as little trouble as possible to get it. For such users, a data warehouse is a dream come true.

8.14 Who Gets the Bill?
If we can’t sell this, then they just aren’t buying. A data warehouse is going to ultimately be a bargain and a powerful strategic tool that will give our company a competitive edge. And while we can’t offer a precise figure for either the cost or the return up front, the logical case laid out above has no reasonable refutation.


Summary
• Whether a company subscribes to a specific management methodology or not, the benefits of monitoring and communicating performance results cannot be argued against.

• The initiatives that link up to the overall strategy can drill further down into more specific and detailed operational processes.
• A spreadsheet is a computer application with tools that increase the user’s productivity in capturing, analyzing, and sharing tabular data sets.
• Spreadsheets developed as computerised simulations of paper accounting worksheets.
• OLAP means many different things to different people, but the definitions usually involve the terms “cubes”, “multidimensional”, “slicing & dicing” and “speedy-response”.
• The trend towards companies storing more & more data about their business shows no sign of stopping.
• There are typically four types of presentation media: dashboards, visual analysis tools, scorecards, and reports.
• Data mining tools can answer business questions that traditionally were too time consuming to resolve.
• Data mining techniques are the result of a long process of research and product development.
• A data warehouse (DW) is a database used for reporting and analysis.
• A data warehouse constructed from integrated data source systems does not require ETL, staging databases, or operational data store databases.
• Decision Engineering is a framework that unifies a number of best practices for organisational decision making.
• Process mining provides an important bridge between data mining and business process modelling and analysis.
• Business performance management (BPM) is a key business initiative that enables companies to align strategic and operational objectives.
• The local information system can be defined as ‘A system to support the administration of environment, resources and planning tasks by making information available to executive and public through co-ordinating existing systems and investments by a common architecture.’
• A BI solution is often built as an add‐on component to the transactional system that generates much of the data.
• EAI can be accomplished in many ways, from the database level on up to Web Services, or with simple feeds from the system of record.


References
• Surhone, L., Tennoe, M. & Henssonow, S., 2011. Business Intelligence Tools, VDM Verlag Dr. Mueller AG & Co. Kg Publication.
• Rud, O., 2009. Business Intelligence Success Factors: Tools for Aligning Your Business in the Global Economy, John Wiley & Sons Publication.
• Quinn, K., How Business Intelligence Should Work, [Online] Available at: <http://www.ebsolute.com/web/doc/InformationManagementWP_BI.pdf> [Accessed 26 April 2012].
• Decision engineering, [Online] Available at: <http://www.saylor.org/site/wp-content/uploads/2011/06/Decision-Engineering-.pdf> [Accessed 26 April 2012].
• 2011. What Tools to use for Data Warehousing and Business Intelligence?, [Video Online] Available at: <http://www.youtube.com/watch?v=9s24DE4Yvfk> [Accessed 26 April 2012].
• 2011. Microsoft Business Intelligence Tools 2011, [Video Online] Available at: <http://www.youtube.com/watch?v=W6M4v7X7irU> [Accessed 26 April 2012].

Recommended Reading
• Gonzales, M., 2003. IBM Data Warehousing: With IBM Business Intelligence Tools, Wiley Publication.
• Stackowiak, R., Rayman, J. & Greenwald, R., 2007. Oracle Data Warehousing and Business Intelligence Solutions, John Wiley & Sons Publication.
• Fernandez, I. & Sabherwal, R., 2010. Business Intelligence, John Wiley & Sons Publication.


Self Assessment
1. ________ displays multiple cells, usually in a two-dimensional matrix or grid consisting of rows and columns.
a. Spreadsheet
b. Files
c. Data sheet
d. Data warehouse

2. ___________ is a framework that unifies a number of best practices for organisational decision making.
a. Decision making
b. Decision Engineering
c. Complexity ceiling
d. Decision support system

3. ___________ refers to technologies enabling the real-time monitoring of business processes.
a. Decision support system
b. Business intelligence
c. Decision Engineering
d. Business Activity Monitoring

4. ____________ can revolutionise business analysis and performance monitoring.
a. Enterprise Application Integration
b. Company Wide Coordination
c. Strategic business information
d. Push/Pull Model

5. _________ refers to technologies to process large amounts of events, utilising them to monitor, steer and optimise the business in real time.
a. Complex Event Processing
b. Corporate Performance Management
c. Business Process Improvement
d. Total Quality Management

6. _________ is another buzzword for measuring the performance of a process or organisation.
a. Complex Event Processing
b. Corporate Performance Management
c. Business Process Improvement
d. Total Quality Management

7. _________ provides an important bridge between data mining and business process modelling.
a. Analysis
b. Business Intelligence
c. Data mining
d. Process mining


8. A _________ is a database used for reporting and analysis.
a. database
b. data warehouse
c. data mart
d. data

9. __________ techniques are the result of a long process of research and product development.
a. Data storage
b. Drill down
c. Data mining
d. Data hiding

10. Which of the following statements is false?
a. Spreadsheets share many principles and traits of databases.
b. A spreadsheet is essentially just one table.
c. A database is a collection of many tables.
d. A modern spreadsheet file consists of single worksheets.


Case Study I

Application of Business Intelligence Tools in Education Institutes

Introduction
In 1996, the Department of Manufacturing Engineering and Engineering Management (MEEM) Laboratory of City University of Hong Kong started to develop an intranet system called "IntraMEL" for staff and students to access laboratory information, such as laboratory class timetables and technicians' duty schedules, in the university or at home. With the ever-increasing volume of data generated, the system manages to retrieve only a fraction of the information through several predefined reports.

Senior management, including the laboratory manager and academic staff, also found it difficult to make additional queries without the assistance of programmers. Due to these weaknesses, MEEM decided to develop an intelligent decision support system in the department to assist the management to manage, plan, report and predict their operations more effectively and in a timely manner. This objective is to be achieved by establishing a universal Enterprise Data Model and a framework across the department, and by providing online analytical processing tools to help users analyze the information themselves.

System design architecture
The intelligent decision support system provides both Windows-based (thick client) and web-based (thin client) approaches to real-time data analysis. For the Windows-based solution, a business intelligence tool called "Business Objects" is installed on a client machine and uses ODBC as a communication interface to create real-time analytical reports from the data warehouse repository. For the web-based solution, the analytical results are generated from the data warehouse repository through the Web Intelligence Server and displayed in a web browser.

The data warehouse repository is used to hold the 'universe' of cleaned data, which is the heart of the intelligent decision support system. The 'universe' is a data model that is defined to create the query results in a Windows-based or web-based solution. The repository can be housed in either Microsoft SQL Server or Oracle Database Server. The detailed design of the universe is illustrated in the case study section. The data warehouse repository, in turn, loads data from legacy data sources such as SQL Server, Oracle, MS Access and Excel spreadsheets. The architecture of the intelligent decision support system is shown in the figure below.


Fig. 1 Architecture of the system
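As a rough illustration of the thick-client path in this architecture, the sketch below shows a client querying the repository over ODBC from Python with the pyodbc module (the DSN name, credentials, table and columns are assumptions made for illustration; they are not IntraMEL's actual configuration):

    import pyodbc

    # Connect over ODBC, as the Windows-based client does; "MEEM_DW" is a
    # hypothetical data source name configured on the client machine.
    conn = pyodbc.connect("DSN=MEEM_DW;UID=analyst;PWD=secret")

    # A real-time analytical query against the repository (illustrative schema).
    rows = conn.cursor().execute(
        "SELECT academic_year, COUNT(*) FROM lab_class GROUP BY academic_year"
    ).fetchall()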

Workflow design
In designing an intelligent decision support system, there are three main stages, listed as follows:
• Loading and Cleaning Data
• Building Model
• Establishing Goal

Stage 1: Loading and cleaning data
Normally, business data of various formats is generated by different application programs, such as spreadsheet and database systems. Therefore, to prepare the data for analysis, we need to load it into the data warehouse repository. In this stage, when the data is loaded from the legacy data sources, uniformity checks must be applied to ensure that values are within specified limits. Version checks are also performed to ensure that the data is up to date.

In addition, a few transformation scripts will be implemented to clean up unused data, divide a data field into two separate fields, and combine several fields together, as shown in the figures below.


Fig. 2 Design Workflow

Fig. 3 Transformation Script
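As a rough sketch of what such a transformation script might do (Python with pandas; the column names, record format and limits are illustrative assumptions, since the actual IntraMEL scripts are not published):

    import pandas as pd

    # A hypothetical raw extract from a legacy source.
    raw = pd.DataFrame({
        "student": ["CHAN Tai Man", "LEE Siu Ming"],
        "lab_session": ["1999/2000-SemA-W03", "1999/2000-SemB-W04"],
        "hours": [3, 99],
    })

    # Uniformity check: keep only records within the specified limit
    # (say, 0-10 laboratory hours); out-of-range rows are rejected.
    valid = raw[raw["hours"].between(0, 10)]

    # Divide one data field into separate fields (year, semester, week)...
    parts = valid["lab_session"].str.split("-", expand=True)
    clean = valid.assign(academic_year=parts[0], semester=parts[1], week=parts[2])

    # ...and combine several fields into one load-ready key.
    clean["class_key"] = clean["academic_year"] + "/" + clean["semester"]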

Stage 2: Building model
After the data are loaded and cleaned, the 'universe' is formed, which represents a complete and unified view of the company's information system. It resembles what people usually call a data warehouse. However, in normal cases, it is better to partition the entire 'universe' into manageable pieces so that the information retained in each partition is self-sustained for a particular application. Therefore, in this stage, unnecessary information and relational tables are filtered out, and some relational tables are de-normalised for effective use.
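One way to picture the de-normalisation step is as a query that folds a lookup table into the fact records. The sketch below uses Python's sqlite3 module purely for illustration; every table and column name is an assumption, not the department's actual schema:

    import sqlite3

    conn = sqlite3.connect(":memory:")  # stand-in for one partition of the universe

    # Two normalised tables, as they might arrive from the source system.
    conn.executescript("""
        CREATE TABLE laboratory (lab_id INTEGER PRIMARY KEY, lab_name TEXT);
        CREATE TABLE lab_class  (class_id INTEGER, week INTEGER, lab_id INTEGER);
        INSERT INTO laboratory VALUES (1, 'CAD Laboratory');
        INSERT INTO lab_class  VALUES (100, 3, 1);
    """)

    # De-normalisation: fold the lookup table into the class records, giving one
    # flat, self-sustained table for the laboratory-utilisation application.
    conn.execute("""
        CREATE TABLE lab_class_flat AS
        SELECT c.class_id, c.week, l.lab_name
        FROM lab_class AS c JOIN laboratory AS l ON l.lab_id = c.lab_id
    """)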


Stage 3: Establishing goal
In this stage, the relevant information contained in the model is converted into 'dimensions' and 'measures'. Dimensions are qualitative items that end-users normally would like to query about, and they usually correspond to one of the field names of a relational table. Typical examples of dimensions are country, model name, location and year. Measures are quantitative data used to provide figures that gauge the performance of a combination of dimensions for decision-making. Typical examples of measures are sales revenue, profit value and cost. For example, in a query, a user might want to know the total sales revenue of product A in store B of country C during fiscal year D, where product A, store B, country C and year D are all dimensions and the sales revenue is the measure.
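That example maps directly onto a SQL aggregate over a star schema. Here is a minimal, self-contained sketch (Python/sqlite3; the tables, columns and figures are hypothetical stand-ins for "product A in store B of country C during fiscal year D"):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product    (product_id INTEGER, name TEXT);
        CREATE TABLE store      (store_id INTEGER, name TEXT, country TEXT);
        CREATE TABLE sales_fact (product_id INTEGER, store_id INTEGER,
                                 fiscal_year TEXT, sales_revenue REAL);
        INSERT INTO product    VALUES (1, 'Product A');
        INSERT INTO store      VALUES (2, 'Store B', 'Country C');
        INSERT INTO sales_fact VALUES (1, 2, 'Year D', 1200.0),
                                      (1, 2, 'Year D', 800.0);
    """)

    # The measure (sales revenue) is summed at one point in the dimension
    # space: product A, store B, country C, fiscal year D.
    total = conn.execute("""
        SELECT SUM(f.sales_revenue)
        FROM sales_fact AS f
        JOIN product AS p ON p.product_id = f.product_id
        JOIN store   AS s ON s.store_id   = f.store_id
        WHERE p.name = 'Product A' AND s.name = 'Store B'
          AND s.country = 'Country C' AND f.fiscal_year = 'Year D'
    """).fetchone()[0]   # -> 2000.0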

Case Study
In the MEEM department, the intelligent decision support system allows management to analyze information about laboratory classes, students, technicians, and staff development. As of today, MEEM has launched the analysis of laboratory information and technician information for the manager and his subordinates. In this case study, we discuss the steps of developing the laboratory information analysis.

In laboratory information analysis, the system helps the manager and his subordinates better utilize the resources of the MEEM laboratory. These analysis reports include:

• Utilization of Laboratory. To analyze the utilization of each laboratory area in different academic years.

• Utilization of Laboratory Handout. To analyze the usage of each laboratory handout by different programmes and courses in different academic years. It allows management and academic staff to have a better understanding of the utilization of the associated experiments in supporting the teaching activities.
• Laboratory Utilization by Programme. To analyze the number of laboratory classes available for different degree programmes in different academic years. It helps the management to design better programmes.

In technician information analysis, the system allows the manager to allocate technical staff to support laboratory classes. The list of analysis reports includes:

• Technician Workload Summary. To analyze the workload of technicians in each week. It helps the management to balance the workload.
• Technician Staff Availability. To analyze the availability of technical staff in a specific time period. It helps the management to make decisions on the approval of leave applications by technical staff.
• Technician's Expertise. To analyze the technical expertise needed in supporting laboratory classes. It helps the management to ensure the continuous growth of technical staff.

Universe and model design
In this case study, the first step is to design and build the universe of the MEEM laboratory's information system. In our case, all the information that relates to the operations of the MEEM laboratory is stored in a single MS SQL database. The universe was built almost instantly by importing the database into the Business Objects environment. The model for the analysis of MEEM laboratory information is shown in the figure below, which extracts the portion of relational tables from the universe containing relevant and sufficient information for the queries suggested above.

As shown below, this model consists of eight tables, from which end users can make various queries to support their decision-making.


Fig. 4 Universe designer

Measure
After the universe and the model are available, the user can define the corresponding measures. As shown in the figure below, for instance, "Lab Area Utilization" is a measure, which is obtained by adding up the number of laboratory classes in the relevant tables. While the system supports a number of built-in functions such as average, summation and minimum, the definition of correct measures relies heavily on the user's understanding of the underlying data scheme of the universe.

Fig. 5 User-variable Editor

Hierarchy
In Business Objects, the definition of a hierarchy is a powerful feature that enables users to perform a wide range of queries by viewing various levels of detail in the data source. The ability to display the underlying details of a data source is known as drilling. In figure 6, three levels are defined in the "Time Period" hierarchy, namely academic year, semester, and week number.

Therefore, in addition to the "Lab Area Utilization" in academic year 1999/2000, with a few mouse clicks users can drill down to analyze such utilization in more detail by dimensions such as semester and week. Such drilling operations are important in helping management identify the root cause of the results.


Fig. 6 Hierarchy Editor
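Conceptually, each drill-down simply adds the next level of the hierarchy to the grouping. A small sketch (Python/pandas, with made-up figures, since the real laboratory data is not published) makes this concrete:

    import pandas as pd

    # A tiny stand-in for the laboratory-class fact data; the columns mirror
    # the "Time Period" hierarchy: academic year -> semester -> week.
    classes = pd.DataFrame({
        "academic_year": ["1999/2000"] * 4,
        "semester": ["A", "A", "B", "B"],
        "week": [1, 2, 1, 2],
        "lab_classes": [12, 9, 11, 7],
    })

    # Top level of the hierarchy: utilisation per academic year.
    by_year = classes.groupby("academic_year")["lab_classes"].sum()

    # Each drill-down groups by one more level of the hierarchy.
    by_semester = classes.groupby(["academic_year", "semester"])["lab_classes"].sum()
    by_week = classes.groupby(["academic_year", "semester", "week"])["lab_classes"].sum()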

Windows-based solution
After preparing all the back-end jobs, users can now activate a Windows-based program to create the query results in real time. As shown in the figure below, once users successfully establish a connection to the universe, all available dimensions and measures for a selected model are presented in a well-organised hierarchy in the left-hand panel of the screen. Users can freely drag a combination of dimensions and measures to the right-hand panel, and the corresponding results are automatically displayed. Moreover, users can view the results in whatever format they like and continue to drill down the hierarchy to further analyze the results.

Fig. 7 Windows-Based Solution

Web-enabled solution
With the advent of web technologies, Business Objects allows users to perform the analysis via a web browser. When users log on to the web server, the system automatically downloads a small Java Applet onto the client machine so as to access the back-end data warehouse repository. After the Java Applet is installed, users can query the universe and models in almost the same way as from the Windows-based program, as illustrated in figure 8.


Fig. 8 Web-enabled Solution

Another advantage of the web-enabled solution is availability. Since the system is web-enabled, users can access it at any time, from anywhere, with a web browser. Most of the powerful features, including drill-down, are also supported in a web-enabled system. As shown in the figure below, when users change the display mode to a 3D bar chart, they can click any of the 3D bars to drill down and view the associated figures for further analysis.

Fig. 9 3D Bar Chart

Conclusion
Running a university is not very different from running a private enterprise in today's competitive world. Management is constantly striving for operational excellence. To achieve this, meaningful performance measures must be established and mechanisms developed to turn data into information rapidly. In this paper, we demonstrate that, with the support of commercial business intelligence tools such as Business Objects, a typical academic department like MEEM can develop an online business decision support system that empowers staff to review, monitor and react promptly to their operations.


(Source: Ricky, W. H., Derek, Y., Ngan, T. W. & Martin, C. Y., Application of Business Intelligence Tools in Education Institutes: A Case Study of MEEM Laboratory, [pdf] Available at: <http://www.cityu.edu.hk/me/e-kit/downloads/bi.pdf> [Accessed 16 May 2012]).

Questions
1. State the role of the data warehouse repository in the intelligent decision support system.

Answer: The data warehouse repository is used to hold the 'universe' of cleaned data, which is the heart of the intelligent decision support system. The 'universe' is a data model that is defined to create the query results in a Windows-based or web-based solution. The repository can be housed in either Microsoft SQL Server or Oracle Database Server. The data warehouse repository loads data from legacy data sources such as SQL Server, Oracle, MS Access, Excel spreadsheets and so on.

2. Define dimension and measure.

Answer: Dimensions: Dimensions are qualitative items that end-users normally would like to query about, and they usually correspond to one of the field names of a relational table. Typical examples of dimensions are country, model name, location and year.

Measures: Measures are quantitative data that are used to provide figures to measure the performance of a combination of dimensions for decision-making. Typical examples of measures are sales revenue, profit value, and cost.

3. Which tool is needed for Windows-based solutions?

Answer: A business intelligence tool called "Business Objects" is needed for Windows-based solutions.


Case Study II

Automobile Manufacturer

This success story highlights how a well-known automobile manufacturer used business intelligence at every level of the organisation to drive new efficiencies and reach strategic goals. The company was experiencing slipping revenues for each of the cars it produced, and shareholders were demanding increased profitability. But driving profitability is a multifaceted goal. One way to achieve it is to raise prices. However, in an industry as competitive as automobiles, where buyers are very cost-conscious, that simply wasn't an option.

Reducing costs is another way to boost profits. But there are many factors that contribute to overhead expenses: manufacturing, payroll, warranties, etc. The strategy map below depicts some of the key initiatives that were agreed upon by senior management and communicated throughout the company's ranks.

The strategies depicted here are significantly simplified to ease understanding of the concepts presented here. In real-world scenarios, these maps will have numerous layers and components. Analysis will typically point to many operational areas that are in need of improvement. Company management will then collaborate to prioritise these operational initiatives. In some cases, several operational initiatives will be underway at the same time. And in other situations, management will undertake smaller, less critical initiatives that can be accomplished quickly, instead of larger, more strategic ones that may take a longer time to complete.

The map helped the company’s business analysts focus their efforts on the right cost-cutting measures. During the course of their analysis activities, many questions arose. How could the company cut costs, while maintaining high levels of product quality and support? Perhaps purchasing cheaper parts from different suppliers would increase profits by reducing the cost of goods sold, but this may compromise quality and result in higher warranty expenses down the road.


An analyst came up with the idea of increasing profitability by driving down warranty costs. This could be accomplished by providing dealerships with fast and simple access to timely information. Operating expenses could be dramatically reduced, without sacrificing quality. And, the cost of building and deploying such a solution was minimal, since the supporting technology infrastructure was already in place.

The resulting application enabled dealerships to more closely monitor and manage warranty service, so practices could be optimised (for example, reducing the number of repeat service appointments, and eliminating excessive service) to increase cost-efficiency. Additionally, since the dealerships are not owned by the manufacturer (they are partners of the company who sell and service the products), the application allowed competing dealerships to track each other’s performance levels.

All performance-related data, including dealership ratings and comparisons, is communicated and shared via the Internet-based warranty management system. Red flag alerts for repeat services and other critical issues dynamically notify both the manufacturer and the dealer of potential problems. When needed, inspectors and trainers are dispatched to delinquent dealerships to address ongoing troubles.
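As a rough illustration of how such a "red flag" rule might be expressed in code (the record format and the threshold below are assumptions made for this sketch; the manufacturer's actual rules are not described in the case study):

    from collections import Counter

    def repeat_service_alerts(service_records, threshold=2):
        """Flag vehicles serviced more than `threshold` times for the same issue."""
        counts = Counter(service_records)
        return [(vin, issue, n) for (vin, issue), n in counts.items() if n > threshold]

    # Hypothetical feed of (vehicle, issue) service events from the dealerships.
    events = [("VIN123", "brakes"), ("VIN123", "brakes"), ("VIN123", "brakes"),
              ("VIN456", "wipers")]
    print(repeat_service_alerts(events))   # -> [('VIN123', 'brakes', 3)]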

Today, more than 60,000 employees, from service managers to mechanics, at 14,000 dealerships use the application daily. As a result, the company has cut as much as $40 to $60 million in warranty costs, a reduction that contributed significantly to its goal of higher profits.

(Source: Quinn, K., How Business Intelligence Should Work [pdf] Available at: <www.ebsolute.com/web/doc/InformationManagementWP_BI.pdf> [Accessed 16 May 2012]).

Questions
1. Create a strategic map for a cloth mill.
2. What benefits did the automobile manufacturer gain from business intelligence?
3. What factors help to increase profit?


Case Study III

Commercial Airline

The pressures of competition, combined with higher fuel prices, were causing a major airline to lose money. Those at the highest levels of the organisation gave this charge to company employees: find a way to regain profitability. But limited budgets made the aggressive advertising and marketing campaigns needed to boost sales impossible. So profits would need to be driven through improved operational efficiencies.

The strategy map below depicts part of the company’s strategy to improve profit margins. The majority of the strategy is aimed at increasing productivity. An analyst at the airline noticed that many flights, which were typically filled to capacity, were not selling all their seats. Further investigation showed that trivial maintenance issues, such as a faulty seat-back table or a torn seat cushion, were preventing the airline from selling tickets for those seats. But, maintenance workers weren’t always notified of these problems in a timely manner, because repair-related information was distributed across three disparate applications.

What the airline needed was real-time information that would expedite service to planes between flights. Developers built a report that combines data from three distinct operational sources:
• The primary maintenance system, which contains information about plane problems such as broken seats
• The parts inventory system, which stores data about the location of replacement parts needed for repairs
• The plane routing system, where scheduling information resides

This single report keeps all maintenance workers informed about which planes need repairs, which parts are required to make those repairs, and where those planes are. This enables them to fix each problem as soon as possible. And, by ensuring that maintenance issues are fixed in the timeliest fashion possible, the airline can increase seat sales. As a result of this single report, as well as other operational initiatives, the company quickly returned to its previous levels of profitability.
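To see how such a report can be assembled mechanically, here is a minimal sketch (Python/pandas; the feeds, column names and sample values are stand-ins, since the airline's actual schemas are not given):

    import pandas as pd

    # Illustrative extracts from the three operational sources named above.
    maintenance = pd.DataFrame({"plane_id": ["N101"], "fault": ["seat 14C table"],
                                "part_no": ["S-14"]})
    parts = pd.DataFrame({"part_no": ["S-14"], "part_location": ["ORD depot"]})
    routing = pd.DataFrame({"plane_id": ["N101"], "next_airport": ["ORD"],
                            "ground_time_min": [45]})

    # One report row per open fault: what is broken, where the replacement
    # part is, and where the plane will be on the ground next.
    report = maintenance.merge(parts, on="part_no").merge(routing, on="plane_id")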


The strategies depicted here are significantly simplified to ease understanding of the concepts presented in this paper. In real-world scenarios, these maps will have numerous layers and components. Analysis will typically point to many operational areas that are in need of improvement. Company management will then collaborate to prioritise these operational initiatives. In some cases, several operational initiatives will be underway at the same time. And in other situations, management will undertake smaller, less critical initiatives that can be accomplished quickly, instead of larger, more strategic ones that may take a longer time to complete.

(Source: Quinn, K., How Business Intelligence Should Work, [Online] Available at: <www.ebsolute.com/web/doc/InformationManagementWP_BI.pdf> [Accessed 16 May 2012]).

Questions
1. State the factors responsible for the financial loss of the commercial airline.
2. Create a strategic map for a private transport company.
3. Create a report for this transport company.


Bibliography

References
• 2006. Datawarehouse DW BI Introduction, [Video Online] Available at: <http://www.youtube.com/watch?v=zDk4yPL6Adc> [Accessed 3 May 2012].
• 2008. Lecture - 30 Introduction to Data Warehousing and OLAP, [Video Online] Available at: <http://www.youtube.com/watch?v=m-aKj5ovDfg> [Accessed 3 May 2012].
• 2009. Business Intelligence Reporting & Analytics Tool: SQL Power Wabit, [Video Online] Available at: <http://www.youtube.com/watch?v=vkz1aC3rq9o&feature=results_main&playnext=1&list=PLA9A611916134446A> [Accessed 27 April 2012].
• 2009. History of Business Intelligence, [Video Online] Available at: <http://www.youtube.com/watch?v=_1y5jBESLPE> [Accessed 27 April 2012].
• 2010. Data Warehouse Basics, [Video Online] Available at: <http://www.youtube.com/watch?v=EtaUzQrAPKE&feature=related> [Accessed 27 April 2012].
• 2010. Decision Support Systems - An Introduction, [Video Online] Available at: <http://www.youtube.com/watch?v=la7MdnrlLZc> [Accessed 3 May 2012].
• 2010. Decision Support Systems, [Video Online] Available at: <http://www.youtube.com/watch?v=aWc4H8cfGE8> [Accessed 3 May 2012].
• 2010. What is Business Intelligence?, [Video Online] Available at: <http://www.youtube.com/watch?v=0aHtHl-jcAs> [Accessed 27 April 2012].
• 2011. 1.2.1 BI Tools and Processes, [Video Online] Available at: <http://www.youtube.com/watch?v=ZpBtxKf20zY> [Accessed 27 April 2012].
• 2011. Business Analytics - Turning Data Into Insight, [Video Online] Available at: <http://www.youtube.com/watch?v=6jDjeNJrN14> [Accessed 24 April 2012].
• 2011. Microsoft Business Intelligence Tools 2011, [Video Online] Available at: <http://www.youtube.com/watch?v=W6M4v7X7irU> [Accessed 26 April 2012].
• 2011. Opensource BI tool-Pentaho, [Video Online] Available at: <http://www.youtube.com/watch?v=J5eVY3o_zGw> [Accessed 27 April 2012].
• 2011. Trends in Business Analytics, [Video Online] Available at: <http://www.youtube.com/watch?v=nfMnILQVZXo> [Accessed 24 April 2012].
• 2011. What is OLAP?, [Video Online] Available at: <http://www.youtube.com/watch?v=2ryG3Jy6eIY> [Accessed 3 May 2012].
• 2011. What Tools to use for Data Warehousing and Business Intelligence?, [Video Online] Available at: <http://www.youtube.com/watch?v=9s24DE4Yvfk> [Accessed 26 April 2012].
• Biere, M., 2003. Business Intelligence for the Enterprise, Prentice Hall Professional Publication.
• Business Intelligence – The Beginning, [Online] Available at: <http://www.few.vn.nl> [Accessed 25 April 2012].
• Business Intelligence Components, [Online] Available at: <download.microsoft.com/.../Business%20Intelligence%20components> [Accessed 27 April 2012].
• CHAPTER 1 Introduction to OLAP, [Online] Available at: <http://www.mhprofessional.com/downloads/products/0071621822/0071621822%20_Chap01.pdf> [Accessed 3 May 2012].
• Clausen, N., 2009. Open Source Business Intelligence, 2nd ed., BoD – Books on Demand Publication.


• Corporate Information Factory, [Online] Available at: <people.stfx.ca/nfoshay/.../Corporate%20Information%20Factory.ppt> [Accessed 3 May 2012].
• DATAWAREHOUSING Basics, 2009. [Video Online] Available at: <http://www.youtube.com/watch?v=eiRhRxPuEU8> [Accessed 3 May 2012].
• Decision engineering, [Online] Available at: <http://www.saylor.org/site/wp-content/uploads/2011/06/Decision-Engineering-.pdf> [Accessed 26 April 2012].
• Decision support systems, [Online] Available at: <http://cstl-hcb.semo.edu/eom/iebmdssrwweb.pdf> [Accessed 3 May 2012].
• The Corporate Information Factory, [Online] Available at: <http://www.information-management.com/issues/19991201/1667-1.html> [Accessed 3 May 2012].
• EIAO, A Survey of Open Source Tools for Business Intelligence, [Online] Available at: <http://eiao.net/publications/EIAO_ddw06.pdf> [Accessed 27 April 2012].
• Flynn, R. & Druzdzel, M., Decision Support Systems, [Online] Available at: <http://www.pitt.edu/~druzdzel/psfiles/dss.pdf> [Accessed 3 May 2012].
• Haag, 2005. Business Driven Technology W/Cd, Tata McGraw-Hill Education Publication.
• Hartenauer, J., 2007. Introduction to Business Intelligence, VDM Verlag Publication.
• Hurbean, L., Business Intelligence: Applications, Trends, and Strategies, [Online] Available at: <http://anale.feaa.uaic.ro/anale/resurse/46_Hurbean_L_-_Business_intelligence-applications,trends_and_strategies.pdf> [Accessed 24 April 2012].
• Inmon, W., Imhoff, C. & Sousa, R., 2001. Corporate Information Factory, 2nd ed., John Wiley & Sons Publication.
• Introduction to OLAP (A beginner's guide to OLAP & the concepts behind it), [Online] Available at: <http://resources.businessobjects.com/support/communitycs/TechnicalPapers/si_intro_olap.pdf> [Accessed 3 May 2012].
• Janert, P., 2010. Data Analysis with Open Source Tools, O'Reilly Media, Inc. Publication.
• Loshin, D., 2003. Business Intelligence: The Savvy Manager's Guide, Getting Onboard with Emerging IT, Morgan Kaufmann Publication.
• Paredes, J., 2009. The Multidimensional Data Modeling Toolkit: Making Your Business Intelligence Application, John Paredes Publication.
• Pechenizkiy, M., 2006. Lecture 2 Introduction to Business Intelligence, [Online] Available at: <http://www.win.tue.nl/~mpechen/courses/TIES443/handouts/lecture02.pdf> [Accessed 27 April 2012].
• Quinn, K., How Business Intelligence Should Work, [Online] Available at: <http://www.ebsolute.com/web/doc/InformationManagementWP_BI.pdf> [Accessed 26 April 2012].
• Reinschmidt, J., Business Intelligence Certification Guide, [Online] Available at: <capstone.geoffreyanderson.net/export/.../sg245747.pdf> [Accessed 27 April 2012].
• Roebuck, K., 2011. Business Analytics: High-Impact Strategies - What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors, Lightning Source Incorporated Publication.
• Rud, O., 2009. Business Intelligence Success Factors: Tools for Aligning Your Business in the Global Economy, John Wiley & Sons Publication.
• Sauter, V., 2010. Decision Support Systems for Business Intelligence, 2nd ed., John Wiley & Sons Publication.
• Schlukbier, A., 2007. Implementing Enterprise Data Warehousing: A Guide for Executives, Lulu.com Publication.


• Schrader, M., Vlamis, D., Nader, M., Collins, D., Claterbos, C., Campbell, M. & Conrad, F., 2009. Oracle Essbase & Oracle OLAP: The Guide to Oracle's Multidimensional Solution, McGraw-Hill Prof Med/Tech Publication.
• Simon, A. & Steven, S., 2001. Data Warehousing And Business Intelligence For e-Commerce, Morgan Kaufmann Publication.
• Surhone, L., Tennoe, M. & Henssonow, S., 2011. Business Intelligence Tools, VDM Verlag Dr. Mueller AG & Co. Kg Publication.
• Thomsen, C. & Pedersen, T., 2008. A Survey of Open Source Tools for Business Intelligence, [Online] Available at: <http://vbn.aau.dk/ws/files/14833824/DBTR-23.pdf> [Accessed 27 April 2012].
• Turban, 2008. Decision Support And Business Intelligence Systems, 8th ed., Pearson Education India Publication.
• Yeoh, W., Koronios, A. & Gao, J., 2008. Managing the Implementation of Business Intelligence Systems: A Critical Success Factors Framework, IGI Publishing, [Online] Available at: <http://im1.im.tku.edu.tw/~cjou/bi2009/1.pdf> [Accessed 24 April 2012].

Recommended Reading

• Agnew, P. & Silverstone, L., 2009. The Data Model Resource Book: Universal Patterns For Data Modeling, John Wiley & Sons Publication.
• Becerra-Fernandez, I. & Sabherwal, R., 2010. Business Intelligence, John Wiley & Sons Publication.
• Berson, 2004. Data Warehousing, Data Mining, & Olap, Tata McGraw-Hill Education Publication.
• Chao, L., 2009. Utilizing Open Source Tools for Online Teaching and Learning: Applying Linux Technologies, Idea Group Inc (IGI) Publication.
• Evans, J., 2012. Business Analytics, Pearson College Division Publication.
• Gonzales, M., 2003. IBM Data Warehousing: With IBM Business Intelligence Tools, Wiley Publication.
• Howson, C., 2007. Successful Business Intelligence, Tata McGraw-Hill Education Publication.
• Information Resources Management Association, 2010. Information Resources Management: Concepts, Methodologies, Tools and Applications, Idea Group Inc (IGI) Publication.
• Jank, W., 2011. Business Analytics for Managers, Springer Publication.
• Vassiliadis, P., Vassiliou, Y., Lenzerini, M. & Jarke, M., 2003. Fundamentals of Data Warehouses, 2nd ed., Springer Publication.
• Paredes, J., 2009. The Multidimensional Data Modeling Toolkit: Making Your Business Intelligence Application, John Paredes Publication.
• Ponniah, P., 2011. Data Warehousing Fundamentals for It Professionals, 2nd ed., John Wiley & Sons Publication.
• Prabhu, C. S. R., 2004. Data Warehousing: Concepts, Techniques, Products and Applications, 2nd ed., PHI Learning Pvt. Ltd.
• Pujari, A., 2001. Data Mining Techniques, 4th ed., Universities Press Publication.
• Scheps, S., 2008. Business Intelligence For Dummies, John Wiley & Sons Publication.
• Stackowiak, R., Rayman, J. & Greenwald, R., 2007. Oracle Data Warehousing and Business Intelligence Solutions, John Wiley & Sons Publication.
• Thierauf, R., 2001. Effective Business Intelligence Systems, Greenwood Publishing Group Publication.


• Thomsen, E., 2002. OLAP Solutions: Building Multidimensional Information Systems, 2nd ed., Wiley Publication.
• Turban, E., Sharda, R. & Delen, D., Decision Support and Business Intelligence Systems, Pearson Education India Publication.
• Vercellis, C., 2011. Business Intelligence: Data Mining and Optimization for Decision Making, John Wiley & Sons Publication.
• Whitehorn, M., 1999. Business Intelligence: The IBM Solution, Springer Publication.
• Laursen, G. & Thorlund, J., 2010. Business Analytics for Managers: Taking Business Intelligence Beyond Reporting, John Wiley & Sons Publication.
• Wise, L., 2012. Using Open Source Platforms for Business Intelligence, Elsevier Science Limited Publication.


Self Assessment Answers

Chapter I
1. c  2. a  3. b  4. d  5. a  6. a  7. c  8. b  9. d  10. c

Chapter II
1. d  2. b  3. a  4. c  5. a  6. d  7. b  8. d  9. a  10. c

Chapter III
1. b  2. a  3. c  4. b  5. c  6. d  7. d  8. a  9. b  10. a

Chapter IV
1. d  2. a  3. b  4. c  5. d  6. c  7. a  8. b  9. a  10. a

Chapter V
1. a  2. b  3. d  4. b  5. d  6. c  7. c  8. b  9. a  10. d

Chapter VI
1. a  2. c  3. a  4. b  5. d  6. d  7. a  8. b  9. c  10. a

Chapter VII
1. c  2. d  3. c  4. a  5. b  6. d  7. a  8. b  9. c  10. a

Chapter VIII
1. a  2. b  3. d  4. c  5. a  6. b  7. d  8. b  9. c  10. d
