
Post on 22-Dec-2015


Warehousing on the Web

Webhouse

Why Utilize the Web?

What is the data Webhouse
Managing clickstreams
WWW today
ROI
DSS

Data Webhouse

Defined by Ralph Kimball
Two distinct focuses
  • Bringing the web to the warehouse
    – Clickstream data as a source of information
  • Bringing existing data warehouses to the web
    – Fully distributed environment

Required Capabilities

Capture clickstream logs and convert to tables for analysis
Merge customer demographic and account info with above
Interpret customer paths in website
Identify abandoned sessions
Use DW to drive customer responses appearing on your website
DW querying and reporting available through web browsers
Attach multimedia to DW
DW security
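The first capability, turning raw log lines into table rows, can be sketched roughly as follows. This assumes the web server writes NCSA Common Log Format; the slides do not say which format is in use, and the field names and sample line are illustrative only.

```python
import re
from typing import Optional

# Assumed layout (NCSA Common Log Format):
# host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a row (dict), or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # A "-" size means no body was sent; store it as 0 bytes.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row

# Hypothetical sample line for illustration.
sample = '192.0.2.10 - - [22/Dec/2015:10:15:32 +0000] "GET /cart.html HTTP/1.0" 200 2326'
row = parse_log_line(sample)
```

Each resulting row could then be joined with customer demographic and account data, provided the log carries some key (a login or cookie) that identifies the visitor.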

Architecture – Web to Warehouse

Beyond a comprehensive snapshot of the business on a real-time basis, also want knowledge of customer behavior
Extended design factors
  • Timeliness – real-time
  • Data volume – no upper limit
  • Response time – less than 10 seconds

Hot Response Cache

A file server holding complex file objects
As a file server it is an I/O engine (bandwidth)
Must hold objects which will be requested
Security is the responsibility of the requesting server
Extension of original operational data store (ODS)
Does not physically speed up the database; creates that illusion by storing predictable answers
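A minimal sketch of the idea that the cache only appears to speed things up: answers to predictable requests are computed ahead of time and served from a store, while anything unpredicted still pays the full cost of the query. The class, method names, and the stand-in query function are hypothetical, not part of the original slides.

```python
from typing import Callable, Dict

class HotResponseCache:
    """Holds precomputed answers for requests that were predicted in advance."""

    def __init__(self, run_query: Callable[[str], str]) -> None:
        self._run_query = run_query          # slow path: the real warehouse query
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def preload(self, request_key: str) -> None:
        """Compute and store an answer before anyone asks for it."""
        self._store[request_key] = self._run_query(request_key)

    def get(self, request_key: str) -> str:
        if request_key in self._store:
            self.hits += 1
            return self._store[request_key]
        # Not predicted: the database itself is no faster than before.
        self.misses += 1
        return self._run_query(request_key)

def slow_warehouse_query(request_key: str) -> str:
    # Hypothetical stand-in for a real warehouse query.
    return f"report for {request_key}"

cache = HotResponseCache(slow_warehouse_query)
cache.preload("daily_sales")
answer = cache.get("daily_sales")    # served from the cache
other = cache.get("adhoc_query")     # falls through to the slow path
```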

Who are our users?

Traditional
  • Power users
    – need database connectivity
  • Analysts
    – want to manipulate existing data
  • Report viewers
    – view standardized reports
Web
  • Our customers
  • Our business partners
  • Our employees

Clickstreams

Clickstream is not just another data source
  • Distributed nature leads to multiple data sources which require synchronization
  • Multiple parties
  • More than a dozen log file formats for capturing clickstream data
  • Search specification
Basic form of clickstream data is stateless
  • Log shows isolated page retrieval event
Clickstream data is anonymous
Today’s Promotions
  • Clickthroughs and referrals as a revenue source
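Because each log entry is an isolated page retrieval, sessions have to be reconstructed after the fact. The sketch below groups events by a visitor identifier and an inactivity gap. The 30-minute timeout, the cookie-style visitor IDs, and the definition of an abandoned session (cart viewed, checkout never reached) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed inactivity gap; not from the slides

Event = Tuple[str, int, str]  # (visitor_id, unix_timestamp, page)

def sessionize(events: List[Event]) -> List[List[Event]]:
    """Group isolated page events into per-visitor sessions by inactivity gap."""
    by_visitor: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        by_visitor[event[0]].append(event)

    sessions: List[List[Event]] = []
    for visitor_events in by_visitor.values():
        visitor_events.sort(key=lambda e: e[1])
        current = [visitor_events[0]]
        for event in visitor_events[1:]:
            # A long silence closes the current session and starts a new one.
            if event[1] - current[-1][1] > SESSION_TIMEOUT_SECONDS:
                sessions.append(current)
                current = []
            current.append(event)
        sessions.append(current)
    return sessions

def is_abandoned(session: List[Event]) -> bool:
    """Assumed definition: the cart was viewed but checkout never was."""
    pages = [page for _, _, page in session]
    return "/cart" in pages and "/checkout" not in pages

events = [
    ("cookie-a", 1000, "/home"),
    ("cookie-a", 1100, "/cart"),
    ("cookie-b", 1050, "/home"),
    ("cookie-a", 9000, "/home"),   # more than 30 minutes later: new session
]
sessions = sessionize(events)
abandoned = [s for s in sessions if is_abandoned(s)]
```

Since clickstream data is anonymous, the visitor identifier here would in practice be a cookie or similar token rather than a known customer.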

Clickstreams

Clickstream post-processor – receives raw log data from the web server and normalizes it into a format which can be combined with application-derived data for insertion into the DW
Today’s Promotions
  • Clickthroughs and referrals as a revenue source

Why Bring DW to Web?

Primary function of the DW is to publish information – the web is a good partner
Need a distributed DW – the web provides universal connectivity
Universal front-end – web browser

Web Pushes Data Warehouse

User interface effectiveness measurable
Queries and updates mixed
Speed expected – 10 second rule
Global
  • 24 X 7 expected
  • International characters, dates, addresses
Expanded multimedia
  • Animation, zoomable images, maps, video clips
  • Need material in digital form
  • Enterprise information portal will require items to be searchable

Web Pushes Data Warehouse

Mass customization
  • Dynamically created web pages – XML
Fully distributed
  • Linking together all the data marts
Security and Privacy
  • Publish only to those who need to know
  • User profiles and access profiles defined in one place
  • Full-time expert security person

Second Generation User Interface Guidelines

Near-instantaneous performance
Website Design
  • Design for lowest common denominator
  • Measure page performance on a continuous basis
  • Paint navigation buttons immediately
  • Disclose content progressively
  • Implement page caching
  • Cache data, reports
  • Improve web server bandwidth
  • Improve server throughput

Second Generation User Interface Guidelines

Data Webhouse design
  • Adapt all web design responses
  • Select appropriate DBMS software – dimensional models, OLAP
  • Use indexes, aggregations
  • Partition files
  • Increase RAM
  • Use parallel processing
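As one illustration of the "use indexes, aggregations" point, the sketch below builds an index and a small precomputed aggregate table with Python's built-in sqlite3 module. The table names, columns, and data are invented for the example; a production webhouse would use its own DBMS and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_hits (day TEXT, page TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO page_hits VALUES (?, ?, ?)",
    [
        ("2015-12-21", "/home", 3),
        ("2015-12-21", "/cart", 1),
        ("2015-12-22", "/home", 5),
        ("2015-12-22", "/home", 2),
    ],
)

# Index on the column that reports are expected to filter by.
conn.execute("CREATE INDEX idx_page_hits_day ON page_hits (day)")

# Aggregate table: one precomputed row per day, so a daily report
# does not have to scan the detailed rows each time.
conn.execute(
    "CREATE TABLE daily_hits AS "
    "SELECT day, SUM(hits) AS total_hits FROM page_hits GROUP BY day"
)

daily = dict(conn.execute("SELECT day, total_hits FROM daily_hits"))
```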

Meet User Expectations

Website design
  • Site navigation choices
  • Help choices
  • Communication with various groups – response must be assured
  • Headlines serious and define content
  • Indicate off-screen material
  • Survey customer needs and wants

Meet User Expectations

Data Webhouse design
  • Report library
  • Folder of previous queries, reports …
  • Dimension browser – viewing dimension can assist report creation
  • Business metadata interface – understand organization’s data assets

Streamline Process

Business processes designed from ground up to work seamlessly on web
Website design
  • Reengineer to streamline process and make navigation easier, uniform interfaces
  • Remove barriers to reaching page
  • Minimize clicks and new windows
  • Allow interruption and return

Streamline Process

Data Webhouse design
  • Build an explicit value chain for reporting and analysis around the application suite using conformed dimensions and facts
  • Drill across functions
  • Single user interface for reporting against all parts of business
  • Master report library and FAQs
  • Single login and single console access to webhouse
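Drilling across functions depends on conformed dimensions: each fact table is summarized separately on the shared dimension, and the summaries are then merged on that key. The sketch below uses a hypothetical "product" dimension with made-up sales and shipment facts.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Fact = Tuple[str, float]  # (product key from the conformed dimension, measure)

def total_by_product(facts: List[Fact]) -> Dict[str, float]:
    """Summarize one fact table on the conformed product dimension."""
    totals: Dict[str, float] = defaultdict(float)
    for product, amount in facts:
        totals[product] += amount
    return dict(totals)

def drill_across(sales: List[Fact], shipments: List[Fact]) -> List[Tuple[str, float, float]]:
    """Merge two separately aggregated result sets on the shared dimension key."""
    sales_totals = total_by_product(sales)
    shipment_totals = total_by_product(shipments)
    products = sorted(set(sales_totals) | set(shipment_totals))
    return [
        (p, sales_totals.get(p, 0.0), shipment_totals.get(p, 0.0))
        for p in products
    ]

# Hypothetical facts from two business functions.
sales_facts = [("widget", 100.0), ("gadget", 40.0), ("widget", 50.0)]
shipment_facts = [("widget", 12.0), ("sprocket", 3.0)]
report = drill_across(sales_facts, shipment_facts)
```

The merge only works because both fact tables use the same product keys, which is the point of conforming the dimension.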

Reassure Users

Website Design
  • Map of processes
Data Webhouse design
  • Provide status and lineage of current data
  • Provide status of running reports
  • Active notification
  • Allow for entry of NA if data not available
  • Time stamped dimensions
  • Time stamped reports

Allow Problem Resolution

Website design
  • Allow backtracking, rollback, play forward
  • Keep old transactions
  • Easy error reporting
  • Acknowledge, track and follow up all user inputs, show wait time
  • Assist searching
Data Webhouse design
  • Provide adequate end user support
  • Show aggregates in use and available
  • Show system load and percent completed

Build Trust

Clearly state and observe website’s policies for using customer’s identity
Website design
  • Do not abuse privacy
  • Link to privacy statement
  • Use friendly pictures of people
  • Distinguish between ad content and editorial content

Build Trust

Data Webhouse design
  • Two-factor security
    – What you know – password
    – What you possess – token
  • Track changes in employee and contractor status
  • Create and enforce roles for employees, contractors and customers
  • Manage webhouse security directly
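The two-factor idea above can be sketched as a check that passes only when both the password (what you know) and the token code (what you possess) are correct. How the expected token code is produced is left out; here it is simply passed in, and the salt, iteration count, and sample values are assumptions for illustration.

```python
import hashlib
import hmac

PBKDF2_ITERATIONS = 100_000  # assumed work factor for this sketch

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash so the plain password is never stored."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )

def authenticate(password: str, token_code: str,
                 stored_hash: bytes, salt: bytes, expected_code: str) -> bool:
    """Grant access only when both factors check out."""
    # Factor 1: what you know.
    knows = hmac.compare_digest(hash_password(password, salt), stored_hash)
    # Factor 2: what you possess (code shown by the user's token).
    possesses = hmac.compare_digest(
        token_code.encode("utf-8"), expected_code.encode("utf-8")
    )
    return knows and possesses

salt = b"example-salt"   # hypothetical; use a random salt per user in practice
stored = hash_password("correct horse", salt)
ok = authenticate("correct horse", "492817", stored, salt, "492817")
wrong_token = authenticate("correct horse", "000000", stored, salt, "492817")
wrong_password = authenticate("guess", "492817", stored, salt, "492817")
```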

Provide Communication Hooks

Website design
  • Provide useful links to others – internal and external
  • Remove links that invalidate the “back” button
  • Use copyable URLs
  • Use URL as medium of distribution

Advantages of Web Today 1998 2000

Immediate worldwide access
Centralized management - Decentralized
Thin client
Multi-platform (client and server) - Distributed
Little or no software distribution - Downloads
A+

Disadvantages of Web Today 1998 2000

Immature technology - Teenager
Security - Solutions
Speed restricted by bandwidth - data and logic must both travel across internet
Design limited to least common denominator or access restricted to specific browser

Vulnerabilities

Physical assets
Information assets
  • theft
  • modification
Software assets
Ability to conduct business

Web Architecture

[Diagram: Thin Client (Browser, Applets/ActiveX, Email, Spreadsheet, Word-processing); Communication layer (network/internet); Internet Server (Application: Analysis/statistics, Graphics, Report Writer, SQL Query); Database Servers (OLAP Server, Multidimensional Database, Data Warehouse - Relational Database, Summary/Alternative Relational Tables)]

Business Management through Information

Analysis of historical records
  • order processing, inventory levels, shipments, receivables, customer history, etc.
Goals include:
  • Measures of efficiency
  • Anticipate changes (planning and forecasting)
  • Make adjustments
  • Integration of model and control function

Rule-Based Management

Create strategic rules
  • IF market demand increases
    THEN implement marketing campaign A3
  • IF profit margin drops below value X
    THEN adjust overhead by …
Must not forget alert rules
  • If unanticipated condition, then notify CFO
Must not be too reactive
  • would cause thrashing
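The rules on this slide can be sketched as condition/action pairs, with a cooldown so a rule that keeps matching does not fire every period (one possible way to avoid thrashing). The threshold standing in for "value X", the expected ranges used to detect an unanticipated condition, and the three-period cooldown are all assumed values, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]

MARGIN_THRESHOLD_X = 0.10  # assumed stand-in for "value X"
EXPECTED_RANGE = {"demand_change": (-0.5, 0.5), "profit_margin": (0.0, 1.0)}  # assumed

def unanticipated(metrics: Metrics) -> bool:
    """True when any metric falls outside the range planners expected."""
    return any(not (low <= metrics[name] <= high)
               for name, (low, high) in EXPECTED_RANGE.items())

@dataclass
class Rule:
    name: str
    condition: Callable[[Metrics], bool]
    action: str
    cooldown_periods: int = 3   # assumed damping to avoid thrashing
    last_fired: int = -10**9    # "never fired" marker

def evaluate(rules: List[Rule], metrics: Metrics, period: int) -> List[str]:
    """Return the actions triggered this period, skipping rules still cooling down."""
    actions = []
    for rule in rules:
        if not rule.condition(metrics):
            continue
        if period - rule.last_fired < rule.cooldown_periods:
            continue  # fired too recently; do not react again yet
        rule.last_fired = period
        actions.append(rule.action)
    return actions

rules = [
    Rule("demand_up", lambda m: m["demand_change"] > 0,
         "implement marketing campaign A3"),
    Rule("margin_low", lambda m: m["profit_margin"] < MARGIN_THRESHOLD_X,
         "adjust overhead"),
    Rule("alert", unanticipated, "notify CFO", cooldown_periods=0),
]

first = evaluate(rules, {"demand_change": 0.2, "profit_margin": 0.3}, period=1)
second = evaluate(rules, {"demand_change": 0.2, "profit_margin": 0.3}, period=2)
third = evaluate(rules, {"demand_change": 0.9, "profit_margin": 0.05}, period=3)
```

The alert rule is given no cooldown here, on the assumption that an unanticipated condition should always reach the CFO.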

OLDM Decision Process

Simultaneous capture of:
  • Decision support information
    – Surveyed customer on-line in exchange for an additional discount
  • with business function inputs
Immediate computation or estimation of secondary information
  • based on planning and forecasting rules
Decision support information is:
  • available on-line
  • ready to use “as is”
Management Defined!

OLDM Decision Process

Derived data becomes control information
Automation of analysis and decision support
  • immediately available to management
Problems documented on-line
Classes of problem and corrective action codified
  • problem recognition
  • decision rules

OLDM Decision Process

Requires four types of information
  • Characteristics which identify a class of problem
  • Corrective action (management responses by problem class)
  • Rules to implement actions
  • Record of result
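One way to picture the four types of information is as a small data structure: a problem class carries its identifying characteristics and its corrective action, a decision function acts as the rule that applies the action, and a log keeps the record of the result. All names and the sample inventory problem below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Observation = Dict[str, float]

@dataclass
class ProblemClass:
    name: str
    matches: Callable[[Observation], bool]  # characteristics identifying the class
    corrective_action: str                  # management response for this class

@dataclass
class ResultRecord:
    problem: str
    action: str
    observation: Observation
    outcome: Optional[str] = None           # filled in once the result is known

@dataclass
class DecisionLog:
    records: List[ResultRecord] = field(default_factory=list)

def decide(observation: Observation, classes: List[ProblemClass],
           log: DecisionLog) -> Optional[str]:
    """Rule to implement actions: first matching class wins, and is recorded."""
    for problem_class in classes:
        if problem_class.matches(observation):
            log.records.append(ResultRecord(
                problem_class.name, problem_class.corrective_action, observation))
            return problem_class.corrective_action
    return None

# Hypothetical problem class for illustration.
classes = [
    ProblemClass("low_inventory", lambda o: o["stock_days"] < 5, "expedite reorder"),
]
log = DecisionLog()
action = decide({"stock_days": 2.0}, classes, log)
no_action = decide({"stock_days": 30.0}, classes, log)
```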

Potential of OLDM

Better managed business
  • knowledge asset capture and retention
  • consistency across enterprise
  • flexible, highly responsive
Close loop with customer
  • event and market driven but controlled
Direct customer interaction
  • via web, telephone, remote connection
Improved systems capacity planning and system management
Re-alignment of business and IT