warehousing on the web webhouse. why utilize the web? n what is the data webhouse n managing...
Post on 22-Dec-2015
TRANSCRIPT
Why Utilize the Web?
What is the data Webhouse
Managing clickstreams
WWW today
ROI
DSS
Data Webhouse
Defined by Ralph Kimball
Two distinct focuses
• Bringing the web to the warehouse
– Clickstream data as a source of information
• Bringing existing data warehouses to the web
– Fully distributed environment
Required Capabilities
Capture clickstream logs and convert to tables for analysis
Merge customer demographic and account info with above
Interpret customer paths in website
Identify abandoned sessions
Use DW to drive customer responses appearing on your website
DW querying and reporting available through web browsers
Attach multimedia to DW
DW security
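Two of the capabilities above, turning clickstream events into analyzable rows and identifying abandoned sessions, can be sketched as follows. This is a minimal illustration and not taken from the slides: the `Hit` fields, the 30-minute inactivity timeout, and the rule that a session counts as abandoned when it reaches a cart page without reaching a checkout-complete page are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: a gap longer than this between two hits starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass
class Hit:
    visitor_id: str       # hypothetical identifier, e.g. a cookie value
    timestamp: datetime
    page: str


def sessionize(hits):
    """Group hits into per-visitor sessions (lists of Hit), split on inactivity."""
    sessions = []
    last_seen = {}  # visitor_id -> (time of last hit, that visitor's open session)
    for hit in sorted(hits, key=lambda h: (h.visitor_id, h.timestamp)):
        previous = last_seen.get(hit.visitor_id)
        if previous is None or hit.timestamp - previous[0] > SESSION_TIMEOUT:
            current = []              # start a new session
            sessions.append(current)
        else:
            current = previous[1]     # continue the open session
        current.append(hit)
        last_seen[hit.visitor_id] = (hit.timestamp, current)
    return sessions


def is_abandoned(session, cart_page="/cart", done_page="/checkout/complete"):
    """Assumed rule: the visitor reached the cart but never completed checkout."""
    pages = [h.page for h in session]
    return cart_page in pages and done_page not in pages


def demo():
    t0 = datetime(2000, 1, 1, 12, 0)
    hits = [
        Hit("a", t0, "/home"),
        Hit("a", t0 + timedelta(minutes=2), "/cart"),
        Hit("b", t0, "/home"),
        Hit("b", t0 + timedelta(minutes=1), "/cart"),
        Hit("b", t0 + timedelta(minutes=3), "/checkout/complete"),
        Hit("a", t0 + timedelta(hours=2), "/home"),  # later visit: new session
    ]
    # One row per session: (visitor, number of hits, abandoned?)
    return [(s[0].visitor_id, len(s), is_abandoned(s)) for s in sessionize(hits)]
```

Each tuple returned by `demo()` is effectively one row of a session table; merging in customer demographics would be a join on the visitor identifier once it can be tied to a known customer.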
Architecture – Web to Warehouse
Beyond a comprehensive snapshot of the business on a real-time basis, also want knowledge of customer behavior
Extended design factors
• Timeliness – real-time
• Data volume – no upper limit
• Response time – less than 10 seconds
Hot Response Cache
A file server holding complex file objects
As a file server it is an I/O engine (bandwidth)
Must hold objects which will be requested
Security responsibility of requesting server
Extension of original operational data store (ODS)
Does not physically speed up database
• creates illusion by storing predictable answers
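The idea that the cache only creates the illusion of speed by storing predictable answers can be sketched as below. This is a hedged illustration: the class name, the `preload` step, and the string keys are hypothetical, and the real hot response cache described in the slide is a file server, not an in-memory dictionary.

```python
class HotResponseCache:
    """Holds precomputed answers; it never makes the database itself faster."""

    def __init__(self, compute):
        self._compute = compute   # slow path, e.g. a warehouse query
        self._store = {}          # request key -> precomputed response
        self.hits = 0
        self.misses = 0

    def preload(self, keys):
        """Store answers to the requests we predict will be asked."""
        for key in keys:
            self._store[key] = self._compute(key)

    def get(self, key):
        # No access checks here: per the slide, security is the
        # responsibility of the requesting server.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return self._compute(key)  # unpredicted request: full cost is paid


def demo():
    calls = []

    def slow_query(key):
        calls.append(key)          # count how often the slow path runs
        return f"report for {key}"

    cache = HotResponseCache(slow_query)
    cache.preload(["sales/today", "top-products"])
    first = cache.get("sales/today")    # served from the store
    second = cache.get("rare-report")   # falls through to the slow path
    return first, second, cache.hits, cache.misses, len(calls)
```

The design choice worth noting is that a miss is not cached in this sketch: only answers judged predictable in advance are held, which matches the "must hold objects which will be requested" requirement.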
Who are our users?
Traditional
• Power users
– need database connectivity
• Analysts
– want to manipulate existing data
• Report viewers
– view standardized reports
Web
• Our customers
• Our business partners
• Our employees
Clickstreams
Clickstream is not just another data source
• Distributed nature leads to multiple data sources which require synchronization
• Multiple parties
• More than a dozen log file formats for capturing clickstream data
• Search specification
Basic form of clickstream data is stateless
• Log shows isolated page retrieval events
Clickstream data is anonymous
Today's Promotions
• Clickthroughs and referrals as a revenue source
Clickstreams
Clickstream post-processor – receives raw log data from the web server and normalizes it into a format which can be combined with application-derived data for insertion into the DW
Today's Promotions
• Clickthroughs and referrals as a revenue source
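A clickstream post-processor of the kind described above could start with a normalization step like this sketch. It assumes the raw log uses the NCSA Common Log Format, which is only one of the many log formats the slides mention; the output field names are hypothetical.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')


def normalize(line):
    """Turn one raw log line into a flat record, or None if it does not parse."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return {
        "host": m["host"],
        # "-" means the field was not recorded; most hits are anonymous.
        "user": None if m["user"] == "-" else m["user"],
        "timestamp": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        "bytes": 0 if m["size"] == "-" else int(m["size"]),
    }
```

Calling `normalize(SAMPLE)` yields a dictionary whose keys can map directly onto columns of a staging table. Each record is still an isolated page retrieval event, so tying records into sessions remains a separate step.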
Why Bring DW to Web?
Primary function of the DW is to publish information – the web is a good partner
Need a distributed DW – the web provides universal connectivity
Universal front-end – web browser
Web Pushes Data Warehouse
User interface effectiveness measurable
Queries and updates mixed
Speed expected – 10 second rule
Global
• 24 x 7 expected
• International characters, dates, addresses
Expanded multimedia
• Animation, zoomable images, maps, video clips
• Need material in digital form
• Enterprise information portal will require items to be searchable
Web Pushes Data Warehouse
Mass customization
• Dynamically created web pages – XML
Fully distributed
• Linking together all the data marts
Security and Privacy
• Publish only to those who need to know
• User profiles and access profiles defined in one place
• Full-time expert security person
Second Generation User Interface Guidelines
Near-instantaneous performance
Website Design
• Design for lowest common denominator
• Measure page performance on a continuous basis
• Paint navigation buttons immediately
• Disclose content progressively
• Implement page caching
• Cache data, reports
• Improve web server bandwidth
• Improve server throughput
Second Generation User Interface Guidelines
Data Webhouse design
• Adapt all web design responses
• Select appropriate DBMS software – dimensional models, OLAP
• Use indexes, aggregations
• Partition files
• Increase RAM
• Use parallel processing
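The "use indexes, aggregations" guideline can be illustrated with a small sketch using Python's standard `sqlite3` module. The table names, columns, and sample rows are hypothetical, and SQLite is only a stand-in for whatever DBMS a webhouse would actually use.

```python
import sqlite3


def build():
    """Create a tiny fact table, an index on it, and a precomputed aggregate."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_fact (day TEXT, product TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?)",
        [
            ("2000-01-01", "widget", 10.0),
            ("2000-01-01", "gadget", 4.0),
            ("2000-01-02", "widget", 6.0),
        ],
    )
    # Index: speeds up lookups that filter the detailed facts by day.
    con.execute("CREATE INDEX idx_sales_day ON sales_fact (day)")
    # Aggregate: summary rows computed once, so reports avoid rescanning facts.
    con.execute(
        "CREATE TABLE sales_by_day AS "
        "SELECT day, SUM(amount) AS total FROM sales_fact GROUP BY day"
    )
    return con


def daily_totals(con):
    """Answer a common report from the aggregate table, not the fact table."""
    return con.execute(
        "SELECT day, total FROM sales_by_day ORDER BY day"
    ).fetchall()
```

In this sketch the aggregate is built once and goes stale when new facts arrive; a real design would refresh it as part of the load process.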
Meet User Expectations
Website design
• Site navigation choices
• Help choices
• Communication with various groups – response must be assured
• Headlines serious and define content
• Indicate off-screen material
• Survey customer needs and wants
Meet User Expectations
Data Webhouse design
• Report library
• Folder of previous queries, reports …
• Dimension browser – viewing a dimension can assist report creation
• Business metadata interface – understand the organization's data assets
Streamline Process
Business processes designed from ground up to work seamlessly on web
Website design
• Reengineer to streamline process and make navigation easier, uniform interfaces
• Remove barriers to reaching page
• Minimize clicks and new windows
• Allow interruption and return
Streamline Process
Data Webhouse design
• Build an explicit value chain for reporting and analysis around the application suite using conformed dimensions and facts
• Drill across functions
• Single user interface for reporting against all parts of business
• Master report library and FAQs
• Single login and single console access to webhouse
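Conformed dimensions are what make drilling across possible: each business process is summarized separately by the same dimension attribute, and the results are then merged on that attribute. A minimal sketch, with entirely hypothetical product, order, and shipment data:

```python
from collections import defaultdict

# Conformed dimension: both fact tables share the same product keys and names.
PRODUCT_DIM = {1: "widget", 2: "gadget"}

# Hypothetical facts from two separate business processes.
ORDERS = [(1, 10), (2, 5), (1, 7)]   # (product_key, quantity_ordered)
SHIPMENTS = [(1, 12), (2, 5)]        # (product_key, quantity_shipped)


def summarize(facts):
    """Total one fact table by the conformed product name."""
    totals = defaultdict(int)
    for product_key, quantity in facts:
        totals[PRODUCT_DIM[product_key]] += quantity
    return dict(totals)


def drill_across():
    """Query each process on its own, then merge on the shared row header."""
    ordered = summarize(ORDERS)
    shipped = summarize(SHIPMENTS)
    products = sorted(set(ordered) | set(shipped))
    return [(p, ordered.get(p, 0), shipped.get(p, 0)) for p in products]
```

The merge only works because both summaries use identical product labels; if each process kept its own product list, the rows could not be lined up reliably.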
Reassure Users
Website Design
• Map of processes
Data Webhouse design
• Provide status and lineage of current data
• Provide status of running reports
• Active notification
• Allow for entry of NA if data not available
• Time stamped dimensions
• Time stamped reports
Allow Problem Resolution
Website design
• Allow backtracking, rollback, play forward
• Keep old transactions
• Easy error reporting
• Acknowledge, track and follow up all user inputs, show wait time
• Assist searching
Data Webhouse design
• Provide adequate end user support
• Show aggregates in use and available
• Show system load and percent completed
Build Trust
Clearly state and observe website’s policies for using customer’s identity
Website design
• Do not abuse privacy
• Link to privacy statement
• Use friendly pictures of people
• Distinguish between ad content and editorial content
Build Trust
Data Webhouse design
• Two-factor security
– What you know – password
– What you possess – token
• Track changes in employee and contractor status
• Create and enforce roles for employees, contractors and customers
• Manage webhouse security directly
Provide Communication Hooks
Website design
• Provide useful links to others – internal and external
• Remove links that invalidate the “back” button
• Use copyable URLs
• Use URL as medium of distribution
Advantages of Web Today (1998 - 2000)
Immediate worldwide access
Centralized management - Decentralized
Thin client
Multi-platform (client and server) - Distributed
Little or no software distribution - Downloads
A+
Disadvantages of Web Today (1998 - 2000)
Immature technology - Teenager
Security - Solutions
Speed restricted by bandwidth - data and logic must both travel across internet
Design limited to least common denominator or access restricted to specific browser
Vulnerabilities
Physical assets
Information assets
• theft
• modification
Software assets
Ability to conduct business
Web Architecture
[Diagram: labels recovered from the figure]
Thin Client
• Browser
• Applets/ActiveX
• Email
• Spreadsheet
• Word-processing
Communication layer (network/internet)
Internet Server
Application
• Analysis/statistics
• Graphics
• Report Writer
• SQL Query
OLAP Server
Database Servers
• Data Warehouse - Relational Database
• Multidimensional Database
• Summary/Alternative Relational Tables
Business Management through Information
Analysis of historical records
• order processing, inventory levels, shipments, receivables, customer history, etc.
Goals include:
• Measures of efficiency
• Anticipate changes (planning and forecasting)
• Make adjustments
• Integration of model and control function
Rule-Based Management
Create strategic rules
• IF market demand increases THEN implement marketing campaign A3
• IF profit margin drops below value X THEN adjust overhead by …
Must not forget alert rules
• IF unanticipated condition THEN notify CFO
Must not be too reactive
• would cause thrashing
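The strategic rules, the alert rule, and the warning about thrashing can be sketched together as below. This is an illustration under stated assumptions: the `cooldown` mechanism is one possible way to keep rules from being too reactive, not a method given in the slides; `MARGIN_FLOOR` is an invented number standing in for "value X"; and the overhead adjustment amount is left out because the slide elides it.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MARGIN_FLOOR = 0.20  # assumed stand-in for "value X" on the slide


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # IF part, evaluated on current metrics
    action: str                        # THEN part, kept as descriptive text
    cooldown: int = 0                  # periods to wait before firing again
    last_fired: Optional[int] = None


def evaluate(rules, metrics, period):
    """Return the actions triggered in this period, honoring cooldowns."""
    fired = []
    for rule in rules:
        if not rule.condition(metrics):
            continue
        fired_recently = (rule.last_fired is not None
                          and period - rule.last_fired <= rule.cooldown)
        if fired_recently:
            continue  # suppress repeat firing to avoid thrashing
        rule.last_fired = period
        fired.append(rule.action)
    return fired


def build_rules():
    return [
        Rule("demand", lambda m: m["demand_change"] > 0,
             "implement marketing campaign A3", cooldown=2),
        Rule("margin", lambda m: m["profit_margin"] < MARGIN_FLOOR,
             "adjust overhead", cooldown=2),
        # Alert rule: no cooldown, so every unanticipated condition is reported.
        Rule("alert", lambda m: m.get("unanticipated", False), "notify CFO"),
    ]


def demo():
    rules = build_rules()
    history = [
        {"demand_change": 0.05, "profit_margin": 0.25},
        {"demand_change": 0.04, "profit_margin": 0.25},  # within cooldown
        {"demand_change": -0.01, "profit_margin": 0.15, "unanticipated": True},
        {"demand_change": 0.03, "profit_margin": 0.25},
    ]
    return [evaluate(rules, m, period) for period, m in enumerate(history)]
```

In the second period demand is still rising, but the campaign rule stays quiet because it fired one period earlier; without the cooldown the same campaign would be triggered repeatedly.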
OLDM Decision Process
Simultaneous capture of:
• Decision support information
– Surveyed customer on-line in exchange for an additional discount
• with business function inputs
Immediate computation or estimation of secondary information
• based on planning and forecasting rules
Decision support information is:
• available on-line
• ready to use “as is”
Management Defined!
OLDM Decision Process
Derived data becomes control information
Automation of analysis and decision support
• immediately available to management
Problems documented on-line
Classes of problem and corrective action codified
• problem recognition
• decision rules
OLDM Decision Process
Requires four types of information
• Characteristics which identify a class of problem
• Corrective action (management responses by problem class)
• Rules to implement actions
• Record of result
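The four types of information listed above map naturally onto a small data structure. The sketch below is one possible arrangement, with hypothetical class names and an invented low-stock example; the slides do not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProblemClass:
    name: str
    matches: Callable[[dict], bool]    # 1. characteristics identifying the class
    corrective_action: str             # 2. management response for this class
    implement: Callable[[dict], dict]  # 3. rule that carries out the action


@dataclass
class ResultRecord:                    # 4. record of result
    problem: str
    action: str
    before: dict
    after: dict


def handle(observation, problem_classes, log):
    """Recognize the problem class, apply its rule, and record the outcome."""
    for problem_class in problem_classes:
        if problem_class.matches(observation):
            after = problem_class.implement(dict(observation))
            log.append(ResultRecord(problem_class.name,
                                    problem_class.corrective_action,
                                    observation, after))
            return after
    return observation  # no known class matched; nothing is changed


def demo():
    low_stock = ProblemClass(
        name="low stock",
        matches=lambda o: o["on_hand"] < o["reorder_point"],
        corrective_action="reorder up to target level",
        implement=lambda o: {**o, "on_order": o["target"] - o["on_hand"]},
    )
    log = []
    result = handle({"on_hand": 3, "reorder_point": 10, "target": 50},
                    [low_stock], log)
    return result["on_order"], len(log), log[0].problem
```

Keeping the result records is what lets recognition and decision rules be reviewed later against what actually happened.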
Potential of OLDM
Better managed business
• knowledge asset capture and retention
• consistency across enterprise
• flexible, highly responsive
Close loop with customer
• event and market driven but controlled
Direct customer interaction
• via web, telephone, remote connection
Improved systems capacity planning and system management
Re-alignment of business and IT