Architecture and Infrastructure
Module 2
G.Anuradha
What is architecture?
• The structure that brings all the components of a data warehouse together is known as the architecture.
• Many factors affect the architecture of a DW
  – Integrated data
  – Data preparation and storing
  – Data delivery
  – Technology
• Comprehensive blueprint
Architecture in 3 major areas
• Data acquisition
• Data storage
• Information delivery
Distinguishing characteristics of architecture
• Different objectives and scope
  – To provide strategic information, a DW needs an elaborate architecture
  – Scope depends on the sources used in the data acquisition area
• Data content
  – Deals with historical, read-only data
• Complex analysis and quick response
  – Drill down, roll up, slice, dice, what-if scenarios
• Flexible and dynamic
  – The design should remain open to change even after the initial design is complete
• Metadata-driven
  – Every data movement is recorded in the metadata
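The analysis operations named above (roll up, drill down, slice, dice) can be illustrated with a minimal sketch. The sales rows, the dimension names, and the helper functions `aggregate` and `restrict` are all hypothetical and exist only for illustration; a real warehouse would run these operations in an OLAP engine or in SQL.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, product, amount).
SALES = [
    (2023, "Q1", "East", "Widget", 100),
    (2023, "Q1", "West", "Widget", 80),
    (2023, "Q2", "East", "Gadget", 50),
    (2024, "Q1", "East", "Widget", 120),
    (2024, "Q1", "West", "Gadget", 70),
]
DIMS = ("year", "quarter", "region", "product")


def aggregate(facts, by):
    """Sum the amount grouped by the listed dimensions.

    Fewer dimensions means a roll-up; more dimensions means a drill-down.
    """
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def restrict(facts, **fixed):
    """Keep only facts whose dimensions match the fixed values.

    Fixing one dimension is a slice; fixing several is a dice.
    """
    return [
        row for row in facts
        if all(row[DIMS.index(d)] == v for d, v in fixed.items())
    ]


by_year = aggregate(SALES, ["year"])                     # roll up to year
by_year_quarter = aggregate(SALES, ["year", "quarter"])  # drill down to quarter
east_only = restrict(SALES, region="East")               # slice
east_widgets = restrict(SALES, region="East", product="Widget")  # dice
```

Rolling up and drilling down only change the grouping, while slice and dice only change which rows are kept, so the two helpers can be combined freely.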
Test your fundas
ACROSS
1. Business dimension (5)
6. Smaller than DW (8)
7. Combining data from different operational systems (10)
8. Initial loading (7)
DOWN
2. Remove useful information from operational data (10)
3. Monitoring the entire function (10)
4. Historical (8)
5. Data about entire warehouse (8)
Solution
Architecture supporting the flow of data
• Data Source (internal & external)
• Data Staging: transformation, cleansing, integration of data
• Data Storage: loading of data from the staging area, storing for information delivery
• Metadata: storage mechanism for data about data
• Information Delivery: dependent data marts, MDDBs, query and reporting facilities
Management and control module
• Umbrella component having two important functions
  – Monitor all ongoing operations
  – Problem recovery
List of services and functions-Data Extraction
• Select data sources and determine the types of filters to be applied to individual sources
• Generate automatic extract files from operational systems using replication and other techniques
• Create intermediary files to store selected data to be merged later
• Transport extracted files from multiple platforms
• Provide automated job control services for creating extract files
• Reformat input from outside sources
• Reformat input from departmental data files, databases, and spreadsheets
• Generate common application code for data extraction
• Resolve inconsistencies for common data elements from multiple sources
List of services and functions-Data Transformation
• Map input data to data for data warehouse repository
• Clean data, deduplicate, and merge/purge
• Denormalize extracted data structures as required by the dimensional model of the data warehouse
• Convert data types
• Calculate and derive attribute values
• Check for referential integrity
• Aggregate data as needed
• Resolve missing values
• Consolidate and integrate data
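Several of the transformation functions listed above (mapping, cleaning, deduplication, type conversion, derived attributes, missing values) can be shown in one short sketch. The field names, the sample rows, and the `default_qty` rule are assumptions made up for this example; real transformation rules come from the warehouse's own mapping specifications.

```python
from datetime import date

# Hypothetical extracted rows: an exact duplicate, string-typed numbers,
# stray whitespace, and a missing quantity are included deliberately.
extracted = [
    {"cust_id": "17", "name": "Asha ", "qty": "2", "unit_price": "9.50", "sold_on": "2024-03-01"},
    {"cust_id": "17", "name": "Asha ", "qty": "2", "unit_price": "9.50", "sold_on": "2024-03-01"},
    {"cust_id": "23", "name": "Ravi", "qty": "", "unit_price": "4.00", "sold_on": "2024-03-02"},
]


def transform(rows, default_qty=1):
    """Apply a few illustrative transformation steps to extracted rows."""
    seen = set()
    out = []
    for r in rows:
        key = (r["cust_id"], r["sold_on"], r["unit_price"], r["qty"])
        if key in seen:  # deduplicate exact repeats
            continue
        seen.add(key)
        qty = int(r["qty"]) if r["qty"] else default_qty  # resolve missing value
        price = float(r["unit_price"])                    # convert data type
        out.append({
            "customer_key": int(r["cust_id"]),            # map input field to warehouse field
            "customer_name": r["name"].strip(),           # clean stray whitespace
            "quantity": qty,
            "unit_price": price,
            "sale_amount": round(qty * price, 2),         # derive attribute value
            "sale_date": date.fromisoformat(r["sold_on"]),
        })
    return out


clean_rows = transform(extracted)
```

Here the duplicate row is dropped and the missing quantity falls back to the assumed default of 1, so `clean_rows` holds two records.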
List of functions and services-Data staging
• Provide backup and recovery for staging area repositories
• Sort and merge files
• Create files as input to make changes to dimension tables
• If data staging storage is a relational database, create and populate database
• Preserve audit trail to relate each data item in the data warehouse to input source
• Resolve and create primary and foreign keys for load tables
• Consolidate datasets and create flat files for loading through DBMS utilities
• If staging area storage is a relational database, extract load files
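Three of the staging functions above (creating keys for load tables, preserving an audit trail back to the input source, and producing flat files for a DBMS load utility) might look like the following sketch. The source names `billing` and `crm`, the column names, and both helper functions are hypothetical; the sketch assumes that a surrogate key is assigned per (source, source id) pair.

```python
import csv
import io
from itertools import count


def build_load_rows(source_rows, source_name, key_map, next_key):
    """Assign surrogate keys and attach audit columns naming the input source."""
    load_rows = []
    for row in source_rows:
        natural = (source_name, row["cust_id"])
        if natural not in key_map:  # first time this source record is seen
            key_map[natural] = next(next_key)
        load_rows.append({
            "customer_key": key_map[natural],
            "customer_name": row["name"],
            "audit_source": source_name,        # audit trail: which system
            "audit_source_id": row["cust_id"],  # audit trail: which record
        })
    return load_rows


def to_flat_file(load_rows):
    """Serialize load rows as CSV text suitable for a bulk-load utility."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(load_rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(load_rows)
    return buf.getvalue()


key_map = {}
next_key = count(1)
rows = build_load_rows(
    [{"cust_id": "A7", "name": "Asha"}, {"cust_id": "A8", "name": "Ravi"}],
    "billing", key_map, next_key,
)
rows += build_load_rows([{"cust_id": "C9", "name": "Meera"}], "crm", key_map, next_key)
# A later batch from the same source reuses the key already assigned to A7.
rows += build_load_rows([{"cust_id": "A7", "name": "Asha"}], "billing", key_map, next_key)
flat = to_flat_file(rows)
```

Keeping `key_map` across batches is what lets a repeated source record resolve to the same warehouse key, and the two audit columns let each loaded row be traced back to its origin.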
Data Storage
• Load the data from the staging area into the data warehouse repository
• Before data is loaded into the data warehouse, the metadata repository gets populated
• In the top-down approach, data may also move from the enterprise-wide data warehouse repository to the repositories of the dependent data marts
• In the bottom-up approach, data movement stops with the appropriate conformed data marts
Information Delivery
• Information access in a data warehouse is through online queries and interactive analysis sessions
• The data warehouse also produces regular and ad hoc reports
• The data warehouse feeds data to proprietary multidimensional databases (MDDBs), where summarized data is kept as multidimensional cubes of information
Data stores for information delivery
Function and services
• Provide security to control information access and monitor user access
• Allow users to browse data warehouse content by hiding internal complexities
• Automatically reformat queries for optimal execution, from aggregate tables as well
• Provide self-service report generation for users, consisting of a variety of flexible options to create, schedule, and run reports
• Store result sets of queries and reports for future use
• Provide multiple levels of data granularity
• Provide event triggers to monitor data loading
• Make provision for the users to perform complex analysis through OLAP
• Enable data feeds to downstream, specialized decision support systems such as EIS and data mining
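Reformatting queries so that they run from aggregate tables is often called aggregate navigation. The sketch below shows one simple way it could work: pick the smallest table that still carries every dimension the query needs. The table names, row counts, the `amount` measure, and both functions are assumptions for illustration, and the sketch assumes the measure is additive so that summing pre-aggregated rows gives the right answer.

```python
# Hypothetical catalog: each table lists the dimensions it retains and its row count.
TABLES = {
    "sales_fact": {"dims": {"day", "month", "year", "store", "product"}, "rows": 10_000_000},
    "sales_by_month_store": {"dims": {"month", "year", "store"}, "rows": 50_000},
    "sales_by_year": {"dims": {"year"}, "rows": 10},
}


def choose_table(required_dims, tables=TABLES):
    """Return the smallest table that contains every dimension the query needs."""
    candidates = [
        (meta["rows"], name)
        for name, meta in tables.items()
        if set(required_dims) <= meta["dims"]
    ]
    if not candidates:
        raise ValueError(f"no table covers {sorted(required_dims)}")
    return min(candidates)[1]


def rewrite(required_dims, measure="amount"):
    """Build a GROUP BY query against the cheapest qualifying table."""
    table = choose_table(required_dims)
    cols = ", ".join(sorted(required_dims))
    return f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols}"
```

For example, `rewrite({"year"})` targets `sales_by_year` and never touches the detailed fact table, while a query that needs `product` falls back to `sales_fact`. The user writes the same request either way, which is how the internal complexity stays hidden.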
Summing up……
• Architecture is the structure that brings all the components together.
• The architectural components support the functioning of the data warehouse in the three major areas of data acquisition, data storage, and information delivery.
Infrastructure of DW
G.Anuradha
Infrastructure
Elements that enable the architecture to be implemented.
• Operational – help to keep the DW going
  – People
  – Procedures
  – Training
  – Management software
• Physical
  – Hardware components
  – Operating system
  – Network, network software
Features of Hardware & OS
• Hardware
  – Scalability
  – Vendor support
  – Vendor stability
• OS
  – Scalability
  – Security
  – Reliability
  – Availability
  – Preemptive multitasking
  – Memory protection
Possible options
• Mainframes
  – Old hardware
  – Designed for OLTP
  – Expensive
  – Not easily scalable
• Open System Servers
  – UNIX servers are the most common choice
  – Robust
  – Adapted for parallel processing
• NT Servers
  – Medium-sized data warehouses
  – Limited parallel processing
  – Cost effective for small or medium DW
Platform Options
• A computing platform is the set of hardware components, operating system, network, and network software.
• Both Online Transaction Processing and Decision Support Systems need a computing platform.
Single Platform Option
• All functions, from back-end data extraction to front-end query processing, are performed on one platform.
• Data flows smoothly, no conversions required
• No middleware required
• Limitations
  – Legacy platform stretched to capacity
  – Non-availability of tools
  – Multiple legacy platforms
  – Company’s migration policy
Hybrid Platform Option
• Eliminates the drawbacks of the single platform option
• Data extraction: each source is extracted on its own computing platform
• Initial reformatting & merging: the extracted file from each source is reformatted and merged on its respective platform
• Preliminary data cleansing: verify extracted data for missing values and data types
• Transformation & consolidation: performed on the platform where the staging area resides
• Validation & final quality check
• Creation of load images
Options for staging area
• Legacy platforms – when all data sources are on the same platform, the DW can also be created on that platform
• Data storage platform – the warehouse DBMS runs here; this platform can be used for staging as well
• Separate optimal platform – a separate platform for staging data
Server Hardware
• Server hardware is most important
  – Scalability
  – Query processing
Data movement options
Client/Server architecture for DW
Considerations on client workstations
• Depends on type of users
  – Casual user: Web browser and HTML reports
  – Analyst: more powerful workstation machine
• A practically feasible solution is a minimum configuration on an appropriate platform that supports a standard set of information delivery tools in the DW
Platform options as DW matures
Parallel processing
• Symmetric multiprocessing
• Clusters
• Massively parallel processing
• Cache-coherent Nonuniform Memory Architecture
Symmetric Multiprocessing
Clusters
Massively Parallel Processing
NUMA or ccNUMA
Database Software
• Many operations can be parallelized
  – Mass loading of data
  – Full table scans
  – Queries with exclusion conditions
  – Queries with grouping
  – Selection with distinct values
  – Aggregation
  – Sorting
  – Creation of tables using subqueries
  – Creating and rebuilding indexes
  – Inserting rows into a table from other tables
  – Enabling constraints
  – Star transformation
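To make the idea concrete, here is a minimal sketch of how a grouped aggregation can be split across workers: each worker scans one partition and produces partial totals, which are then merged. The function names and sample rows are hypothetical. Python threads are used only to show the partition-and-merge structure; under CPython they do not give true CPU parallelism for this kind of work, and a database engine would instead distribute partitions across processors or nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Scan one partition and total the amount per region."""
    totals = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def parallel_group_sum(rows, workers=4):
    """Group-and-sum by splitting the scan into partitions and merging the results."""
    size = max(1, -(-len(rows) // workers))  # ceiling division: rows per partition
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds values key by key
    return dict(merged)


sample = [("East", 10), ("West", 5), ("East", 7), ("North", 1), ("West", 3)]
region_totals = parallel_group_sum(sample, workers=2)
```

The result is the same as a single sequential scan because addition can be applied to partial totals in any order, which is what makes aggregation a good candidate for parallel execution.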
Types of parallelization
Software Tools
Summing up
• Infrastructure acts as the foundation supporting the data warehouse architecture
• Data warehouse infrastructure consists of operational infrastructure and physical infrastructure.
• Hardware and operating systems make up the computing environment for the DW.
• Several options exist for the computing platforms needed to implement the various architectural components.
Summing up
• Selecting the server hardware is a key decision. Invariably, the choice is one of the four parallel server architectures.
• Current database software products are able to perform interquery and intraquery parallelization.
• Software tools are used in the data warehouse for data modeling, data extraction, data transformation, data loading, data quality assurance, queries and reports, and online analytical processing (OLAP).
• Tools are also used as middleware, alert systems, and for data warehouse administration.
METADATA
• Data dictionary or data catalog
• Contains data about the data in the DW like
  – data structures
  – files and addresses
  – indexes
• Types of Metadata
  – Operational
  – Extraction & Transformation
  – End-User
Need for Metadata
• For using the DW
• For building the DW
• For administering the DW
• Automation of the DW
Metadata by functional areas
• Every DW process occurs in one of these 3 areas
  – Data acquisition
  – Data storage
  – Information delivery
Data acquisition - metadata
Information Delivery – metadata
Types of Metadata
• Business metadata
  – Portrays DW from the end user perspective
  – Shows business names, not actual file names
  – Less structured as compared to technical metadata
  – Used by business analysts and other end users
• Technical metadata
  – Shows the actual structure and content of the DW
  – Acts as a guide to build, maintain and administer the DW
  – Used by the data warehouse administrator and other IT staff working on the DW
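The split between business and technical metadata can be pictured as two views over one catalog entry. The sketch below is only an illustration: the class names, the `CATALOG` entry, and the table and column names are all made up, and a real metadata repository would hold far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalMetadata:
    """Physical details, of interest to administrators and IT staff."""
    table: str
    column: str
    data_type: str
    source_system: str


@dataclass(frozen=True)
class BusinessMetadata:
    """End-user view: a business name and a plain-language description."""
    business_name: str
    description: str
    technical: TechnicalMetadata


# Hypothetical catalog with a single entry.
CATALOG = {
    "Customer Name": BusinessMetadata(
        business_name="Customer Name",
        description="Full name of the customer as billed",
        technical=TechnicalMetadata("dim_customer", "cust_nm", "VARCHAR(80)", "billing"),
    ),
}


def describe_for_user(name):
    """Show the business view only; no physical names are exposed."""
    entry = CATALOG[name]
    return f"{entry.business_name}: {entry.description}"


def resolve_for_admin(name):
    """Show the technical view: where the data element physically lives."""
    tech = CATALOG[name].technical
    return f"{tech.table}.{tech.column} ({tech.data_type}, from {tech.source_system})"
```

A business analyst would look up "Customer Name" through `describe_for_user`, while an administrator would use `resolve_for_admin` to find the underlying column and its source system.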
How to provide metadata
• Metadata requirements
• Sources
• Challenges
• Repository
• Integration and standards
• Implementation options