Infrastructure for Data Warehouses

Upload: claribel-cox

Post on 21-Jan-2016

TRANSCRIPT

Page 1: Infrastructure for Data Warehouses. Basics Of Data Access Data Store Machine Memory Buffer Memory Cache Data Store Buffer Bus Structure

Infrastructure for Data Warehouses

Page 2

Basics Of Data Access

[Diagram: data store, buffer, machine memory, memory cache, and bus structure]

Page 3

Basics Of Data Access: Storage

All data on a single disk shares one controller.

Striping data randomly across several disks reduces contention for controller time.

Databases requiring 100% uptime use striping or mirroring to facilitate backup and maintenance. Backups can be written from one copy while processing proceeds with the other one.

Striping, particularly in a RAID environment, permits replacement of failed hardware without bringing down the database.
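As a rough illustration of how striping spreads requests over controllers, here is a minimal sketch. It assumes round-robin placement (a common striping scheme) rather than the random placement the slide mentions, and the function names are hypothetical.

```python
from collections import Counter


def disk_for_block(block_number: int, num_disks: int) -> int:
    """Round-robin striping (an assumption): consecutive blocks go to consecutive disks."""
    return block_number % num_disks


def requests_per_disk(block_numbers, num_disks: int) -> Counter:
    """Count how many block requests each disk (and its controller) must serve."""
    return Counter(disk_for_block(b, num_disks) for b in block_numbers)


blocks = range(12)
single = requests_per_disk(blocks, 1)   # everything queues on one controller
striped = requests_per_disk(blocks, 4)  # the same requests spread over four

print(dict(single))   # {0: 12}
print(dict(striped))  # {0: 3, 1: 3, 2: 3, 3: 3}
```

With one disk, all twelve requests contend for the same controller; with four disks, each controller sees only three.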

Page 4

Basics Of Data Access: Retrieval

The speed of processing a given retrieval is primarily governed by the number of disk accesses required to execute it.

Data is transferred to and from the disk in buffer-sized units. On large systems these buffers (blocks) can be set by the code; on PCs the buffer sizes (sectors) are fixed.

A block may contain several records. If all of the records in a block can be processed before another retrieval is needed then processing is faster.
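The effect of blocking on disk accesses can be sketched with a little arithmetic. The record and block counts below are made-up illustrative values, and the worst case for random access assumes every record sits in a different block.

```python
import math


def sequential_accesses(num_records: int, records_per_block: int) -> int:
    """Disk accesses needed when records are read in stored order, block by block."""
    return math.ceil(num_records / records_per_block)


def random_accesses_worst_case(num_records: int) -> int:
    """Worst case (an assumption): each wanted record lives in a different block."""
    return num_records


print(sequential_accesses(1000, 50))     # 20
print(random_accesses_worst_case(1000))  # 1000
```

Reading 1000 records in order at 50 records per block takes 20 accesses, against up to 1000 when each record needs its own access.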

Page 5

Basics Of Data Access: Busses

A bus transfers data from device to device. In single systems the bus is internal. In distributed systems the network acts as the bus.

Busses transfer data in units of a word. Normally a word is smaller than a buffer unit, so a transfer takes several bus cycles. (For networks, packets play the same role as words on a backplane bus.)

Busses can service only one unit on the bus network at a time. Multiple units on the same bus can generate bus contention.
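The relationship between word size, buffer size, and bus cycles can be sketched as follows. The buffer and word sizes are illustrative assumptions, and contention is modeled in the simplest possible way: transfers from different units are served one after another.

```python
def bus_cycles(buffer_bytes: int, word_bytes: int) -> int:
    """Cycles needed to move one buffer across a bus, one word per cycle."""
    return -(-buffer_bytes // word_bytes)  # ceiling division


def serialized_cycles(buffers_bytes, word_bytes: int) -> int:
    """Total cycles when several units share one bus and are served one at a time."""
    return sum(bus_cycles(b, word_bytes) for b in buffers_bytes)


print(bus_cycles(4096, 4))                   # 1024
print(bus_cycles(4096, 8))                   # 512
print(serialized_cycles([4096, 4096, 4096], 8))  # 1536
```

Doubling the word width halves the cycles per buffer, while adding units to the same bus adds their cycles to the total.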

Page 6

Basics Of Data Access: Cache

Cache is a high-speed data storage location that stores the most recently used data that is to be transferred between units in a system. Cache speeds up processing by taking advantage of the data reuse (looping) typical of most programs, reducing the number of physical DASD (direct-access storage device) accesses required.

Memory cache (as opposed to CPU cache) is a location in main memory and can be set by the system administrator.
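To make the benefit of data reuse concrete, here is a minimal sketch of a cache that keeps the most recently used blocks. The least-recently-used eviction policy, the class name, and the access patterns are assumptions chosen for illustration, not details from the slides.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps the most recently used blocks; evicts the least recently used one."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int) -> None:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # a miss stands for one physical disk access
            self.blocks[block_id] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # drop the least recently used


def hit_ratio(accesses, capacity: int) -> float:
    cache = LRUCache(capacity)
    for block_id in accesses:
        cache.read(block_id)
    return cache.hits / (cache.hits + cache.misses)


looping = [0, 1, 2, 3] * 25  # 100 accesses that keep reusing 4 blocks
scan = list(range(100))      # 100 accesses, each block touched once

print(hit_ratio(looping, 8))  # 0.96
print(hit_ratio(scan, 8))     # 0.0
```

The looping pattern is served almost entirely from cache, while a one-pass sequential scan gains nothing from it.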

Page 7

Program Characteristics

Transaction Systems

Access few records at a time.

Require records from random locations.

Update and modify data frequently.

Data Warehouse Systems

Access a number of records at a time.

Require records in order.

Update and modify data infrequently.

Page 8

System Tuning

Transaction Systems
• Small buffers
• Large cache
• Fast busses

Data Warehouse Systems
• Large buffers
• Small cache
• Wide busses
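One way to see why buffer size is tuned in opposite directions for the two workloads is a simple timing model. This is a sketch under stated assumptions: every disk access pays a fixed latency, transfers run at a constant bandwidth, and the latency, bandwidth, and buffer sizes are hypothetical values.

```python
def scan_seconds(total_bytes, buffer_bytes, access_latency_s, bandwidth_bytes_per_s):
    """Time to read a data set sequentially, one buffer per disk access."""
    accesses = -(-total_bytes // buffer_bytes)  # ceiling division
    return accesses * access_latency_s + total_bytes / bandwidth_bytes_per_s


def lookup_seconds(buffer_bytes, access_latency_s, bandwidth_bytes_per_s):
    """Time to fetch one record: one access that transfers a whole buffer."""
    return access_latency_s + buffer_bytes / bandwidth_bytes_per_s


# Hypothetical hardware figures, for illustration only.
LATENCY_S = 0.005            # 5 ms per disk access
BANDWIDTH = 100 * 10**6      # 100 MB per second
SMALL_BUFFER = 8 * 1024      # 8 KiB
LARGE_BUFFER = 1024 * 1024   # 1 MiB

scan_small = scan_seconds(10**9, SMALL_BUFFER, LATENCY_S, BANDWIDTH)
scan_large = scan_seconds(10**9, LARGE_BUFFER, LATENCY_S, BANDWIDTH)
lookup_small = lookup_seconds(SMALL_BUFFER, LATENCY_S, BANDWIDTH)
lookup_large = lookup_seconds(LARGE_BUFFER, LATENCY_S, BANDWIDTH)

print(f"1 GB scan, small buffers: {scan_small:.1f} s")
print(f"1 GB scan, large buffers: {scan_large:.1f} s")
print(f"single lookup, small buffer: {lookup_small * 1000:.2f} ms")
print(f"single lookup, large buffer: {lookup_large * 1000:.2f} ms")
```

Under this model, large buffers cut the number of accesses a warehouse-style scan needs, while a transaction that wants one record pays extra transfer time for every unused byte in a large buffer.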

Page 9

Acxiom Overview

Acxiom creates and delivers Customer and Information Management Solutions that enable many of the largest, most respected companies in the world to build great relationships with their customers. Acxiom achieves this by blending data, technology and services to provide the most advanced customer information infrastructure available in the marketplace today.

Page 10

Data Warehouses

The characteristics of an Acxiom data warehouse generally are...

• Large multi-terabyte databases
• Large periodic sequential data loads
• Denormalized database schema
• Sequential reads/full table scans
• Little or no indices
• Little or no transaction logging
• Robust periodic backup solutions
• Performance measured using megabytes/gigabytes per second (MBPS, GBPS)

Page 11

Data Warehouses

The processing platform is generally a large global class server or cluster of servers running UNIX.

The storage sub-system is very fast, with wide bandwidth and high levels of redundancy, which makes it possible to move large amounts of sequential data in a very short time.

The database is:

A large, vertical, denormalized database with few tables that are very long, hold sorted data, and sometimes run to several billion rows.

The data is striped across the storage in a manner that prevents physical hot spots and takes advantage of the wide bandwidth.

Page 12

Data Warehouses

[Diagram, labeled "IBM"]

Page 13

Transactional Databases

The characteristics of an Acxiom transactional database generally are...

• Small, usually no larger than a few terabytes
• Random and simultaneous inserts, updates, deletes, and queries
• Random reads and writes
• Normalized database schema
• Transaction logging and archiving with incremental and periodic backup solutions
• Generally sub-second response required per transaction, taking into account concurrency
• Performance measured using transactions per second (TPS) and I/O latency

Page 14

Transactional Databases

The processing platform is generally a medium/enterprise class server.

The storage sub-system is very fast, with low latency, nominal bandwidth, and high levels of redundancy, which makes it possible to move small amounts of selected data quickly.

The database is:

A normalized database that utilizes lookup tables.

The data is stored randomly within a table but striped across the storage to prevent physical hot spots.

Page 15

Transactional Databases

[Diagram, labeled "IBM"]

Page 16

Hybrid Databases

The characteristics of an Acxiom hybrid database generally are...

• Medium sized, usually three to ten terabytes
• Random and simultaneous inserts, updates, deletes, and queries
• Random and sequential reads and writes
• Loosely normalized database schema
• Indices used sparingly
• Usually a batch maintenance process
• Transaction logging and archiving with incremental and periodic backup solutions
• Generally sub-second response required per transaction, taking into account concurrency
• Performance measured using TPS, I/O latency, and MBPS

Page 17

Hybrid Databases

The processing platform is generally a medium sized global class server.

The storage sub-system is very fast, with wide bandwidth and high levels of redundancy, which makes it possible to move large amounts of random and sequential data in a very short time.

The database is:

A large, vertical, loosely normalized database with few tables that are very long, hold sorted data, and sometimes run to more than a billion rows.

The data is striped across the storage in a manner that prevents physical hot spots and takes advantage of the wide bandwidth.

Page 18

Hybrid Databases

[Diagram, labeled "IBM"]

Page 19

What’s New/Future Innovations

Grid or scale-out environments...

• Utilize low-cost commodity-based servers
• Low-cost/no-cost operating systems
• Many servers can work on one problem, with aggregate processing power greater than that of one large server, for less money
• Not locked into a single vendor or supplier
• When adding a new node, able to use current technology at a lower price
• Need to understand and factor in peripheral costs such as network, administration, data center, etc.

Page 20

[Diagram: Parallel Grid and Clustered Grid, each drawn as a set of IBM pSeries servers with OS and DB labels]

Page 21

Distributed Grid Database

• Shared-nothing environment: each partition has its own resources, allowing the database to scale out (up to 999 partitions).
• Centralized management of the partitioned environment.
• Data is equally distributed across all partitions.
• Any partition can receive connections and distribute queries among the other nodes.
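The even distribution of data across partitions described on this slide can be sketched with hash partitioning. The slides do not say how rows are assigned, so hashing on a key is an assumption here, as are the function name, the key format, and the partition count.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a row key to a partition with a stable hash (CRC32, an assumed choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 8  # hypothetical; the slide cites an upper limit of 999
keys = [f"customer-{i}" for i in range(10_000)]  # made-up row keys
rows_per_partition = Counter(partition_for(k, NUM_PARTITIONS) for k in keys)

for partition in sorted(rows_per_partition):
    print(partition, rows_per_partition[partition])
```

Because the mapping depends only on the key, any node that receives a connection can work out which partition holds a given row and forward the request there.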

Page 22

Summary

Understand the process in which the database is to be used, and fashion a solution to meet the requirements and customer expectations.

Even though a DBA may only be responsible for the database, many factors such as operating system and hardware configuration affect the functionality of the database and thus are a concern to the DBA. A DBA must relate the database to its environment to achieve an optimized solution.

A large multi-terabyte database is not a scary monster; it is the same as dealing with a smaller database, just with a few more zeros.