VLDB 2005 31st International Conference on Very Large Databases
Raghunath Othayoth Nambiar, Hewlett-Packard Company
Meikel Poess, Oracle Corporation
Large Scale Data Warehouses on Grid: Oracle Database 10g and HP ProLiant Systems
April 8, 2023 | Large Scale Data Warehouses on Grid: Oracle Database 10g and HP ProLiant Servers | VLDB 2005 - 31st International Conference - Trondheim, Norway
Agenda
• Grid Computing
• Hardware Support
• Software Support
• TPC-H Result
Grid Computing
1) Application and user perspective:
− Just like the power grid: computing power is delivered on demand
2) Implementation perspective:
− Data virtualization
− Resource provisioning
− High availability
From Research to Industry
• Research projects using grid technology:
− SETI@home
− World Community Grid
• Traditionally, companies used islands of systems to implement corporate data warehouses, which:
− Cannot share resources
− Are too rigid to respond to rapidly changing business needs
− Cannot be scaled indefinitely
HP and Oracle are applying the grid concept to industry data warehouses (DW)
Commercial Grid Market
• IDC calls grid computing the fifth generation of computing
• Commercial grid computing revenue:
− 2003: USD 1 billion
− 2008: USD 12 billion (estimate)
• Forrester Research:
− 37% of enterprises are piloting, rolling out or have implemented some form of grid computing
− 30% of firms are considering grid technology
(IDC, 2004. www.oracle.com/technology/tech/grid/collateral/idc_oracle10g.pdf) (Forrester, 2004. www.forrester.com/go?docid=34449)
N-tier vs. Grid Computing
• Traditional multi-tier datacenter infrastructure: web servers, application servers and database servers are preconfigured and pre-allocated.
• Grid computing: infrastructure is dynamically provisioned to applications that have been virtualized.
[Figure: left, the traditional n-tier datacenter (Internet, application servers (middle tier), separate OLTP database servers and DSS servers, each with direct attach storage); right, the grid (application servers (middle tier) on top of resource virtualization and provisioning over a shared pool of commodity servers, with a storage area network (SAN) and network attached storage (NAS))]
Commercial Grid Components
• Commodity hardware (x86-based servers)
• Linux OS: cost-effective
• SAN: highly scalable
• High-speed interconnect (Gigabit Ethernet, InfiniBand)
• Management software (manage as individual servers or as one large virtual server)
• Database layer (ties the resources together; dynamic resource allocation, parallel processing)
Commercial Grid benefits
• High scalability
• High flexibility
• Low total cost of ownership
• High availability
• Easy manageability
Oracle Features for a Data Warehouse Grid
• Dynamic parallel processing
• Data virtualization and dynamic resource provisioning in DW
• Smart inter-node parallelism
Dynamic Parallel Processing
• Queries are automatically parallelized to maximize resource utilization
• Degree of Parallelism (DOP) is adjusted according to resource availability and computing demands at parse time
• DOP is automatically adjusted when:
− The number of concurrent users changes
− Nodes are taken down for maintenance
− Nodes are added due to increased computing demand (scale-out)
− Nodes are assigned to a different application
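The slide does not say how the DOP is computed. As a rough illustration only, the sketch below caps a requested DOP by the parallel-execution capacity of the nodes currently in the cluster and shares that capacity evenly among concurrent users. The function `adaptive_dop`, its parameters and the even-split policy are hypothetical; the default of two parallel threads per CPU loosely mirrors Oracle's PARALLEL_THREADS_PER_CPU parameter, but this is not Oracle's actual adaptive algorithm. The cluster size (12 nodes with 4 processors each) matches the benchmarked configuration.

```python
def adaptive_dop(requested_dop: int, active_nodes: int, cpus_per_node: int,
                 concurrent_users: int, threads_per_cpu: int = 2) -> int:
    """Return a degree of parallelism for one query (hypothetical policy)."""
    if requested_dop < 1:
        raise ValueError("requested_dop must be at least 1")
    if active_nodes < 1 or cpus_per_node < 1:
        raise ValueError("need at least one active node with at least one CPU")
    # Parallel execution capacity of the nodes currently in the cluster.
    capacity = active_nodes * cpus_per_node * threads_per_cpu
    # Share that capacity evenly among the users running queries right now.
    fair_share = max(1, capacity // max(1, concurrent_users))
    # Never exceed what was requested, never drop below serial execution.
    return min(requested_dop, fair_share)


scenarios = [
    ("baseline, 1 user", 12, 1),
    ("4 concurrent users", 12, 4),
    ("1 node down for maintenance", 11, 4),
    ("1 node added (scale-out)", 13, 4),
]
# Prints DOP 96, 24, 22 and 26 for the four scenarios.
for label, nodes, users in scenarios:
    print(f"{label}: DOP {adaptive_dop(128, nodes, 4, users)}")
```

The scenarios map onto the triggers listed above: more concurrent users shrink each query's share, while removing or adding a node shrinks or grows the capacity being shared.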
Data Virtualization and Dynamic Resource Provisioning in DW
• Oracle's shared-everything architecture provides data virtualization and provisioning in data warehouses
[Figure: eight nodes (1-8) connected through an interconnect to a shared disk subsystem]
[Figure: nodes 1-8 assigned to the workload types OLAP, Reports and ETL]
[Figure: assignment of nodes 1-8 to OLAP, Reports and ETL during peak working hours]
[Figure: assignment of nodes 1-8 to OLAP, Reports and ETL during the night]
[Figure: assignment of nodes 1-8 to OLAP, Reports and ETL during short intervals when the DW is synchronized with the OLTP system]
[Figure: without response time requirements, all types of workload (OLAP, Reports, ETL) can run on all nodes 1-8]
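The slides show the node-to-workload mapping changing with the time of day, but the exact assignments did not survive in this transcript. The sketch below shows one way such a policy could be expressed: each period gets integer weights per workload type, and the eight nodes are split proportionally using largest-remainder rounding. `assign_nodes`, `SCHEDULE` and all weights are hypothetical. Because the architecture is shared-everything, moving a node to another workload moves no data. In an Oracle RAC deployment this mapping would normally be configured through database services rather than application code like this.

```python
from typing import Dict, List


def assign_nodes(weights: Dict[str, int], nodes: List[int]) -> Dict[str, List[int]]:
    """Split `nodes` among workload types in proportion to `weights`,
    using largest-remainder rounding so every node is assigned exactly once."""
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and sum to a positive value")
    n = len(nodes)
    # Integer share and remainder of n * weight / total for each workload.
    shares = {name: divmod(w * n, total) for name, w in weights.items()}
    counts = {name: q for name, (q, _) in shares.items()}
    leftover = n - sum(counts.values())
    # Give the leftover nodes to the workloads with the largest remainders
    # (sorting is stable, so ties go to the workload listed first).
    by_remainder = sorted(shares, key=lambda name: shares[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    # Hand out contiguous node ranges in the caller's workload order.
    assignment: Dict[str, List[int]] = {}
    start = 0
    for name in weights:
        assignment[name] = nodes[start:start + counts[name]]
        start += counts[name]
    return assignment


NODES = list(range(1, 9))
# Hypothetical demand weights per period; not taken from the slides.
SCHEDULE = {
    "peak working hours": {"OLAP": 5, "Reports": 2, "ETL": 1},
    "night": {"OLAP": 1, "Reports": 1, "ETL": 6},
    "DW/OLTP synchronization": {"OLAP": 0, "Reports": 0, "ETL": 1},
}
for period, weights in SCHEDULE.items():
    print(period, assign_nodes(weights, NODES))
```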
• This concept can be extended to different applications
[Figure: nodes 1-8 assigned to OLTP, DW and DM applications]
[Figure: OLTP, DW and DM applications mapped onto two groups of nodes, each numbered 1-8]
Smart Inter-Node Parallelism
• The optimizer avoids inter-node parallelism when possible, which reduces interconnect traffic and shortens execution time
1) Node locality
− If possible, operations are executed on one node
− When the DOP of an operation can be satisfied with the resources of one server, it executes locally
2) Full partition-wise join
− If two tables are equipartitioned on their join key, the join can be divided into smaller joins between matching partitions
3) Partial partition-wise join
− If only one table is partitioned on the join key, the other table is dynamically repartitioned on the join key to break the large join into smaller joins
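The two partition-wise join strategies can be illustrated with a small in-memory simulation. The sketch assumes hash partitioning on the join key and uses a plain hash join for each partition pair; the function names and the `orders`/`lineitems` sample rows are hypothetical, and nothing here describes how Oracle implements these operators internally. The point it demonstrates is that equipartitioned inputs can only match within the same partition number, so each pair can be joined independently (for example on one node) without shipping rows across the interconnect, and the union of the small joins equals the large join.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, str]  # (join_key, payload)


def hash_partition(rows: Sequence[Row], num_parts: int) -> List[List[Row]]:
    """Hash-partition rows on their join key (same function for every table)."""
    parts: List[List[Row]] = [[] for _ in range(num_parts)]
    for row in rows:
        parts[hash(row[0]) % num_parts].append(row)
    return parts


def local_hash_join(left: Sequence[Row], right: Sequence[Row]) -> List[tuple]:
    """Ordinary in-memory hash join on the key; stands in for one node's work."""
    index = defaultdict(list)
    for key, payload in left:
        index[key].append(payload)
    return [(key, lp, rp) for key, rp in right for lp in index.get(key, [])]


def full_partition_wise_join(left_parts, right_parts) -> List[tuple]:
    """Both tables equipartitioned: partition i can only match partition i."""
    if len(left_parts) != len(right_parts):
        raise ValueError("tables must have the same number of partitions")
    result: List[tuple] = []
    for lp, rp in zip(left_parts, right_parts):
        # Each pair is an independent, smaller join.
        result.extend(local_hash_join(lp, rp))
    return result


def partial_partition_wise_join(left_parts, right_rows) -> List[tuple]:
    """Only the left table is partitioned on the key: repartition the right
    table on the fly with the same hash function, then join pair by pair."""
    right_parts = hash_partition(right_rows, len(left_parts))
    return full_partition_wise_join(left_parts, right_parts)


# Hypothetical sample data, loosely shaped like orders and their line items.
orders = [(k, f"order{k}") for k in range(10)]
lineitems = [(i % 10, f"line{i}") for i in range(25)]

serial = local_hash_join(orders, lineitems)
full = full_partition_wise_join(hash_partition(orders, 4), hash_partition(lineitems, 4))
partial = partial_partition_wise_join(hash_partition(orders, 4), lineitems)
print(len(serial), sorted(full) == sorted(serial), sorted(partial) == sorted(serial))
```

The partial variant pays for one redistribution of the unpartitioned table, but after that every partition pair is joined locally, exactly as in the full case.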
TPC-H Benchmark
• The industry standard benchmark for data warehouse applications
• Stresses grid-based data warehouses with:
− Complex queries:
  • Sequential scans of large amounts of data
  • Aggregations of large amounts of data
  • Multi-table joins
  • Extensive sorting of very large data sets
− Single-user test
− Multi-user test
− Parallel insert operations
− Parallel delete operations
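In TPC-H terms, the single-user test is the power test, the multi-user test is the throughput test, and the parallel insert and delete operations are the two refresh functions (RF1 and RF2). The composite QphH@Size metric is the geometric mean of the power and throughput results. The sketch follows the headline formulas of the TPC-H specification but ignores its additional rules on timing precision and adjustment, so it is an approximation for illustration, not a compliant metric calculator; the function names are our own, and the timings in the usage lines are synthetic.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive timings")
    # Averaging logarithms avoids overflow when multiplying 24 timings.
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


def power_at_size(sf: float, query_secs: Sequence[float],
                  refresh_secs: Sequence[float]) -> float:
    """Power test (single-user): 22 queries plus the 2 refresh functions."""
    if len(query_secs) != 22 or len(refresh_secs) != 2:
        raise ValueError("expected 22 query timings and 2 refresh timings")
    return 3600.0 * sf / geometric_mean(list(query_secs) + list(refresh_secs))


def throughput_at_size(sf: float, streams: int, elapsed_secs: float) -> float:
    """Throughput test (multi-user): `streams` concurrent streams of 22 queries."""
    return streams * 22 * 3600.0 / elapsed_secs * sf


def qphh(power: float, throughput: float) -> float:
    """Composite metric: geometric mean of power and throughput."""
    return math.sqrt(power * throughput)


# Synthetic timings (not from any published result); scale factor 1000 = 1,000 GB.
p = power_at_size(1000, [3600.0] * 22, [3600.0] * 2)
t = throughput_at_size(1000, streams=1, elapsed_secs=22 * 3600.0)
print(f"Power {p:.1f}, Throughput {t:.1f}, QphH {qphh(p, t):.1f}")
```

Using a geometric mean means a single very slow or very fast query cannot dominate the power result, which is why the benchmark stresses every query type listed above.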
Benchmarked Configuration
hp ProLiant DL585 Cluster 48P
• 12 x hp ProLiant DL585 (48 processors in total), each with:
− 4 x AMD 848 Opteron™ 2.2 GHz/1 MB
− 8 GB memory
− 2 x on-board NICs
− 6 x hp FCA2214 DC
− 1 x InfiniCon Systems InfiniServ 7000 HCA
• InfiniCon Systems InfiIO3016 (InfiniBand interconnect)
• 2 x hp ProCurve Switch 4148gl
• Storage Area Network: 12 x hp SAN Switch 2/16, 48 x hp MSA1000
Current results
1,000 GB Results
Rank | Company / System | QphH | Price/QphH | System Availability | Database | Operating System | Date Submitted | Cluster
1 | HP Integrity Superdome Enterprise Server | 68,100 | 59.00 US $ | 01/18/06 | Oracle Database 10g R2 Enterprise Edt w/Partitioning | HP UX 11.i V2 64 bit | 08/08/05 | N
2 | IBM eServer xSeries 346 | 53,451 | 32.80 US $ | 02/14/05 | IBM DB2 UDB 8.2 | SUSE LINUX Enterprise Server 9 | 02/14/05 | Y
3 | HP ProLiant DL585 Cluster 48P | 35,141 | 59.93 US $ | 10/21/04 | Oracle 10g RAC with Partitioning | Red Hat Enterprise Linux AS 3 | 10/22/04 | Y
4 | PRIMEPOWER 2500 | 34,492 | 155.99 Euros | 03/08/04 | Oracle Database 10g Enterprise Edition | Sun Solaris 9 | 09/08/03 | N
*** | PRIMEPOWER 2500 | 34,492 | 140.96 US $ | 03/08/04 | Oracle Database 10g Enterprise Edition | Sun Solaris 9 | 11/13/03 | N
5 | IBM eServer p5 570 with DB2 UDB | 26,156 | 53.43 US $ | 12/15/04 | IBM DB2 UDB 8.2 | IBM AIX 5L V5.3 | 09/15/04 | Y
6 | NEC Express5800/1320Xe (32SMP) | 22,967 | 68.51 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 07/19/05 | N
7 | Unisys ES7000 Orion 440 Enterprise Server | 21,505 | 41.92 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 06/27/05 | N
8 | NEC Express5800/1320Xe (32SMP) | 20,231 | 76.06 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 06/07/05 | N
9 | IBM eServer p655 with DB2 UDB | 20,221 | 69.41 US $ | 06/08/04 | IBM DB2 UDB 8.1 | IBM AIX 5L V5.2 | 12/08/03 | Y
10 | NovaScale 5160 | 15,069 | 44.32 US $ | 12/20/05 | Oracle Database 10g release2 Enterprise Edt | Microsoft Windows Server 2003 Datacenter Edition | 06/20/05 | N
Result Analysis
• Leadership performance
− Query performance of 35,141 QphH@1000GB
− Price/performance of $60/QphH@1000GB
• A database grid of ProLiant systems with multiple AMD Opteron (x86) processors delivers performance comparable to large SMP systems
• The Linux operating system delivers the throughput and processing capacity needed to achieve the benchmark result
• Oracle Database 10g with RAC delivers consistent, high-performance query execution in large grid environments
Future Hardware for Grid – HP BladeSystems
Conclusion
• Grid is ready for prime time
• In grid computing, resources are provisioned on demand and virtualized for applications to meet today's challenging business needs
• Commodity x86-based servers and blade servers offer reduced total cost of ownership
• A grid overcomes the natural limitations of SMP systems, such as the maximum number of processors, amount of memory and number of disk arrays