Oracle Exadata Database Machine
Introduction and hardware
Disclaimer
This PTS workshop/briefing is intended to provide training and guidance in the latest
Oracle products specifically for our Partners.
Whilst it may assist Partners in their goals for Oracle Specialization, PTS courses are in no way related to the Specialization program and
do not follow the curriculum or syllabus set out by that program.
Exadata Database Machine Hardware
• Complete, pre-configured, tested for extreme performance
• Database Servers
• Exadata Storage Servers
• InfiniBand Switches
• Ethernet Switch
• Pre-cabled
• Power Distribution Units (PDUs)
• Ready to Deploy
• Plug in power
• Connect to Network
• Ready to Run Database
Oracle Database Machine Key Components
Database Server Grid
8 clustered RAC nodes (X2-2) or
2 clustered RAC nodes (X2-8)
Exadata Storage Server Grid
14 Exadata Storage servers
3 InfiniBand Switches: high-speed networking up to 40 Gb/s
Sun Rack II 1242 19” rack
• 42U height
• Industry-standard 19-inch adjustable
square-hole RETMA rails
• Selection of zero-RU PDU options
• Various input and output connectors
• Providing the highest density in the lowest profile
• 80 percent door perforations
• Enabling a high level of airflow through the cabinet
X2-2 Database Server (Sun Fire X4170 M2)
Processors: 2 x Six-Core Intel® Xeon® X5675 (3.06 GHz)
Memory: 96 GB (12 x 8 GB), can be upgraded to 144 GB (18 x 8 GB)
Local Disks: 4 x 300 GB 10K RPM SAS disks
Disk Controller: HBA with 512 MB battery-backed cache
Network: 2 x InfiniBand 4X QDR (40 Gb/s) ports (1 dual-port PCIe 2.0 HCA)
  4 x 1 GbE Ethernet ports
  2 x 10 GbE Ethernet SFP+ ports (1 dual-port 10 GbE PCIe 2.0 network card based on the Intel 82599 10 GbE controller)
Remote Management: 1 Ethernet port (ILOM)
Power Supplies: 2 redundant hot-swappable power supplies
X2-8 Database Server (Sun Fire X4800)
Processors: 8 x Eight-Core Intel® Xeon® X7560 (2.26 GHz)
Memory: 1 TB (128 x 8 GB)
Local Disks: 8 x 300 GB 10K RPM SAS disks
Disk Controller: HBA with 512 MB battery-backed cache
Network: 8 x InfiniBand 4X QDR (40 Gb/s) ports (4 dual-port PCIe 2.0 Express Modules)
  2 Network Express Modules (NEM), providing a total of:
  8 x 1 GbE Ethernet ports
  8 x 10 GbE Ethernet SFP+ ports (via 4 Fabric Express Modules (FEM) based on the Intel 82599 10 GbE controller)
Remote Management: 1 Ethernet port (ILOM)
Power Supplies: 4 redundant hot-swappable power supplies
Database Server Operating System Choices
• Two OS choices on the Database servers
• Oracle Linux
• Solaris 11 Express (x86)
• Requires at least Exadata Storage version 11.2.2.3.2
• Mandatory migration to Solaris 11 (6 months after release)
• Choose preferred DB Server OS at installation time
• Procedures in place to change afterwards
• Mix of Linux and Solaris on DB servers allowed
• Exadata Storage Servers will continue to be Oracle Linux
Exadata Storage Server X2-2 (Sun Fire X4270 M2)
Processors: 2 x Six-Core Intel® Xeon® L5640 (2.26 GHz)
Memory: 24 GB (6 x 4 GB)
Disks: 12 x 600 GB 15K RPM High Performance SAS, or 12 x 2 TB 7.2K RPM High Capacity SAS
Flash: 4 x 96 GB Sun Flash Accelerator F20 PCIe cards
Disk Controller: HBA with 512 MB battery-backed cache
Network: 2 x InfiniBand 4X QDR (40 Gb/s) ports (1 dual-port PCIe 2.0 HCA)
  4 embedded Gigabit Ethernet ports
Remote Management: 1 Ethernet port (ILOM)
Power Supplies: 2 redundant hot-swappable power supplies
The InfiniBand network
• InfiniBand network
• Connects the Exadata cells to the Database nodes
• Connects Database nodes for RAC interconnect
• Switches provided by DB Machine
• Sun Datacenter 36-port Managed QDR InfiniBand switches
• 2 switches used for connecting components inside DBM
• 1 switch available for connecting to other DBMs and for high availability
• Delivered with Full and Half Racks
InfiniBand Network High Bandwidth, Low Latency
• Sun Datacenter InfiniBand Switch 36
• Fully redundant non-blocking I/O paths from servers
to storage
• 2.88 Tb/sec bi-sectional bandwidth per switch
• 40 Gb/sec QDR, Dual port QSFP per server
All Racks X2-2 - Two “Leaf” Switches
• InfiniBand network with Redundancy
• Servers connect to the two leaf switches
• Active & Passive ports balanced across switches
• Full Bandwidth even if switch fails
• Connections pre-wired at factory
[Diagram: each server's two IB ports go to different leaf switches; DB servers 1-4 plus Exadata servers 1-7 (11 servers) and DB servers 5-8 plus Exadata servers 8-14 (11 servers) each connect Port 1 to leaf switch L1 and Port 2 to leaf switch L2.]
Spine and Leaf InfiniBand Switch On Half and Full Racks
• Use 3rd switch (S) as “spine” switch
• Connect each leaf switch to the spine switch (1 link wide)
• Interconnect the "leaf" switches with each other (7 links wide)
• Enough bandwidth remains even after a switch failure
• Pre-wired at factory
[Diagram: spine switch S connects to leaf switches L1 and L2 with 1 link each; L1 and L2 are interconnected with 7 links; DB and Exadata server ports attach to the leaf switches.]
Two Full Rack Case
Fat Tree Topology
• “Leaf” switches – L1, L2 (2 per rack)
• No change in connection to DB and Storage servers
• “Spine” switches – S (1 per rack)
• Connect each leaf switch to every spine switch (4 links wide)
[Diagram: in each of Rack 1 and Rack 2, leaf switches L1 and L2 connect to the spine switches S of both racks.]
Up to 8 Full or Half Racks
• Every "spine" switch connects to every "leaf" switch
• More than 8 racks requires (larger) external switches
• Up to 3 racks, all cables are included
[Diagram: Racks 1 through N, each with leaf switches L1/L2 and a spine switch S; every spine switch connects to every leaf switch across all racks.]
InfiniBand – The basic idea
• High speed, low latency (< 6 microsec) data transport
• Fast access to storage
• Extremely fast for RAC interconnect
• Direct access to network from applications
• No OS layer in between, no CPU usage
• Direct buffer-to-buffer communication, no CPU usage
• Bidirectional ports (QDR = 40 Gb/s in each direction)
• Switched fabric topology
• Several devices communicate at once with full speed
• Several protocols can be used over the same fabric
• iDB, ZDP, IPoIB, iSCSIoIB, NFSoIB, etc.
http://members.infinibandta.org/kwspub/Intro_to_IB_for_End_Users.pdf
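As an illustrative sanity check (assuming the standard InfiniBand diagnostic tools such as ibstatus and ibhosts from the OFED stack are installed on the node, as they normally are), the local port state and fabric membership can be inspected from any DB node or cell:

# ibstatus
(shows the state and rate of each local IB port; a healthy QDR link reports a 40 Gb/sec rate)
# ibhosts
(lists the channel adapters visible on the fabric, i.e. the DB nodes and storage cells)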
Exadata Protocols used
• iDB (Intelligent DataBase)
• Exadata to Node
• RDSv3 + Query communication
• Includes RDMA (Remote Direct Memory Access)
• Move memory without OS/CPU involvement
• No use of a buffer or cache
• ZDP (Zero-copy Zero-loss Datagram Protocol)
• RDS v3
• Node-to-Node traffic
• Does not use RDMA
• Internet Protocol over InfiniBand (IPoIB)
• Looks like normal Ethernet to host software (tcp/ip, udp, http, ssh,…)
External Connectivity
On the InfiniBand network
• Connect external devices to Exadata InfiniBand network
• Backup and recovery, Client / Application access
• High speed loading
• Distances are limited
• Max 5 meters for Copper, 100 meters for Fiber Optic
• Limited number of ports available
• Maximum 6 ports per leaf switch available on a Full Rack
• More ports available on Half and Quarter Racks, but then Exadata expansion is limited
• For HA, connect external system to both leaf switches
• Connecting external (InfiniBand) switches
• Only SUN or Exadata supplied InfiniBand switches
Management Network
• Management Network switch
• Cisco 4948 1U network switch
• Connects to all software management network ports
• Full network access to the DB and Storage nodes
• Connects to most hardware management ports
• InfiniBand switches, ILOM ports on all servers
• Exceptions: PDUs and KVM
Public or Application network
Isn’t this the most important one?!
• Each Database node has
• 3 available 1Gb network ports
• 2 available 10Gb network ports
• NO switches are supplied for Public network
• Customer / datacenter has to supply network ports / cables
• Customer can decide how to distribute load on their network
• Design decision on which ports to use
• 1 GbE or 10 GbE
• Single connections or bonded connections (a bonding sketch follows below)
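A minimal sketch of what an active-backup bond for the client network could look like on Oracle Linux; the device names (bondeth0, eth1) and the IP address are purely illustrative, not values from any specific deployment:

$ cat /etc/sysconfig/network-scripts/ifcfg-bondeth0
DEVICE=bondeth0
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.0.10.21
NETMASK=255.255.255.0
BONDING_OPTS="mode=active-backup miimon=100"

$ cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
MASTER=bondeth0
SLAVE=yes

(the second slave interface is configured the same way as eth1)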
X2-2 Full Rack Ethernet Network Addresses
(MGMT) Ethernet Subnet 1 IP addresses 51
ILOM for Database Servers 8
ILOM for Exadata Cells 14
Eth0 for Database Servers 8
Eth0 for Exadata Cells 14
Mgmt port for IB switches 3
IP address for KVM 1
IP address for Ethernet Switch 1
IP address for PDUs 2
(PUB) Ethernet Subnet 2 IP addresses 19
Eth1 for Database Servers 8
VIPs for Database Servers 8
SCAN addresses (per cluster) 3
Total 70
Minimum number of Ethernet drops from the data center: 12
X2-2 Half Rack Ethernet Network Addresses
(MGMT) Ethernet Subnet 1 IP addresses 29
ILOM for Database Servers 4
ILOM for Exadata Cells 7
Eth0 for Database Servers 4
Eth0 for Exadata Cells 7
Mgmt port for IB switches 3
IP address for KVM 1
IP address for Ethernet Switch 1
IP address for PDUs 2
(PUB) Ethernet Subnet 2 IP addresses 11
Eth1 for Database Servers 4
VIPs for Database Servers 4
SCAN addresses (per cluster) 3
Total 40
Minimum number of Ethernet drops from the data center: 8
X2-2 Quarter Rack Ethernet Network Addresses
(MGMT) Ethernet Subnet 1 IP addresses 16
ILOM for Database Servers 2
ILOM for Exadata Cells 3
Eth0 for Database Servers 2
Eth0 for Exadata Cells 3
Mgmt port for IB switches 2
IP address for KVM 1
IP address for Ethernet Switch 1
IP address for PDUs 2
(PUB) Ethernet Subnet 2 IP addresses 7
Eth1 for Database Servers 2
VIPs for Database Servers 2
SCAN addresses (per cluster) 3
Total 23
Minimum number of Ethernet drops from the data center: 6
X2-8 Full Rack Ethernet Network Addresses
(MGMT) Ethernet Subnet 1 IP addresses 38
ILOM for Database Servers 2
ILOM for Exadata Cells 14
Eth0 for Database Servers 2
Eth0 for Exadata Cells 14
Mgmt port for IB switches 3
IP address for Ethernet Switch 1
IP address for PDUs 2
(PUB) Ethernet Subnet 2 IP addresses 7
Eth1 for Database Servers 2
VIPs for Database Servers 2
SCAN addresses (per cluster) 3
Total 45
Minimum number of Ethernet drops from the data center: 5
X2-2 Full Rack Environmental
Power: Max 14.0 kW (14.3 kVA); Typical 9.8 kW (10.0 kVA)
Cooling: Max 47,800 BTU/hr (50,400 kJ/hr); Typical 33,400 BTU/hr (35,300 kJ/hr)
Airflow (front-to-back, subject to actual data center environment): Max ~2,200 CFM; Typical ~1,560 CFM
Physical Dimensions: 78.66" (H) x 23.62" (W) x 47.24" (D); 1998 mm (H) x 600 mm (W) x 1200 mm (D)
Weight: 2131 lbs (966.6 kg)
Operating Temperature/Humidity: 5 °C to 32 °C (41 °F to 89.6 °F), 10% to 90% relative humidity, non-condensing
Altitude (Operating): Up to 3,048 m; maximum ambient temperature derated by 1 °C per 300 m above 900 m
IP Addresses: 22 (InfiniBand network); 70 (Ethernet network, assuming a single cluster)
Network Drops: Minimum 12 network drops
External Connectivity: Up to 24 x 1 GbE Ethernet ports; up to 16 x 10 GbE Ethernet SFP+ ports; at least 12 InfiniBand ports
X2-8 Full Rack Environmental
Power: Max 17.0 kW (17.4 kVA); Typical 11.9 kW (12.2 kVA)
Cooling: Max 58,050 BTU/hr (61,200 kJ/hr); Typical 40,630 BTU/hr (42,840 kJ/hr)
Airflow (front-to-back, subject to actual data center environment): Max ~2,690 CFM; Typical ~1,880 CFM
Physical Dimensions: 78.66" (H) x 23.62" (W) x 47.24" (D); 1998 mm (H) x 600 mm (W) x 1200 mm (D)
Weight: 2080 lbs (943.5 kg)
Operating Temperature/Humidity: 5 °C to 32 °C (41 °F to 89.6 °F), 10% to 90% relative humidity, non-condensing
Altitude (Operating): Up to 3,048 m; maximum ambient temperature derated by 1 °C per 300 m above 900 m
IP Addresses: 22 (InfiniBand network); 45 (Ethernet network, assuming a single cluster)
Network Drops: Minimum 5 network drops
External Connectivity: Up to 16 x 1 GbE Ethernet ports; up to 16 x 10 GbE Ethernet SFP+ ports; at least 12 InfiniBand ports
Exadata Product Capacity (Uncompressed)

                                        X2-8 Full    X2-2 Full    X2-2 Half    X2-2 Quarter
Raw Disk (1)        High Perf Disk      100 TB       100 TB       50 TB        21 TB
                    High Cap Disk       336 TB       336 TB       168 TB       72 TB
Raw Flash (1)                           5.3 TB       5.3 TB       2.6 TB       1.1 TB
Usable, ASM normal  High Perf Disk      45 TB        45 TB        22.5 TB      9.25 TB
redundancy (2)      High Cap Disk       150 TB       150 TB       75 TB        31.5 TB
Usable, ASM high    High Perf Disk      30 TB        30 TB        15 TB        6.25 TB
redundancy (3)      High Cap Disk       100 TB       100 TB       50 TB        21.5 TB

1 - Raw capacity calculated using standard disk drive raw space terminology of 1 GB = 1000 x 1000 x 1000 bytes and 1 TB = 1000 x 1000 x 1000 x 1000 bytes.
2 - Actual space available for a database after mirroring (ASM normal redundancy) and leaving one empty disk to handle disk failures. Capacity calculated using normal space terminology of 1 TB = 1024 x 1024 x 1024 x 1024 bytes.
3 - Actual space available for the database after triple mirroring (ASM high redundancy). Capacity calculated using normal space terminology of 1 TB = 1024 x 1024 x 1024 x 1024 bytes.
Exadata Product Performance

                                        X2-8 Full    X2-2 Full    X2-2 Half    X2-2 Quarter
Raw Disk Data       High Perf Disk      25 GB/s      25 GB/s      12.5 GB/s    5.4 GB/s
Bandwidth (1,3)     High Cap Disk       14 GB/s      14 GB/s      7 GB/s       3 GB/s
Raw Flash Data      High Perf Disk      75 GB/s      75 GB/s      37.5 GB/s    16 GB/s
Bandwidth (1,3)     High Cap Disk       64 GB/s      64 GB/s      32 GB/s      13.5 GB/s
Disk IOPS (2,3)     High Perf Disk      50,000       50,000       25,000       10,800
                    High Cap Disk       25,000       25,000       12,500       5,400
Flash IOPS (2,3)                        1,500,000    1,500,000    750,000      375,000
Data Load Rate (4)                      12 TB/hr     12 TB/hr     6 TB/hr      3 TB/hr

1 - Bandwidth is peak physical disk scan bandwidth achieved running SQL, assuming no compression.
2 - IOPS based on peak IO requests of size 8K running SQL. Note that other products quote IOPS based on 2K, 4K or smaller IO sizes that are not relevant for databases.
3 - Actual performance will vary by application.
4 - Load rates are typically limited by CPU, not IO. Rates vary based on load method, indexes, data types, compression, and partitioning.
Expanding Exadata Database Machines
To keep in mind
• Database servers and Storage Servers are balanced
• InfiniBand throughput of 1-2 cells can be used by 1 DB node
• Reason behind 8 DB nodes and 14 storage cells
• Adding cells can increase throughput
• If no I/O bottleneck between cells and DB nodes
• If CPU on the DB nodes is not a bottleneck
• Adding DB nodes can increase throughput
• More threads, more memory, more bandwidth
Increasing memory
• X2-8 has 2TB of memory
• No slots left to use
• X2-2 has 96GB of memory
• Only 12 of the 18 banks are filled, with 8 GB modules
• With 12 banks filled, memory speed is 1333 MHz
• X2-2 Memory Expansion Pack
• Adds 6 x 8 GB memory modules per DB node
• Memory speed goes down to 800 MHz
• Minor performance reduction due to the lower memory bandwidth
Changing Management Switch
• Current Cisco switch can be exchanged
• New switch at customer expense
• Adhere to datacenter standards
• Introduce new/different uplinks to customer mgmt network
• Management cables do not need mgmt switch
• Can be plugged into customer network directly
• Customer supplies long(er) cables
• Exactly like public network
Adding power to your DBM
Increase storage and compute power
• Add new Database Machines
• Can mix X2-2 full, half and quarter racks
• Cannot mix X2-8 and X2-2
• Buy Expansion / upgrade kits for X2-2
• Quarter Rack -> Half Rack
• Half Rack -> Full Rack
• Not all components need to be licensed
• Expansion / upgrade always possible
• Within support timeline (minimum 5 years after EOL)
• Current V2 can be expanded with X2-2 upgrade kit
Adding power to your DBM
Only increase storage
• Exadata storage servers are sold separately
• Only supported if connected to DB nodes in DB Machine
• Must be installed in certified Rack
• Example: Same Rack as Exadata
• Oracle Sun Rack II 1242
• Disks may be different from disks in DB Machine
• Most probably need additional switches
• Setup and installation is done at customer site
• Big installation overhead like with non-Exadata systems
Exadata Storage Expansion Racks
Only increase storage
• Designed Exadata Storage Racks for
• On-disk backups
• Historical (ILM) data
• Image data
• Unstructured data
• Designed like Exadata Database Machines
• Contain 4, 9 or 18 Exadata storage cells (Q/H/F)
• Only available with High Capacity (2TB) drives
• Includes required switches for connections to DBM
• Add up to 194TB (mirrored) usable capacity
• Add up to 6.75TB of Smart Flash Cache
• Add up to 216 CPU cores for processing
Sun Flash Accelerator F20 PCIe Card
• Uses PCIe Flash cards instead of SSDs
• So the disk controller does not become a bottleneck
• 96GB Storage Capacity / card
• 4 cards in each Exadata cell
• 4 x 24GB Flash modules / card
• 6GB reserved for failures / card
• Optimized for Database caching
• Measured end-to-end performance
• 3.6 GB/sec per cell
• 75,000 IOPS per cell
Setting up the Storage cells
Exadata Flash Storage Layout
• Physical flash cards map to Cell Disks
• Flash Cell Disks are either
• Added to the Flash Cache
• Partitioned as Grid Disks, visible to ASM as normal storage (a CellCLI sketch follows below)
[Diagram: flash card -> flash Cell Disks -> either Flash Cache, or Grid Disks 1..n presented to ASM as ASM disks]
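A minimal CellCLI sketch of the two options; the 300G flash cache size and the 'flash' grid disk prefix are illustrative values, not defaults:

CellCLI> CREATE FLASHCACHE ALL SIZE=300G
(uses part of each flash cell disk as Smart Flash Cache)
CellCLI> CREATE GRIDDISK ALL FLASHDISK PREFIX=flash
(exposes the remaining flash space as grid disks for ASM)
CellCLI> LIST GRIDDISK ATTRIBUTES name, diskType, size
(verifies the newly created flash grid disks)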
Use Cases for Flash-Based Storage
• Considerations for using the Flash as permanent storage
• Hot data, or data with heavy scattered reads and writes
• Heavy scattered reads and writes can consume disk-based storage bandwidth
• May be used effectively with partitioning
• Keep the most current partition in a Flash-based tablespace, older partitions on disk
• As current data ages, it can be moved to disk
• Only 50% of the Flash space is usable because of the HA need
• A "normal redundancy" disk group must be created on Flash as well (see the example below)
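A sketch of such a Flash-only disk group, assuming flash grid disks were created with the illustrative 'flash' prefix shown earlier; the attribute values mirror the disk-based example later in this module:

CREATE DISKGROUP flash NORMAL REDUNDANCY
DISK 'o/*/flash*'
ATTRIBUTE 'compatible.rdbms' = '11.2.0.0.0',
'compatible.asm' = '11.2.0.0.0',
'cell.smart_scan_capable' = 'TRUE',
'au_size' = '4M';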
Harddisks
• Every Exadata cell has 12 disks
• Either 600GB @ 15k rpm or 2TB @ 7.2k rpm
• 7.2TB or 24 TB per cell
• Mixing disks
• Disks within one cell cannot be mixed
• Disks within one DBM cannot be mixed
• Disks between DBMs can be mixed
• All disks can contain data
• Only the first 2 disks use 13GB of space for the system software
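For reference, the physical disks and LUNs of a cell can be inspected with CellCLI; a sketch (the exact attribute names may vary slightly between Exadata software versions):

CellCLI> LIST PHYSICALDISK ATTRIBUTES name, diskType, physicalSize, status
CellCLI> LIST LUN ATTRIBUTES name, isSystemLun, lunSize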
Exadata Disks Storage Layout
• Physical disks map to Cell Disks
• Cell Disks are partitioned into one or multiple Grid Disks
• ASM disk groups are created from Grid Disks
• Transparent above the ASM layer
[Diagram: physical disk -> Cell Disk (with Sys Area on the first two disks) -> Grid Disks 1..n -> ASM disks, repeated across Exadata Cells]
Exadata Storage Layout Example: Grid Disks
• Cell Disks are logically partitioned into Grid Disks
• A Grid Disk is the entity allocated to ASM as an ASM disk
• Minimum of one Grid Disk per Cell Disk
• Can be used to allocate "hot", "warm" and "cold" regions of a Cell Disk, or to separate databases sharing Exadata Cells
Exadata Storage Layout Example: ASM Disk Groups and Mirroring
• Two ASM disk groups defined
• One for the active, or "hot", portion of the database and a second for the "cold" or inactive portion
• ASM striping evenly distributes I/O across the disk group
• ASM mirroring is used to protect against disk failures
• Optional for one or both disk groups
[Diagram: the "hot" grid disks on every Exadata Cell form the Hot ASM disk group; the "cold" grid disks form the Cold ASM disk group]
Exadata Storage Layout Example: ASM Mirroring and Failure Groups
• ASM mirroring is used to protect against disk failures
• ASM failure groups are used to protect against cell failures
[Diagram: within the disk group, the grid disks of each Exadata Cell form a separate ASM failure group]
Working with the Exadata cell
Exadata DBM comes pre-installed
• DB nodes have plain Linux install
• Need to configure IPs
• Need to install Grid Infrastructure and Oracle Homes
• Storage nodes have a (near) latest version of the software
• Installed at the factory; a later version could be available
• ACS uses the 'OneCommand' tool to install the DBM
• Put all required entries in the XLS sheet
• Start the tool from the first DB node
• It will configure IPs, storage, patches, etc.
• Can be used to re-image the entire system
• See the manual and MOS note 1110675.1 for more details
Users on the Exadata Cell
• root
• OS owner
• Full access rights to Exadata functions
• celladmin
• Exadata software owner
• Full access rights to Exadata functions
• cellmonitor
• Read-only view account
• Read-only access to Exadata functions
Exadata running processes
• CellSRV
• Handles all threads / requests / communication
• Multi-threaded for optimal performance; written in C++
[Architecture diagram: a database server runs the RDBMS/ASM instance with diskmon, dskm and libcell, reading cellinit.ora and cellip.ora from /etc/oracle/cell/network-config; it talks over the InfiniBand fabric to each Exadata Storage Server, where cellsrv serves the data and metadata regions and the MS, RS and cellcli components manage the cell, with incidents recorded in the ADR (adrci).]
Exadata running processes
• Management Server (MS)
• Web service for cell management commands
• Runs background monitoring threads. Uses OC4J
Exadata running processes
• RS (Restart Service)
• Watchdog process for MS and CellSRV
Restart Server checks if...
• The MS HTTP port is alive
• MS memory usage is within limits
• The heartbeat to cellsrv is received
• If heartbeats are missed, the service is restarted
• The corresponding monitoring process creates an ADR incident before restarting the service (the user is notified by email)
• Defaults
• Heartbeat timeout = 10 seconds
• Poll interval for cellsrv = 4 seconds
• Poll interval for MS = 20 seconds
The service status can also be checked and the services restarted manually, as sketched below.
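A small CellCLI sketch for checking the service status and restarting the services by hand:

CellCLI> LIST CELL ATTRIBUTES cellsrvStatus, msStatus, rsStatus
(all three should report "running")
CellCLI> ALTER CELL RESTART SERVICES ALL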
Introducing CellCLI
• CellCLI (Cell Command Line Interface)
• Command line utility for managing cell resources
Introducing CellCLI
[celladmin@cell01 ~]$ cellcli
CellCLI: Release 11.1.3.0.0 - Production on Tue Oct 04 22:13:21 PDT 2008
Copyright (c) 2007, 2008, Oracle. All rights reserved.
Cell Efficiency ratio: 73.1
CellCLI>
• CellCLI runs on the cell as any of the cell users
• Run locally from a shell prompt
• Run remotely via ssh or dcli (see the dcli sketch below)
• Run automatically by the EM agent with the Exadata EM plug-in
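A minimal dcli sketch for running the same CellCLI command on every cell in one go; the group file name cell_group and the cell names are illustrative:

$ cat cell_group
cell01
cell02
$ dcli -g cell_group -l celladmin "cellcli -e list cell detail"
(runs the command on each cell listed in cell_group and prefixes every output line with the cell name)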
CellCLI
• Command interface for managing the cell.
• Written in Java
• The command determines which component handles it:
• help, describe, set, spool, start: handled by CellCLI itself
• alter cell startup/shutdown services: handled by RS
• calibrate: handled by ORION
• Everything else: handled by MS
CellCLI Commands
• Administration commands, similar to SQL*Plus:
• EXIT or QUIT: return control to the invoking shell
• HELP: displays syntax and usage descriptions for all CellCLI commands
• SET: sets parameter options in the CellCLI environment
• SPOOL: writes results of commands to the specified file on the cell file system
• START or @: runs the CellCLI commands in the specified script file
CellCLI options
• port-number
• Normally not used; retrieved from cellinit.ora, otherwise defaults to 8888
• -e
• Immediate execution of the command given on the command line, then exit
• -n
• Non-interactive mode: no prompting
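For example (a sketch; the LIST syntax is covered later in this module):

$ cellcli -e list cell detail
$ cellcli -e "list griddisk attributes name, status"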
CellCLI Help Command
CellCLI> help
HELP [topic]
Available Topics:
ALTER
ALTER ALERTHISTORY
ALTER CELL
ALTER CELLDISK
ALTER GRIDDISK
ALTER IORMPLAN
ALTER LUN
ALTER THRESHOLD
ASSIGN KEY
CALIBRATE
CREATE
CREATE CELL
CREATE CELLDISK
CREATE GRIDDISK
…
CellCLI>
Creating a New Exadata Cell
[celladmin@cell01 ~]$ cellcli
CellCLI: Release 11.1.3.0.0 - Production on Tue Oct 04 22:13:21 PDT 2008
Copyright (c) 2007, 2008, Oracle. All rights reserved.
Cell Efficiency ratio: 1.0
CellCLI> CREATE CELL cell01 -
realmName=my_realm, -
interconnect1=bond0, -
smtpServer='my_mail.example.com', -
smtpFromAddr='[email protected]', -
smtpPwd=<exadata cell01 email password>, -
smtpToAddr='[email protected]', -
notificationPolicy='critical,warning,clear', -
notificationMethod='mail'
Cell cell01 successfully created
Starting CELLSRV services…
The STARTUP of CELLSRV services was successful
CellCLI>
Exadata CellCLI Basic Commands
• Basic subset of CellCLI commands
• ALTER CELL [STARTUP|SHUTDOWN|RESTART] SERVICES [ALL|MS|RS|CELLSRV]
• CREATE CELLDISK ALL [HARDDISK|FLASHDISK]
• CREATE GRIDDISK ALL PREFIX='<prefix>' [, SIZE=<size>]
• LIST [LUN|CELLDISK|GRIDDISK] [<name>] [DETAIL]
• DROP [CELLDISK|GRIDDISK] ALL [ERASE=3PASS]
• Detailed description in the User Guide
• Available on any Exadata Storage Cell
• Updated after each Exadata patch install
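Two further sketches of commands from this list; the 'data' prefix is the illustrative prefix used in the grid disk examples that follow:

CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
CellCLI> ALTER CELL STARTUP SERVICES ALL
CellCLI> DROP GRIDDISK ALL PREFIX=data
(drops every grid disk whose name starts with the given prefix)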
Create Celldisks
• CREATE CELLDISK ALL to create all celldisks
- Default name is CD<id>_<cellname>
- Idempotent operation – skips any existing celldisks
• Can also create single celldisk at a time
Creating Cell Disks
CellCLI> CREATE celldisk all harddisk
CellDisk CD_disk01_cell1 successfully created
CellDisk CD_disk02_cell1 successfully created
CellDisk CD_disk03_cell1 successfully created
CellDisk CD_disk04_cell1 successfully created
CellDisk CD_disk05_cell1 successfully created
.....
CellDisk CD_disk11_cell1 successfully created
CellDisk CD_disk12_cell1 successfully created
CellCLI> LIST CELLDISK
CD_disk01_cell1 normal
CD_disk02_cell1 normal
CD_disk03_cell1 normal
....
FD_02_cell1 normal
FD_03_cell1 normal
Create Griddisks
• Divide each celldisk into multiple griddisks
• Griddisk = ASM disk
• CREATE GRIDDISK ALL PREFIX=<name>
• Creates multiple griddisks, one per celldisk
• Default name is <prefix>_<celldiskname>
• Skips existing griddisks with the same name
• Size is optional; if omitted, all remaining space is used
• Order of griddisk creation matters (hottest first, coldest last)
• Total Griddisk name size max 31 characters
• Limitation of ASM, not Exadata
Creating Grid Disks
CellCLI> CREATE GRIDDISK ALL HARDDISK PREFIX=data, size=300G
GridDisk data_CD_disk01_cell1 successfully created
GridDisk data_CD_disk02_cell1 successfully created
...
GridDisk data_CD_disk12_cell1 successfully created
CellCLI> CREATE GRIDDISK ALL HARDDISK PREFIX=recov
GridDisk recov_CD_disk01_cell1 successfully created
...
GridDisk recov_CD_disk12_cell1 successfully created
CellCLI> LIST GRIDDISK
data_CD_disk01_cell1 active
...
data_CD_disk12_cell1 active
recov_CD_disk01_cell1 active
...
recov_CD_disk12_cell1 active
ASM and Database Configuration
• Create two new files at each ASM & Database node in
/etc/oracle/cell/network-config
• cellinit.ora
• IP addresses of the Local InfiniBand interfaces
• cellip.ora
• Universe of remote Exadata Servers that can be accessed
• IP addresses of remote Exadata Servers InfiniBand interfaces
• Needs to be identical on all nodes of the ASM cluster
ASM & Database Node Config Files
# mkdir -p /etc/oracle/cell/network-config
# chown oracle:dba /etc/oracle/cell/network-config
# chmod ug+wx /etc/oracle/cell/network-config
$ cat /etc/oracle/cell/network-config/cellinit.ora
ipaddress1=192.168.50.23/24
$ cat /etc/oracle/cell/network-config/cellip.ora
cell="192.168.50.27:5042"
cell="192.168.50.28:5042"
(or)
cell="cell1"
cell="cell2"
ASM Discovery and Diskgroups
• Universe of Exadata Servers specified in cellip.ora
• ASM discovery string to subset Exadata Servers
• Can specify NULL discovery string (default)
• format is “o/<exadata_connect_info>/<griddisk_name>”
• <exadata_connect_info> format in cellip.ora
• Failure Groups Created by Default
• One failure group per Exadata Server
• New parameters for create diskgroup
• 'cell.smart_scan_capable'='TRUE'
• 'compatible.rdbms'='11.2.0.0.0'
• 'compatible.asm'='11.2.0.0.0'
• 'au_size' = '4M' (recommended)
Exadata Only ASM Diskgroup
CREATE DISKGROUP data NORMAL REDUNDANCY
DISK 'o/*/data*'
ATTRIBUTE 'compatible.rdbms' = '11.2.0.0.0',
'compatible.asm' = '11.2.0.0.0',
'cell.smart_scan_capable' = 'TRUE',
'au_size' = '4M';
Failure group CELL01
o/cell01/data_cd_disk01_cell01
o/cell01/data_cd_disk02_cell01
o/cell01/data_cd_disk03_cell01
o/cell01/data_cd_disk04_cell01
...
o/cell01/data_cd_disk12_cell01
Failure group CELL02
o/cell02/data_cd_disk01_cell02
o/cell02/data_cd_disk02_cell02
o/cell02/data_cd_disk03_cell02
o/cell02/data_cd_disk04_cell02
...
o/cell02/data_cd_disk12_cell02
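To verify how the grid disks were assigned to failure groups, a query along these lines can be run from the ASM instance (standard V$ASM_DISK columns; the name filter assumes the 'data' prefix used above):

SELECT failgroup, COUNT(*) AS disks, ROUND(SUM(total_mb)/1024) AS gb
FROM v$asm_disk
WHERE name LIKE 'DATA%'
GROUP BY failgroup;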
GUI interface for ASM in 11.2
Database / Application Changes
• Requires ASM to manage Exadata storage
• Transparent to SQL, users, applications
• SQL Offload processing only
• Direct path reads (default for PQ)
• Tablespace in Exadata only diskgroup
• No specific parameters in the init.ora
• No specific changes to tnsnames.ora
It’s just a normal 11gR2 database
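As an illustrative check that offload is really happening for a session, the cell-related statistics can be queried (statistic names as exposed in 11gR2; a sketch, not a tuning recipe):

SELECT name, value
FROM v$mystat m JOIN v$statname n USING (statistic#)
WHERE name IN ('cell physical IO bytes eligible for predicate offload',
'cell physical IO interconnect bytes returned by smart scan');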
Q U E S T I O N S
A N S W E R S
Now you do it!
Exadata Simulator
What to expect?
• Exadata Simulator
• Is STRICTLY INTERNAL ORACLE USE ONLY
• Not allowed to send it to partners or customers
• Alternative to Exadata Hardware
• Use Simulator to get familiar with the product.
• Configuration with Exadata
• Data layout with Exadata
• Disk failure handling with Exadata
• Cell failure handling with Exadata
• Explain plans with Exadata
• V$ view Exadata parameters
Exadata Simulator
What not to expect?
• Hardware is fake
• “only the brain” without the “brawn”
• No performance benefits
• Flash cache is also on disks, not flash
• All disks (both cells) are simulated on one physical disk
• No alerts from hardware monitoring
• BMC
• Disk Controller/Cache
• CPU
• Power/Temp/Voltage
• No MAIL or SNMP alerting
Goals
• Connect to the environment
• Connect to the cells
• Check cell environment
• Create cell disks
• Create Grid disks
• Check database node for Exadata access
• Create ASM instance
• With 2 disk-groups
• Create Database instance based on Exadata
The environment
• 3 VMs available per group on 1 physical system
• Database environment (1x)
• OEL5 64-bit
• 2 GB memory
• 20 GB hard disk
• Oracle software 11.2.0.1 pre-installed
• /u01/oracle/product/11.2.0/dbhome_1
• /u01/oracle/product/11.2.0/grid
• Exadata Simulator (2x)
• OEL5 64-bit
• 4 GB memory
• 12 x 500 MB Exadata disks
• 4 x 500 MB Flash (simulated on disk)
Connect to the environment
• VNC only access
• No downloads allowed!
• Connect your laptop to the network
• Connect using VNC
• A node will be assigned to you
• Connection details: 80.80.80.1xx:50