b35 inside rac by julian dyke
Post on 11-May-2015
DB Tech Showcase - Osaka May 2013
juliandyke.com © 2013 Julian Dyke
Julian Dyke Independent Consultant
Inside RAC
Agenda
OPS versus RAC
Buffer Cache
Global Cache Services
RAC Overview
[Diagram: four nodes (Node 1 to Node 4), each running one instance (Instance 1 to Instance 4); the nodes attach to the public network, to the private network (interconnect) and, over the storage network, to shared storage]
OPS versus RAC Oracle 8.0.6 and below
OPS - Oracle 8.0.6 and below
[Diagram: Instance 1 on Node 1 and Instance 2 on Node 2, joined by the interconnect and both attached to shared storage; arrows show current writes, consistent reads and current reads]
All I/O uses shared storage
Enqueues only use interconnect
OPS versus RAC Oracle 8.1.5 to Oracle 8.1.7
OPS - Oracle 8.1.5 to Oracle 8.1.7 - Cache Fusion Phase 1
[Diagram: same two-node layout; arrows show current writes, consistent reads and current reads]
Current I/O always uses shared storage
Consistent reads can use interconnect
OPS versus RAC Oracle 9.0.1 and above
RAC - Oracle 9.0.1 and above - Cache Fusion Phase 2
[Diagram: same two-node layout; arrows show current writes, consistent reads and current reads]
Current I/O and consistent reads can use interconnect
Buffer Cache Single Block Reads
[Animation: LRU list of buffers running from the head of the hot end to the head of the cold end; each buffer shows a block number and a touch count]
Read Block 42
  Get first available buffer from cold end
  Update buffer contents
  Insert buffer at head of cold end
Read Block 11
  Get first available buffer from cold end
  Update buffer contents
  Insert buffer at head of cold end
Read Block 42
  Update touch count for block 42
Read Block 33
  Move block 71 to head of hot end
  Set touch count on block 71 to zero
  Get first available buffer from cold end
  Update buffer contents
  Insert buffer at head of cold end
Read Block 34
  Update touch count for block 34
Read Block 87
  Move block 42 to head of hot end
  Set touch count on block 42 to zero
  Get first available buffer from cold end
  Update buffer contents
  Insert buffer at head of cold end
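The single block read steps can be sketched as a small simulation. This is a simplified model, not Oracle's implementation: the class name `TouchCountCache`, the hot threshold of 2, the initial touch count of 1 and the fixed position used as the head of the cold end are all assumptions chosen to mirror the animation.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    block: int
    touch: int


class TouchCountCache:
    """Toy touch-count LRU list: index 0 is the head of the hot end,
    the last element is the tail of the cold end."""

    def __init__(self, buffers, cold_start, hot_threshold=2):
        self.lru = list(buffers)
        self.cold_start = cold_start        # assumed index of the head of the cold end
        self.hot_threshold = hot_threshold  # assumed touch count that makes a buffer hot

    def read(self, block):
        # Block already cached: only the touch count changes.
        for buf in self.lru:
            if buf.block == block:
                buf.touch += 1
                return "hit"
        # Block not cached: scan from the tail of the cold end.
        while True:
            victim = self.lru.pop()
            if victim.touch >= self.hot_threshold:
                # Frequently touched buffer: move to head of hot end, reset count.
                victim.touch = 0
                self.lru.insert(0, victim)
                continue
            break
        # Reuse the first available buffer and insert it at the head of the cold end.
        victim.block = block
        victim.touch = 1
        self.lru.insert(self.cold_start, victim)
        return "miss"


cache = TouchCountCache(
    [Buffer(92, 0), Buffer(34, 3), Buffer(71, 2), Buffer(66, 0)], cold_start=2
)
first = cache.read(42)   # miss: reuses the buffer that held block 66
second = cache.read(42)  # hit: touch count for block 42 becomes 2
third = cache.read(33)   # miss: hot buffers are moved before one is reused
```

In this sketch a buffer only moves to the hot end when a later read needs a free buffer, which matches the order of the steps listed for Read Block 33 and Read Block 87.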
Buffer Cache Multi Block Reads
DB_FILE_MULTIBLOCK_READ_COUNT = 4
[Animation: LRU list from the head of the hot end to the head of the cold end, showing blocks 1 to 8 as they are read]
Read Block 1
  Get first four available buffers from cold end
  Read next four blocks into buffers
  Insert buffers at head of cold end
  Move block 1 to cold end
Read Block 2
  Move block 2 to cold end
Read Block 3
  Move block 3 to cold end
Read Block 4
  Move block 4 to cold end
Read Block 5
  Get next four available buffers from cold end
  Read next four blocks into buffers
  Insert buffers at head of cold end
  Move block 5 to cold end
Read Block 6
  Move block 6 to cold end
Read Block 7
  Move block 7 to cold end
Read Block 8
  Move block 8 to cold end
Global Services Overview
Resource
  Object to which access must be controlled at instance level
Enqueue
  Memory structure that serializes access to a resource
Global Resources
  Object to which access must be controlled at cluster level
Global Enqueue
  Locks and enqueues which need to be consistent between all instances
Global Services Overview
Global Resource Directory (GRD)
  Records current state and owner of each resource
  Contains convert and write queues
  Distributed across all instances in cluster
  Maintained by GCS and GES
Global Cache Services (GCS)
  Implements cache coherency for database
  Coordinates access to database blocks for instances
Global Enqueue Services (GES)
  Controls access to other resources (locks) including library cache and dictionary cache
  Performs deadlock detection
Global Cache Services Introduction
Global Cache Services exist to implement Cache Fusion
Cache Fusion allows blocks to be updated by multiple instances
Only one instance can have the updatable (current) version of a block
  GCS must ensure that only one instance can update a block at any time
Many instances can have read-only (consistent read) versions of a block
  Instances can have multiple copies of same block at different SCNs
Global Cache Services 2-way Current Read
[Animation: Instance 1, Instance 2, Instance 3 and Instance 4, one of them the resource master; block labelled 1318; resource states shown as S and N]
Instance 2 requests current read on block
  Request shared resource
  Request granted
  Read request
  Block returned
Global Cache Services 3-way Current Read
[Animation: same four instances and resource master; block labels 1318 and 1320; resource states shown as S, N and X]
Instance 1 requests exclusive read on block
  Request exclusive resource
  Transfer block to Instance 1 for exclusive access
  Block and resource status
  Resource status
Global Cache Services 3-way Current Read (Dirty Block)
[Animation: same layout; block labels 1318, 1320 and 1323; resource states shown as S, N and X]
Instance 4 requests exclusive read on block
  Request block in exclusive mode
  Transfer block to Instance 4 in exclusive mode
  Block and resource status
  Resource status
Note that Instance 1 will create a past image (PI) of the dirty block
Global Cache Services 3-way Current (Without Downgrade)
[Animation: same layout; block labels 1318, 1320 and 1323; resource states shown as N and X]
Instance 2 requests current read on block
  Request block in shared mode
  Transfer block to Instance 2 in shared mode
  Block and resource status
  Resource status
In Oracle 8.1.5 and above _fairness_threshold is used to avoid unnecessary lock conversions
Global Cache Services 3-way Current (With Downgrade)
[Animation: same layout; block labels 1318, 1320 and 1323; resource states shown as N, X and S]
Instance 2 requests current read on block
  Request block in shared mode
  Transfer block to Instance 2 in shared mode
  Block and resource status
  Resource status
In Oracle 8.1.5 and above _fairness_threshold is used to avoid unnecessary lock conversions
Global Cache Services Past Images
When an instance passes a dirty block to another instance it
  Flushes redo buffer to redo log
  Retains past image (PI) of block in buffer cache
PI is retained until another instance writes block to disk
Used to reduce recovery times
Recorded in V$BH.STATUS as PI
  Based on X$BH.STATE (value 8 in Oracle 10.2)
Global Cache Services Past Images
[Animation: Instance 1 and Instance 2, each with a buffer cache and its own redo log (Redo Log 1 and Redo Log 2), plus block 42 on disk holding 7123. Instance 1 runs UPDATE t1 SET c1 = 7124; COMMIT; through UPDATE t1 SET c1 = 7128; COMMIT; and Instance 2 then runs UPDATE t1 SET c1 = 7129; COMMIT;]
Assume table t1 contains a single row in block 42
Instance 1 updates column to 7124
  Block 42 is read from disk
  Undo/Redo written to Redo Log 1
  Block 42 is updated in buffer cache
Instance 1 updates column to 7125
  Undo/Redo written to Redo Log 1
  Block 42 is updated in buffer cache
Instance 1 updates column to 7126
  Undo/Redo written to Redo Log 1
  Block 42 is updated in buffer cache
Instance 1 updates column to 7127
  Undo/Redo written to Redo Log 1
  Block 42 is updated in buffer cache
Instance 1 updates column to 7128
  Undo/Redo written to Redo Log 1
  Block 42 is updated in buffer cache
Instance 2 updates column to 7129
  GCS transfers block from Instance 1 to Instance 2
  Instance 1 makes block 42 a Past Image block
  Undo/redo written to Redo Log 2
  Block 42 is updated in buffer cache
Instance 2 Crashes
  Contents of buffer cache are lost
  DBWR has not written changes to block 42 back to disk yet
Instance 1 must perform recovery for Instance 2
  Block 42 needs recovery
  Instance 1 uses Past Image
  Undo/redo is applied from Redo Log 2
  Block 42 is subsequently written back to disk by DBWR
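The recovery sequence above can be illustrated with a toy model that shows why the past image reduces recovery work. Representing each redo record as a (before, after) pair of column values is an assumption made for illustration only; real redo records are change vectors, and the function name `recover` is hypothetical.

```python
def recover(start_value, redo_records):
    """Apply (before, after) change records in order, starting from start_value.
    Returns the recovered value and the number of records applied."""
    value = start_value
    applied = 0
    for before, after in redo_records:
        if before != value:
            raise ValueError(f"redo record {before}->{after} does not match value {value}")
        value = after
        applied += 1
    return value, applied


# Changes made by Instance 1 and Instance 2 in the walkthrough above.
redo_log_1 = [(7123, 7124), (7124, 7125), (7125, 7126), (7126, 7127), (7127, 7128)]
redo_log_2 = [(7128, 7129)]

# Starting from the past image held by Instance 1 (value 7128),
# only the redo written by Instance 2 has to be applied.
from_past_image = recover(7128, redo_log_2)

# Starting from the on-disk block (value 7123), redo from both logs is needed.
from_disk = recover(7123, redo_log_1 + redo_log_2)
```

Both paths end with the same value, but the past image path applies one record instead of six in this model.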
Global Cache Services Wait Events
Wait events show reads where messages have been exchanged with other instances
Can include:
  gc cr grant 2-way
  gc cr block 2-way
  gc cr block 3-way
  gc cr multi block request
  gc current grant 2-way
  gc current block 2-way
  gc current block 3-way
  gc current multi block request
Global Cache Services gc cr block 3-way wait event
Source Destination Description Bytes
RAC4 - Server RAC2 - LMS1 Request file 8 block 15 456
RAC2 - LMS1 RAC4 - Server OK 212
RAC2 - LMS1 RAC3 - LMS1 Send file 8 block 15 to RAC4 480
RAC3 - LMS1 RAC2 - LMS1 OK 212
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 1 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 2 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 3 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 4 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 5 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 6 868
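As a worked example, the byte counts in the table above can be totalled to compare control traffic with block traffic. The tuple layout and variable names are illustrative; the figures are copied from the table.

```python
# (source, destination, description, bytes) rows from the gc cr block 3-way capture
messages = [
    ("RAC4 - Server", "RAC2 - LMS1", "Request file 8 block 15", 456),
    ("RAC2 - LMS1", "RAC4 - Server", "OK", 212),
    ("RAC2 - LMS1", "RAC3 - LMS1", "Send file 8 block 15 to RAC4", 480),
    ("RAC3 - LMS1", "RAC2 - LMS1", "OK", 212),
    ("RAC3 - LMS1", "RAC4 - Server", "Block file 8 block 15 part 1", 1500),
    ("RAC3 - LMS1", "RAC4 - Server", "Block file 8 block 15 part 2", 1500),
    ("RAC3 - LMS1", "RAC4 - Server", "Block file 8 block 15 part 3", 1500),
    ("RAC3 - LMS1", "RAC4 - Server", "Block file 8 block 15 part 4", 1500),
    ("RAC3 - LMS1", "RAC4 - Server", "Block file 8 block 15 part 5", 1500),
    ("RAC3 - LMS1", "RAC4 - Server", "Block file 8 block 15 part 6", 868),
]

# Packets that carry parts of the block versus request/acknowledgement packets.
block_bytes = sum(size for _, _, desc, size in messages if desc.startswith("Block"))
control_bytes = sum(size for _, _, desc, size in messages if not desc.startswith("Block"))
total_bytes = block_bytes + control_bytes

print(f"control={control_bytes} block={block_bytes} total={total_bytes}")
```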
Global Cache Services gc cr block 3-way wait event
[Animation: RAC1, RAC2, RAC3 and RAC4, one of them the resource master; block labelled 1318 with rows shown as 1,40 2,44 and 1,42 2,44; steps numbered 1 to 10]
UPDATE t1 SET c2 = 50
WHERE c1 = 2;
Global Cache Services gc cr block 2-way wait event 2-way Consistent Read
Source Destination Description Bytes
RAC4 - Server RAC3 - LMS1 Request file 6 block 69 400
RAC3 - LMS1 RAC4 - Server OK 212
RAC3 - LMS1 RAC4 - Server Block file 6 block 69 part 1 1500
RAC3 - LMS1 RAC4 - Server Block file 6 block 69 part 2 1500
RAC3 - LMS1 RAC4 - Server Block file 6 block 69 part 3 1500
RAC3 - LMS1 RAC4 - Server Block file 6 block 69 part 4 1500
RAC3 - LMS1 RAC4 - Server Block file 6 block 69 part 5 1500
RAC3 - LMS1 RAC4 - Server Block file 6 block 69 part 6 868
Global Cache Services gc cr block 2-way wait event
[Animation: RAC1, RAC2, RAC3 and RAC4, one of them the resource master; block labelled 1318 with rows shown as 1,40 2,44; steps numbered 1 to 8]
UPDATE t1 SET c2 = 50
WHERE c1 = 2;
Global Cache Services gc current block 3-way wait event 3-way Current Read
Source Destination Description Bytes
RAC4 - Server RAC2 - LMS1 Request file 8 block 15 456
RAC2 - LMS1 RAC4 - Server OK 212
RAC2 - LMS1 RAC3 - LMS1 Send file 8 block 15 to RAC4 480
RAC3 - LMS1 RAC2 - LMS1 OK 212
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 1 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 2 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 3 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 4 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 5 1500
RAC3 - LMS1 RAC4 - Server Block file 8 block 15 part 6 868
RAC4 - LMS1 RAC2 - LMS1 Received file 8 block 15 244
RAC2 - LMS1 RAC4 - LMS1 OK 212
Global Cache Services gc current block 3-way wait event
[Animation: RAC1, RAC2, RAC3 and RAC4, one of them the resource master; block labelled 1318 with rows shown as 1,40 2,44, then 1,42 2,44, then 1,42 2,50; steps numbered 1 to 12]
UPDATE t1 SET c2 = 50
WHERE c1 = 2;
UPDATE t1 SET c2 = 42
WHERE c1 = 1;
RAC3 saves past image of the dirty block until RAC4 writes the block to disk
Global Cache Services gc cr grant 2-way wait event 2-way Consistent Read
Source Destination Description Bytes
RAC4 - Server RAC3 - LMS1 Request file 6 block 69 400
RAC3 - LMS1 RAC4 - Server OK 212
RAC3 - LMS1 RAC4 - Server Grant read file 6 block 69 276
RAC4 - Server RAC3 - LMS1 OK 212
Global Cache Services gc cr grant 2-way wait event
[Animation: RAC1, RAC2, RAC3 and RAC4, one of them the resource master; block labelled 1318 with rows shown as 1,40 2,44; steps numbered 1 to 6]
SELECT c2 FROM t1
WHERE c1 = 1;
Global Cache Services gc cr multi block request wait event
Source Destination Description Bytes
RAC4 - Server RAC3 - LMS1 Request file 8 blocks 69-76 1872
RAC3 - LMS1 RAC4 - Server OK 212
RAC3 - LMS1 RAC4 - Server Grant file 8 blocks 69-76 to RAC4 772
RAC4 - Server RAC3 - LMS1 OK 212
Global Cache Services gc cr multi block request wait event
[Animation: RAC1, RAC2, RAC3 and RAC4, one of them the resource master; several blocks, each with rows shown as 1,40 2,44; steps numbered 1 to 6]
SELECT c2 FROM t1
WHERE c1 = 1;
Global Cache Services gc cr multi block request wait event
The following 10046/8 trace is for a gc cr multi block request
WAIT #2: nam='gc cr multi block request' ela= 722 file#=4 block#=248 class#=1 obj#=51866 tim=1169728375495574
WAIT #2: nam='db file scattered read' ela= 10437 file#=4 block#=244 blocks=5 obj#=51866 tim=1169728375506092
This trace can be misleading because:
  the gc cr multi block request specifies the LAST block in the range
  the gc cr multi block request does not specify how many blocks should be read
  the gc cr multi block request does not specify how many blocks have been returned from another instance
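A minimal sketch of parsing the two WAIT lines above shows the "last block" point concretely. The regular expression and the `parse_wait` helper are assumptions written for this trace format only; they are not part of any Oracle tool.

```python
import re

# Matches the WAIT lines shown in the trace above.
WAIT_RE = re.compile(
    r"WAIT #(?P<cursor>\d+): nam='(?P<name>[^']+)' ela= *(?P<ela>\d+) (?P<params>.*)"
)


def parse_wait(line):
    """Return the fields of a WAIT line as a dict, or None if it does not match."""
    match = WAIT_RE.match(line.strip())
    if match is None:
        return None
    fields = {"cursor": int(match.group("cursor")),
              "name": match.group("name"),
              "ela": int(match.group("ela"))}
    # Remaining parameters are space-separated key=value pairs with integer values.
    for pair in match.group("params").split():
        key, value = pair.split("=", 1)
        fields[key] = int(value)
    return fields


gc_wait = parse_wait(
    "WAIT #2: nam='gc cr multi block request' ela= 722 file#=4 block#=248 "
    "class#=1 obj#=51866 tim=1169728375495574"
)
disk_wait = parse_wait(
    "WAIT #2: nam='db file scattered read' ela= 10437 file#=4 block#=244 "
    "blocks=5 obj#=51866 tim=1169728375506092"
)

# The scattered read starts at block 244 and covers 5 blocks, so its last block
# is 248, which is the block# reported by the gc cr multi block request.
last_block_read = disk_wait["block#"] + disk_wait["blocks"] - 1
```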
Global Cache Services Block Mastering
Each block is mastered on one instance
Block DBA is reported by X$KJBR.KJBRNAME
Names have the format:
  [<block_number>][<file_number>][BL]
For example
  [0x137][0x40000][BL]
is file# 4, block# 311
Ordering by X$KJBR.KJBRNAME is difficult because the resource names do not collate when sorted e.g.:
  [0x12E][0x40000][BL]
  [0x12F][0x40000][BL]
  [0x13][0x40000][BL]
  [0x130][0x40000][BL]
  [0x131][0x40000][BL]
  etc...
Global Cache Services Block Mastering
Some useful functions
CREATE OR REPLACE FUNCTION get_file_number (p_resource_name VARCHAR2)
RETURN INTEGER
IS
  pos1 INTEGER := INSTR (p_resource_name,'x',1,2);
  pos2 INTEGER := INSTR (p_resource_name,']',1,2);
  s VARCHAR2(30) := SUBSTR (p_resource_name,pos1+1,pos2-pos1-1);
BEGIN
  RETURN TO_NUMBER (s,'XXXXXXXX') / 65536;
END;
/
CREATE OR REPLACE FUNCTION get_block_number (p_resource_name VARCHAR2)
RETURN INTEGER
IS
  pos1 INTEGER := INSTR (p_resource_name,'x',1,1);
  pos2 INTEGER := INSTR (p_resource_name,']',1,1);
  s VARCHAR2(30) := SUBSTR (p_resource_name,pos1+1,pos2-pos1-1);
BEGIN
  RETURN TO_NUMBER (s,'XXXXXXXX');
END;
/
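The same decoding can be sketched outside the database, for example when post-processing a spooled list of resource names. The helper name `decode_resource_name` is hypothetical; it follows the logic of the two functions above (file number is the second hex field divided by 65536, block number is the first hex field).

```python
import re

# [<block_number>][<file_number>][BL], both numbers in hexadecimal
NAME_RE = re.compile(r"\[0x([0-9A-Fa-f]+)\]\[0x([0-9A-Fa-f]+)\]\[BL\]")


def decode_resource_name(name):
    """Return (file_number, block_number) for a KJBRNAME-style resource name."""
    match = NAME_RE.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"unrecognised resource name: {name!r}")
    block_number = int(match.group(1), 16)
    file_number = int(match.group(2), 16) // 65536
    return file_number, block_number


names = [
    "[0x12E][0x40000][BL]",
    "[0x12F][0x40000][BL]",
    "[0x13][0x40000][BL]",
    "[0x130][0x40000][BL]",
    "[0x131][0x40000][BL]",
]

# Sorting on the decoded numbers gives file/block order rather than string order.
in_block_order = sorted(names, key=decode_resource_name)
```

Sorting on the decoded pair places `[0x13]` (block 19) before `[0x12E]` (block 302), which plain string ordering does not.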
Global Cache Services Block Mastering
In Oracle 10.2 block mastering is determined by _lm_contiguous_res_count
  Specifies number of contiguous blocks that will hash to the same HV bucket
  Defaults to 128
For example
Start End
0x080 0x0FF
0x180 0x1FF
0x280 0x2FF
0x380 0x3FF
0x480 0x4FF
0x580 0x5FF
etc etc
Start End
0x000 0x07F
0x100 0x17F
0x200 0x27F
0x300 0x37F
0x400 0x47F
0x500 0x57F
etc etc
Instance 0 Instance 1
Global Cache Services Block Mastering
The following table shows that masters are still assigned to ranges of 128 contiguous blocks in a four-node cluster
Start Block End Block Master
0 127 1
128 255 2
256 383 2
384 511 3
512 639 3
640 767 3
768 895 1
896 1023 0
1024 1279 2
1280 1407 1
Global Cache Services Dynamic Remastering
In Oracle 9.2
  documentation describes dynamic remastering
  not implemented in code
In Oracle 10.1
  works at data file level
  very high threshold so difficult to test
  does occur on some customer sites
In Oracle 10.2 and above
  works at segment level
  thresholds are relatively low
Global Cache Services Dynamic Remastering
Object remastering is recorded in V$GCSPFMASTER_INFO
Instances are internally numbered 0, 1 etc
Initially contains no rows
SELECT object_id, current_master, previous_master FROM v$gcspfmaster_info;
After remastering object 52084 to instance 0
Object ID  Current Master  Previous Master
52084      0               32767
After remastering object 52084 to instance 1
Object ID  Current Master  Previous Master
52084      1               0
Global Cache Services Dynamic Remastering
In Oracle 10.2 and above, information about Dynamic Remastering operations is also reported in the following fixed views
X$KJDRMREQ
  Dynamic Remastering Requests
X$KJDRMAFNSTATS
  File Remastering Statistics
X$KJDRMHVSTATS
  Hash Value Statistics
Global Cache Services Dynamic Remastering
In Oracle 11.1 and above, Dynamic Remastering statistics are reported in V$DYNAMIC_REMASTER_STATS
Column Name Data Type
REMASTER_OPS NUMBER
REMASTER_TIME NUMBER
REMASTERED_OBJECTS NUMBER
QUIESCE_TIME NUMBER
FREEZE_TIME NUMBER
CLEANUP_TIME NUMBER
REPLAY_TIME NUMBER
FIXWRITE_TIME NUMBER
SYNC_TIME NUMBER
RESOURCES_CLEANED NUMBER
REPLAYED_LOCKS_SENT NUMBER
REPLAYED_LOCKS_RECEIVED NUMBER
CURRENT_OBJECTS NUMBER
Global Cache Services Dynamic Remastering
Dynamic remastering is coordinated by the LMD0 background process
The LMD0 background process logs limited details of dynamic remastering operations
Excessive dynamic remastering can cause instance freezes
  Observed in both Oracle 10.1 and 10.2
Oracle Support occasionally recommends that dynamic remastering is disabled using the following parameters:
  _gc_affinity_time = 0
  _gc_undo_affinity = FALSE
Thank you for listening
info@juliandyke.com
Backup
Interconnect Overview
Instances communicate with each other over the interconnect (network)
Information transferred between instances includes
  data blocks
  locks
  SCNs
Typically 1Gb Ethernet
  UDP protocol
  Often teamed in pairs to avoid SPOFs
Can also use Infiniband
  Fewer levels in stack
Other proprietary protocols are available
Interconnect TCP/IP Five Layer Model
All messages travel down through layers, across physical layer then up again
[Diagram: two five-layer stacks side by side, each with 5 Application, 4 Transport, 3 Network, 2 Data Link and 1 Physical, joined at the physical layer]
Interconnect TCP/IP Five Layer Model
TCP/IP has a four or five layer model
Five-layer model shown below
Layer TCP/IP Suite
5 Application DHCP, DNS, FTP, HTTP, SSH, NFS, NTP, SMTP, SNMP, TELNET, RPC, SOAP
4 Transport TCP, UDP
3 Network IP (IPv4, IPv6), ICMP, ARP, RARP
2 Data Link Ethernet, Token Ring, 802.11, Wi-Fi, FDDI, PPP
1 Physical 10BASE-T, 100BASE-T, 1000BASE-T, Optical Fibre, Twisted Pair
Four-layer model combines data link and physical layers
Interconnect TCP/IP Transport Layer
Transport Layer
  Connection-oriented (TCP)
  Connectionless (UDP)
[Diagram: protocol stack with the physical layer at the bottom, then Ethernet, then IP, then TCP and UDP side by side, with Clusterware and RAC above them]
Interconnect Encapsulation
[Diagram: data is wrapped in a UDP header (8 bytes), then an IP header (20 bytes), then an Ethernet header (14 bytes) and Ethernet trailer (4 bytes); the MTU size spans the IP header, UDP header and data]
Oracle Clusterware Node Heartbeat Messages
Sent to each node in cluster every second in both directions
Checks nodes are still members of cluster
Sent by ocssd.bin using TCP well-known port 49895
Outgoing message is 134 bytes (80 byte payload)
Incoming message is 66 bytes (12 byte payload)
[Diagram: Node 1, Node 2, Node 3 and Node 4 exchanging outgoing and incoming heartbeat messages]
Oracle Clusterware Node Status Messages
Number of packets exchanged by a node is determined by number of nodes in cluster
Number of packets per node per hour is
  (#nodes - 1) * 4 messages * 3600 seconds
Number of nodes  Packets per hour
2                14,400
3                28,800
4                43,200
5                57,600
6                72,000
7                86,400
8                100,800
16               216,000
32               446,400
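The packets-per-hour figures follow directly from the formula on this slide, as this short sketch reproduces. The function name `packets_per_hour` is illustrative.

```python
def packets_per_hour(nodes):
    """(#nodes - 1) * 4 messages * 3600 seconds, as given on the slide."""
    return (nodes - 1) * 4 * 3600


for nodes in (2, 3, 4, 5, 6, 7, 8, 16, 32):
    print(f"{nodes:>2} nodes: {packets_per_hour(nodes):,} packets per hour")
```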
RAC Background Processes Overview
[Diagram: Node 1 running Instance 1 and Node 2 running Instance 2; each instance has a shared pool, a buffer cache and the background processes DIAG, LMON, LCK0, LMD0, LMSn, PMON, SMON, DBWR, LGWR, CKPT and ARCn; the diagram also shows datafiles, controlfiles and two sets of redo logs]
RAC Background Processes LMSn
LMSn
  Global Cache Service Process
Manage requests for data access across cluster
Up to 20 in Oracle 10.1
  LMS0-LMS9 LMSa-LMSj
Up to 36 in Oracle 10.2
  LMS0-LMS9 LMSa-LMSz
In Oracle 10.1 and above, number of GCS server processes can be configured using gcs_server_processes parameter
  Default value is 1 (single CPU system)
  Can also be configured using _lm_lms parameter
RAC Background Processes LMSn
In Oracle 10.2 and above
  LMS processes run in real-time mode
  Remaining processes run in time-share mode
Check using:
[oracle@server3 ~]$ ps -eo pid,user,opri,cmd | grep ora_lm
 8596 oracle 75 ora_lmon_TEST1
 8598 oracle 75 ora_lmd0_TEST1
 8601 oracle 58 ora_lms0_TEST1
58 is real time; 75 or 76 is time share
You can also check process scheduling policies using chrt
[oracle@server3 ~]$ chrt -p 8601 # lms0 - Real Time
pid 8601's current scheduling policy: SCHED_RR
pid 8601's current scheduling priority: 1
[oracle@server3 ~]$ chrt -p 8596 # lmon - Time Share
pid 8596's current scheduling policy: SCHED_OTHER
pid 8596's current scheduling priority: 0
RAC Background Processes LCK0
LCK0
  Instance Enqueue Process
Part of KCL (Kernel Cache Library)
Manages
  instance resource requests
  cross-instance call operations
Assists LMS processes
Formerly known as lock process
One LCK0 process per instance
In 9.0.1 and below, number of lock processes may be configurable using _gc_lck_procs parameter
RAC Background Processes LMD0
LMD0
  Global Enqueue Service Daemon
Manages requests for global enqueues
Updates status of enqueues when granted to / revoked from an instance
Responsible for deadlock detection
One LMD0 process per instance
In 8.1.7 and below number of lock daemons may be configurable using _lm_dlmd_processes parameter
RAC Background Processes LMON
LMON
  Global Enqueue Service Monitor
One LMON process per instance
Monitors cluster to maintain global enqueues and resources
Manages
  instance and process expirations
  recovery processing for cluster enqueues
RAC Background Processes DIAG
DIAG - Diagnosability Process
Collects diagnostic data in the event of a failure
Creates subdirectories in BACKGROUND_DUMP_DEST directory
In Oracle 9.0.1 and above can be disabled using _diag_daemon parameter
  Do not try this on a production system
Global Cache Services UDP Messages
There are two types of message exchanged within RAC
These are PROBABLY defined as follows
Synchronous
  These messages require an acknowledgement for each packet
  In some cases the acknowledgement packet can be larger than the original request
  e.g. SCN synchronization
Asynchronous
  These messages do not require an individual acknowledgement for each packet
  e.g. block transfers between instances
Global Cache Services Lock Modes
Lock modes can be:
  Null
    Another instance can hold an exclusive or shared lock
  Shared
    Another instance can hold a shared lock but not an exclusive lock
  Exclusive
    No other instances can hold shared or exclusive locks
Locks can also be:
  Local
    No other instance has held an exclusive lock
  Global
    Another instance has held an exclusive lock in the past
Global Cache Services Fairness Threshold
Intended to prevent unnecessary lock downgrades when other instances only require read-only copies
For write to read transfers
  Writing instance retains X lock
  Reading instance retains null lock
If _fairness_threshold reached then
  Writing instance downgrades X lock to S lock
  Reading instance receives S lock
_fairness_threshold default value is 4
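The fairness rule can be sketched as a counter on the instance holding the exclusive lock. This is a toy model under stated assumptions: the class name `BlockHolder` is hypothetical, the counter is assumed to count read transfers served since the X lock was acquired, and the downgrade is assumed to happen on the transfer that reaches the threshold. The slide does not specify either detail.

```python
class BlockHolder:
    """Instance holding a block in exclusive (X) mode and serving read requests."""

    def __init__(self, fairness_threshold=4):
        self.mode = "X"
        self.served = 0                      # assumed: read transfers served while holding X
        self.fairness_threshold = fairness_threshold

    def serve_read(self):
        """Serve one read-only copy; return the lock mode the reader ends up with."""
        if self.mode == "S":
            return "S"                       # already downgraded, reader shares the lock
        self.served += 1
        if self.served >= self.fairness_threshold:
            self.mode = "S"                  # writer downgrades X to S
            return "S"                       # reader receives S lock
        return "N"                           # writer keeps X, reader retains null lock


holder = BlockHolder()
reader_modes = [holder.serve_read() for _ in range(5)]
```

With the default threshold of 4, the first three readers in this model keep a null lock and the fourth triggers the downgrade.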
Global Cache Services Lock Elements
Lock elements are externalized in the V$LOCK_ELEMENT dynamic performance view
  Based on X$LE
Additional information is available in the X$LE view
Past image buffers do not have a lock element
In OPS one lock element could manage a contiguous range of blocks
  Still can in RAC using GC_FILES_TO_LOCKS parameter
  Disables Cache Fusion
Global Cache Services Lock Elements
Contain embedded GCS Client structures (KJBL)
[Diagram: lock elements, each containing a GCS client and linked to one or more buffer headers]
Global Cache Services Memory Structures
[Diagram: block headers (BH) linked to lock elements (LE) with embedded GCS clients (KJBL), alongside GCS shadows (KJBL) and GCS resources (KJBR)]
GCS Shadow describes blocks held by other instances, but mastered locally
Global Cache Services Memory Structures
GCS Resources (KJBR)
  Stored in segmented array
  Number of GCS resource structures determined by _gcs_resources parameter
  Externalized in X$KJBR
  Number of free GCS resource structures in X$KJBRFX
GCS Enqueues (Clients / Shadows) (KJBL)
  GCS clients embedded in lock elements
  GCS shadows stored in segmented array
  Number of GCS shadow structures determined by _gcs_shadow_locks parameter
  Externalized in X$KJBL
  Number of free GCS shadow structures in X$KJBLFX
Global Cache Services Dynamic Remastering Example
SELECT data_object_id FROM dba_objects
WHERE owner = 'US01' AND object_name = 'T1';
DATA_OBJECT_ID
--------------
         52084
To remaster object at current instance use:
ORADEBUG LKDEBUG -m pkey 52084
All blocks now mastered by the current instance
To redistribute masters to all available instances use:
ORADEBUG LKDEBUG -m dpkey 52084
Blocks mastered by both (all) instances again
Global Cache Services Block Mastering
In Oracle 10.1 and below block mastering is determined by a hash function
Algorithm applied to groups of 1289 contiguous blocks
In two node cluster
  Instance 0 has 645 blocks
  Instance 1 has 644 blocks
  etc
In three node cluster
  Instance 0 has 430 blocks
  Instance 2 has 215 blocks
  Instance 1 has 430 blocks
  Instance 2 has 214 blocks
  etc
Beware of small hot tables and indexes....
Global Cache Services Dumps
To dump the contents of the global cache use:
ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME GC_ELEMENTS LEVEL 1';
GLOBAL CACHE ELEMENT DUMP (address: 0x21fecd18): id1: 0x3591 id2: 0x10000 obj: 181 block: (1/13713) lock: SL rls: 0x0000 acq: 0x0000 latch: 0 flags: 0x41 fair: 0 recovery: 0 fpin: 'kdswh05: kdsgrp' bscn: 0x0.18a9c bctx: (nil) write: 0 scan: 0x0 xflg: 0 xid: 0x0.0.0
GCS CLIENT 0x21fecd60,1 sq[(nil),(nil)] resp[(nil),0x3591.10000] pkey 181 grant 1 cvt 0 mdrole 0x21 st 0x20 GRANTQ rl LOCAL master 1 owner 0 sid 0 remote[(nil),0] hist 0x7c history 0x3c.0x1.0x0.0x0.0x0.0x0. cflag 0x0 sender 2 flags 0x0 replay# 0 disk: 0x0000.00000000 write request: 0x0000.00000000 pi scn: 0x0000.00000000 msgseq 0x1 updseq 0x0 reqids[1,0,0] infop 0x0 pkey 181 hv 107 [stat 0x0, 1->1, wm 32767, RMno 0, reminc 6, dom 0] kjga st 0x4, step 0.0.0, cinc 8, rmno 10, flags 0x0 lb 0, hb 0, myb 178, drmb 178, apifrz 0
Global Cache Services Dumps Continued
GLOBAL CACHE ELEMENT DUMP (address: 0x237f4358): id1: 0x6a39 id2: 0x10000 obj: 74 block: (1/27193) lock: SL rls: 0x0000 acq: 0x0000 latch: 0 flags: 0x41 fair: 0 recovery: 0 fpin: 'kdswh05: kdsgrp' bscn: 0x0.26992 bctx: (nil) write: 0 scan: 0x0 xflg: 0 xid: 0x0.0.0
GCS SHADOW 0x237f43a0,1 sq[0x2ee64e8c,0x2eff3858] resp[0x2ee64e74,0x6a39.10000] pkey 74 grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL master 0 owner 0 sid 0 remote[(nil),0] hist 0x12a5 .....
GCS RESOURCE 0x2ee64e74 hashq [0x2ee61894,0x2ff57390] name[0x6a39.10000] pkey 74 grant 0x2eff3858 cvt (nil) send (nil),0 write (nil),0@65535 flag 0x0 mdrole 0x1 mode 1 scan 0 role LOCAL ..... GCS SHADOW 0x2eff3858,1 sq[0x237f43a0,0x2ee64e8c] resp[0x2ee64e74,0x6a39.10000] pkey 74 grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL master 0 owner 1 sid 0 remote[0x23fea160,1] hist 0x65f .....
GCS SHADOW 0x237f43a0,1 sq[0x2ee64e8c,0x2eff3858] resp[0x2ee64e74,0x6a39.10000] pkey 74 grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL master 0 owner 0 sid 0 remote[(nil),0] hist 0x12a5 .....
Global Cache Services System Change Number
In RAC clusters SCN must be maintained across all nodes in cluster
SCN propagation scheme differs according to version
In Oracle 10.1 and below defaults to Lamport algorithm
  Lamport in alert.log
  SCN piggy-backed on GCS/GES messages
  Recorded in redo log
  Default delay of 7 seconds
In Oracle 10.2 and above defaults to Broadcast on Commit algorithm
  SCN negotiated immediately
  Apparently no delay
Global Cache Services System Change Number
System Change Number algorithm is determined by the MAX_COMMIT_PROPAGATION_DELAY parameter
In Oracle 10.1 and below
  Initialization parameter specified in centiseconds
  Default value is 700 centiseconds (7 seconds)
  Specifies maximum time taken for a COMMIT on one node to be reflected on other nodes in the cluster
  For some applications performing rapid updates and queries of the same data from different instances, value must be set to 0 (Broadcast on commit)
  Examples include:
    E-Business suite
    SAP
Global Cache Services System Change Number
In Oracle 10.2 and above
  Default value of MAX_COMMIT_PROPAGATION_DELAY parameter is 0
  SCN broadcast on commit method is used
  SCN updates are synchronized immediately
SCN is synchronized
  after current read
  before block updated
This ensures correct SCN is written to block
Global Cache Services Broadcast on Commit
Ethernet broadcast is not used
SCN is synchronized by updating instance
Sends UDP SCN synchronization message to each remote instance
Remote instances respond with their current SCN
Another round of messages may be required if remote SCNs are more recent than local SCN
Synchronization occurs every time an instance needs a new SCN
Synchronization is always performed by the updating instance
Number of messages = 4 x (number of instances - 1)
Global Cache Services Broadcast on Commit
In a 4-node cluster 12 messages are exchanged
Source     Destination  Description       Bytes
RAC4-LMS0  RAC1-LMS0    Send current SCN  192
RAC1-LMS0  RAC4-LMS0    OK                212
RAC4-LMS0  RAC2-LMS0    Send current SCN  192
RAC2-LMS0  RAC4-LMS0    OK                212
RAC4-LMS0  RAC3-LMS0    Send current SCN  192
RAC3-LMS0  RAC4-LMS0    OK                212
RAC1-LMS0  RAC4-LMS0    Send current SCN  192
RAC4-LMS0  RAC1-LMS0    OK                212
RAC2-LMS0  RAC4-LMS0    Send current SCN  192
RAC4-LMS0  RAC2-LMS0    OK                212
RAC3-LMS0  RAC4-LMS0    Send current SCN  192
RAC4-LMS0  RAC3-LMS0    OK                212
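The exchange in the table can be reproduced with a short sketch that generates the message sequence for any cluster size. The function name `scn_sync_messages` and the two-phase ordering (all outgoing SCN messages and acknowledgements first, then the replies) are assumptions taken from the order of rows in this capture.

```python
def scn_sync_messages(instances, updating):
    """Return (source, destination, description) tuples for one SCN synchronization."""
    remotes = [name for name in instances if name != updating]
    messages = []
    # Updating instance sends its SCN to each remote instance, which acknowledges.
    for remote in remotes:
        messages.append((updating, remote, "Send current SCN"))
        messages.append((remote, updating, "OK"))
    # Each remote instance responds with its current SCN, which is acknowledged.
    for remote in remotes:
        messages.append((remote, updating, "Send current SCN"))
        messages.append((updating, remote, "OK"))
    return messages


cluster = ["RAC1-LMS0", "RAC2-LMS0", "RAC3-LMS0", "RAC4-LMS0"]
exchange = scn_sync_messages(cluster, updating="RAC4-LMS0")
print(len(exchange))  # 4 x (number of instances - 1)
```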
Global Cache Service Read Consistency
When a read consistent version of a block is requested it may be necessary to apply undo to a more recent version of that block
Undo can be applied by LMSn background process in
  Remote instance
  Local instance
If undo applied by remote instance, any outstanding redo must first be flushed from redo buffer of remote instance to redo log
  Can have significant performance impact on consistent reads
  Particularly on extended clusters
Global Cache Service Read Consistency
Statistics on inter-instance consistent reads are reported in V$CR_BLOCK_SERVER
Reports statistics for blocks served by local instances to remote instances including
  Number of consistent reads served
  Number of current reads served
  Number of data blocks served
  Number of undo blocks served
  Number of undo headers served
  Number of fairness down converts
  Number of log flushes
  Number of times light works rule invoked
Global Cache Service Read Consistency
In theory, once a block has been written to disk, the LMS process will not attempt to read it again when responding to a consistent read request
Light Works Rule
  Prevents LMS processes from going to disk when responding to CR requests for data, undo or undo segment blocks
  Can prevent LMS process from completing its response to a CR request
Global Cache Service Read Consistency
Uncommitted changes MUST be flushed to the redo log before the LMS process can ship a consistent block to another instance
Reading process must wait until redo log changes have been written to redo log by LMS process
Bad for standard RAC databases
  Reads must wait for redo log writes
Worse for extended / stretch RAC clusters
  Increased latency of cross site disk communications
Global Cache Service Read Consistency
For each block on which a consistent read is performed, a redo log flush must first be performed
Number of redo log flushes is recorded in the FLUSHES column of V$CR_BLOCK_SERVER
Redo log flush time is recorded in the gc cr block flush time statistic
For the LMS process, a redo log flush
  will increase time taken to serve consistent block
  will increase time taken to perform consistent read
If LMS processes become very busy, consistent reads will experience high wait times e.g. for a full table scan
  gc cr multi block request
Global Cache Services Read Consistency
Committed transaction on RAC2 - All blocks still in buffer cache
[Animation: RAC1 and RAC2, each with a redo buffer and buffer cache, plus the redo log; values 108, 109 and 110 shown; steps numbered 1 to 3]
Global Cache Services Read Consistency
Committed transaction on RAC2 - Some blocks written to disk
[Animation: same layout; values 108, 109 and 110 shown; steps numbered 1 to 4]
Global Cache Services Read Consistency
Uncommitted transaction on RAC2 - All blocks still in buffer cache
[Animation: same layout; values 108, 109 and 110 shown; steps numbered 1 to 6]
Global Cache Services Read Consistency
Uncommitted transaction on RAC2 - Some blocks written to disk
[Animation: same layout; values 108, 109 and 110 shown; steps numbered 1 to 8]
Global Cache Services Jumbo Frames
By default Maximum Transmission Unit (MTU) is 1500
MTU includes
  IP header
  UDP header
  Data
Requires six packets to transmit one 8192 byte block
On some adapters MTU can be increased to around 9000
  e.g. Intel PRO/1000
At command line
  ifconfig eth1 mtu 9000 up
or in /etc/sysconfig/ifcfg-eth<x>
  MTU=9000
Global Cache Services Jumbo Frames
Example - cost of sending one 8192 byte block
MTU=1500 (default)
Frame#  Ethernet Header  IP Header  UDP Header  Data  Ethernet Trailer  Total
1       14               20         8           1472  4                 1518
2       14               20         8           1472  4                 1518
3       14               20         8           1472  4                 1518
4       14               20         8           1472  4                 1518
5       14               20         8           1472  4                 1518
6       14               20         8           840   4                 886
Total   84               120        48          8200  24                8476
MTU=9000
Frame#  Ethernet Header  IP Header  UDP Header  Data  Ethernet Trailer  Total
1       14               20         8           8200  4                 8246
Total   14               20         8           8200  4                 8246
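The frame totals in both tables can be recomputed with a small sketch. It follows the accounting used on the slide, where every frame carries its own IP and UDP header and the data to send is 8200 bytes; the function name `frame_sizes` is illustrative and this is not a model of kernel-level IP fragmentation.

```python
ETHERNET_HEADER = 14
IP_HEADER = 20
UDP_HEADER = 8
ETHERNET_TRAILER = 4


def frame_sizes(data_bytes, mtu):
    """Return the on-wire size of each frame needed to carry data_bytes."""
    data_per_frame = mtu - IP_HEADER - UDP_HEADER  # the MTU covers IP header, UDP header and data
    sizes = []
    remaining = data_bytes
    while remaining > 0:
        data = min(remaining, data_per_frame)
        sizes.append(ETHERNET_HEADER + IP_HEADER + UDP_HEADER + data + ETHERNET_TRAILER)
        remaining -= data
    return sizes


standard = frame_sizes(8200, mtu=1500)  # six frames
jumbo = frame_sizes(8200, mtu=9000)     # one frame
print(len(standard), sum(standard), len(jumbo), sum(jumbo))
```

In this accounting the jumbo frame saves five sets of headers and trailers, 230 bytes per block, in addition to the reduced packet count.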
Global Cache Services Jumbo Frames
Not all network adapter drivers support jumbo frames
  Particularly cheap ones....
All network adapters in private interconnect must have same MTU size
Switch must also be configured to support jumbo frames
Lots of bugs and compatibility issues e.g.
  Bug 4447620: RAC UDP MTU size restricted to 1500 or 9000
    affects 10.1.0.5, 10.2.0.1
    fixed in 10.2.0.2 and above