CERN - Real Life RAC Scalability Experiences


Page 1: CERN - Real Life RAC Scalability Experiences

CERN IT Department, CH-1211 Genève 23, Switzerland

www.cern.ch/it

Eric Grancher [email protected]

CERN IT

Anton Topurov [email protected]

openlab, CERN IT

Real life experiences of RAC scalability with a 6 node cluster

Page 2: CERN - Real Life RAC Scalability Experiences


Outline

• CERN computing challenge
• Oracle RDBMS and Oracle RAC @ CERN
• RAC scalability – what, why and how?
• Real life scalability examples
• Conclusions
• References

Page 3: CERN - Real Life RAC Scalability Experiences


LHC gets ready …

Jürgen Knobloch / CERN

Page 4: CERN - Real Life RAC Scalability Experiences


The LHC Computing Challenge

• Signal/Noise: 10⁻⁹

• Data volume
  – High rate × large number of channels × 4 experiments
  → 15 PetaBytes of new data each year

• Compute power
  – Event complexity × number of events × thousands of users
  → 100 k of (today's) fastest CPUs

• Worldwide analysis & funding
  – Computing funded locally in major regions & countries
  – Efficient analysis everywhere
  → GRID technology

Jürgen Knobloch / CERN

Page 5: CERN - Real Life RAC Scalability Experiences


WLCG Collaboration

• The Collaboration
  – 4 LHC experiments
  – ~250 computing centres
  – 12 large centres (Tier-0, Tier-1)
  – 38 federations of smaller "Tier-2" centres

• Growing to ~40 countries
  – Grids: EGEE, OSG, NorduGrid

• Technical Design Reports
  – WLCG, 4 experiments: June 2005

• Memorandum of Understanding
  – Agreed in October 2005

• Resources
  – 5-year forward look

Jürgen Knobloch / CERN

Page 6: CERN - Real Life RAC Scalability Experiences


Centers around the world form a Supercomputer

• The EGEE and OSG projects are the basis of the Worldwide LHC Computing Grid Project WLCG

Inter-operation between Grids is working!

Jürgen Knobloch / CERN

Page 7: CERN - Real Life RAC Scalability Experiences


CPU & Disk Requirements 2006

[Charts: CPU requirements (MSI2000) and disk requirements (PB) per year, 2007-2010, broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by tier (CERN, Tier-1, Tier-2). CERN accounts for ~10% of the total.]

Jürgen Knobloch / CERN

Page 8: CERN - Real Life RAC Scalability Experiences


Oracle databases at CERN

• 1982: CERN starts using Oracle
• 1996: OPS on Sun SPARC Solaris
• 2000: Use of Linux x86 for Oracle RDBMS

• Today: Oracle RAC for the most demanding services:
  – CASTOR mass storage system (15 PB / year)
  – Administrative applications (AIS)
  – Accelerators and controls, etc.
  – LHC Computing Grid (LCG)

Page 9: CERN - Real Life RAC Scalability Experiences


(our view of) RAC basics

• Shared disk infrastructure, all disk devices have to be accessible from all servers

• Shared buffer cache (with coherency!)

[Diagram: Clients → DB Servers → Storage]

Page 10: CERN - Real Life RAC Scalability Experiences


Linux RAC deployment example

• RAC for the Administrative Services move (2006)
  – 4 nodes × 4 CPUs, 16 GB RAM per node
  – Red Hat Enterprise Linux 4 / x86_64
  – Oracle RAC 10.2.0.3

• Consolidation of administrative applications, starting with:
  – CERN expenditure tracking
  – HR management, planning and follow-up, plus a self-service application
  – Computer resource allocation, central repository of information ("foundation") …

• Goals: reduce the number of databases, benefit from HA, and benefit from information sharing between the applications (same database).

Page 11: CERN - Real Life RAC Scalability Experiences


Linux RAC deployment for administrative applications

• Use of Oracle 10g services (see the sketch after this list)
  – One service per application OLTP workload
  – One service per application batch workload
  – One service per application external access

• Use of Virtual IPs
• Raw devices only for the OCR and quorum devices
• ASM disk management (2 disk groups)
• RMAN backup to TSM

• Things became much easier and more stable over time (GC_FILES_TO_LOCKS, shared raw device volumes …); good experience with consolidation on RAC.
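
As an illustration of the services setup, a minimal sketch using the DBMS_SERVICE PL/SQL package (on RAC the services were typically managed with srvctl; the service names here are hypothetical):

BEGIN
  -- One service per workload type of an application
  DBMS_SERVICE.CREATE_SERVICE(service_name => 'ais_oltp',
                              network_name => 'ais_oltp');
  DBMS_SERVICE.CREATE_SERVICE(service_name => 'ais_batch',
                              network_name => 'ais_batch');
  -- Start the services so that clients can connect to them
  DBMS_SERVICE.START_SERVICE('ais_oltp');
  DBMS_SERVICE.START_SERVICE('ais_batch');
END;
/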

Page 12: CERN - Real Life RAC Scalability Experiences


RAC Scalability

Frits Ahlefeldt-Laurvig / http://downloadillustration.com/

Page 13: CERN - Real Life RAC Scalability Experiences


RAC Scalability (1)

Two ways of upgrading database performance:

• Upgrading the hardware (scale up)
  - Expensive and inflexible

• Adding more hardware (scale out)
  - Less expensive hardware and more flexible

Page 14: CERN - Real Life RAC Scalability Experiences


RAC Scalability (2)

Adding more hardware (scale out)

Pros:
- Cheap and flexible
- Getting more popular
- Oracle RAC is there to help you

Cons:
- More is not always better
- Achieving the desired scalability is not easy

Why?

Page 15: CERN - Real Life RAC Scalability Experiences


Reason 1

• Shared disk infrastructure, all disk devices have to be accessible from all servers

• Shared buffer cache (with coherency!)

[Diagram: Clients → DB Servers → Storage]

Page 16: CERN - Real Life RAC Scalability Experiences


Reason 2

Page 17: CERN - Real Life RAC Scalability Experiences


Reason 3

Page 18: CERN - Real Life RAC Scalability Experiences


Examples

Two real-life RAC scalability examples:

• CASTOR Name Server

• PVSS

Page 19: CERN - Real Life RAC Scalability Experiences


CASTOR Name Server

• CASTOR
  – CERN Advanced STORage manager
  – Stores physics production files and user files

• CASTOR Name Server
  – Implements the hierarchical view of the CASTOR name space
  – Multithreaded software
  – Uses an Oracle database to store file metadata

Page 20: CERN - Real Life RAC Scalability Experiences


Stress Test Application

• Multithreaded
• Used with up to 40 threads
• Each thread loops 5000 times on:
  – Creating a file
  – Checking its parameters
  – Changing the size of the file

• Test made:
  – Single instance vs. 2-node RAC
  – No changes to schema or application code

Page 21: CERN - Real Life RAC Scalability Experiences


Result

CNS: single instance vs. RAC

[Chart: throughput (ops/s, 0-600) as a function of the number of threads (1-40), for a single instance and for a 2-instance RAC.]

Page 22: CERN - Real Life RAC Scalability Experiences


Analysis 1/2

Problem:
• Contention on the CNS_FILE_METADATA table

Change:
• Hash partitioning with a local PK index (sketched below)

Result:
• 10% gain, but still worse than a single instance
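
A minimal sketch of the change, assuming a simplified, hypothetical version of the CNS_FILE_METADATA schema (only the partitioning idea comes from the slide):

-- Hash-partition the table on the file identifier
CREATE TABLE cns_file_metadata (
  fileid   NUMBER        NOT NULL,
  parent   NUMBER,
  name     VARCHAR2(255),
  filesize NUMBER
)
PARTITION BY HASH (fileid) PARTITIONS 32;

-- Back the primary key with a local (per-partition) index
CREATE UNIQUE INDEX cns_file_pk ON cns_file_metadata (fileid) LOCAL;
ALTER TABLE cns_file_metadata
  ADD CONSTRAINT cns_file_pk PRIMARY KEY (fileid) USING INDEX cns_file_pk;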

Page 23: CERN - Real Life RAC Scalability Experiences


Analysis (2/2)

• Top event:
  – enq: TX - row lock contention
  – Again on CNS_FILE_METADATA

• Findings:
  – Application logic causes the row lock contention
  – Table structure reorganization cannot help

• Follow-up:
  – No simple solution
  – Work in progress

Page 24: CERN - Real Life RAC Scalability Experiences


PVSS II

• Commercial SCADA application – critical for LHC and experiments

• Archiving in Oracle Database

• Out-of-the-box performance: 100 "changes" per second

• CERN needs: 150 000 changes per second = 1500 times faster!

Page 25: CERN - Real Life RAC Scalability Experiences


The Tuning Process

1. Run the workload; gather ASH/AWR information, 10046 traces, …

2. Find the top event that slows down the processing.

3. Understand why time is spent on this event.

4. Modify the client code, database schema, database code, or hardware configuration.
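
For steps 1 and 2, the standard Oracle instrumentation can be driven as follows (a sketch; trace levels and snapshot frequency are a matter of choice):

-- Take an AWR snapshot before and after the workload run
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

-- Enable extended SQL trace (event 10046, level 8 = with wait events)
-- in the session under test
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

-- ... run the workload ...

ALTER SESSION SET EVENTS '10046 trace name context off';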

Page 26: CERN - Real Life RAC Scalability Experiences


PVSS Tuning (1/6)

• Shared resource: EVENTS_HISTORY (ELEMENT_ID, VALUE, …)

• Each client "measures" input and registers the history with a "merge" operation in the EVENTS_HISTORY table

Performance:
• 100 "changes" per second

[Diagram: 150 clients → DB servers → storage. Each client issues "UPDATE eventlastval SET …" against the EVENTLASTVAL table; a trigger on update then merges the change into the EVENTS_HISTORY table.]
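
A minimal sketch of this per-change pattern, with hypothetical column names (the real PVSS schema is more involved): the client updates the last-value row and a trigger records the change in the history table.

-- Client side: one statement per measured change
UPDATE eventlastval
   SET value = :val, ts = :ts
 WHERE element_id = :eid;

-- Trigger keeping the history table up to date (simplified)
CREATE OR REPLACE TRIGGER eventlastval_to_history
AFTER UPDATE ON eventlastval
FOR EACH ROW
BEGIN
  MERGE INTO events_history h
  USING (SELECT :NEW.element_id AS element_id,
                :NEW.ts         AS ts,
                :NEW.value      AS value
           FROM dual) s
     ON (h.element_id = s.element_id AND h.ts = s.ts)
  WHEN MATCHED THEN
    UPDATE SET h.value = s.value
  WHEN NOT MATCHED THEN
    INSERT (element_id, ts, value) VALUES (s.element_id, s.ts, s.value);
END;
/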

Page 27: CERN - Real Life RAC Scalability Experiences


PVSS Tuning (2/6)

Initial state observation:

Initial state observation:

• The database is waiting on the clients: "SQL*Net message from client"
• Use of a generic C++/DB library
• Individual inserts (one statement per entry)
• Update of a table which keeps the "latest state", through a trigger

Page 28: CERN - Real Life RAC Scalability Experiences


PVSS Tuning (3/6)

Changes:
• Bulk insert into a temporary table with OCCI, then a PL/SQL call to load the data into the history table (sketched after the table below)

Performance:
• 2000 "changes" per second

Now top event: "db file sequential read"

Top 5 timed events (awrrpt_1_5489_5490.html):

Event                     Waits   Time (s)  % Total DB Time  Wait Class
db file sequential read  29,242        137            42.56  User I/O
enq: TX - contention         41        120            37.22  Other
CPU time                                61            18.88
log file parallel write   1,133         19             5.81  System I/O
db file parallel write    3,951         12             3.73  System I/O
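
The server side of the change can be sketched as follows, with hypothetical names: clients array-insert into a global temporary table, then call a procedure that moves the rows into the history table in a single statement.

-- Staging table, private to each session
CREATE GLOBAL TEMPORARY TABLE events_temp (
  element_id NUMBER,
  ts         TIMESTAMP,
  value      NUMBER
) ON COMMIT DELETE ROWS;

-- Called by the client after each bulk insert
CREATE OR REPLACE PROCEDURE load_events_history IS
BEGIN
  INSERT INTO events_history (element_id, ts, value)
    SELECT element_id, ts, value FROM events_temp;
  COMMIT;  -- also empties the temporary table
END;
/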

Page 29: CERN - Real Life RAC Scalability Experiences


PVSS Tuning (4/6)

Changes:
• Index usage analysis and reduction
• Table structure changes: IOT
• Replacement of the merge by an insert
• Use of "direct path load" with ETL (two of these changes are sketched after the table below)

Performance:
• 16 000 "changes" per second

Now top event: cluster-related wait events

Top 5 timed events (test5_rac_node1_8709_8710.html):

Event                     Waits   Time (s)  Avg Wait (ms)  % Total Call Time  Wait Class
gc buffer busy           27,883        728             26               31.6  Cluster
CPU time                              369                               16.0
gc current block busy     6,818        255             37               11.1  Cluster
gc current grant busy    24,370        228              9                9.9  Cluster
gc current block 2-way  118,454        198              2                8.6  Cluster
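
Two of these changes sketched under the same hypothetical schema: an index-organized history table, and a direct-path insert in place of the merge.

-- Index-organized table: the rows live in the primary key index itself
CREATE TABLE events_history (
  element_id NUMBER,
  ts         TIMESTAMP,
  value      NUMBER,
  CONSTRAINT events_history_pk PRIMARY KEY (element_id, ts)
) ORGANIZATION INDEX;

-- Direct-path load from the staging table (replaces the per-row merge);
-- whether Oracle honours the APPEND hint depends on the table organization
INSERT /*+ APPEND */ INTO events_history (element_id, ts, value)
  SELECT element_id, ts, value FROM events_temp;
COMMIT;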

Page 30: CERN - Real Life RAC Scalability Experiences


PVSS Tuning (5/6)

Changes:
• Each "client" receives a unique number
• Partitioned table (partitioning sketch after the table below)
• Use of "direct path load" into the client's partition, with ETL

Performance:
• 150 000 changes per second

Now top event: "freezes" once in a while

Top 5 timed events (rate75000_awrrpt_2_872_873.html):

Event                             Waits   Time (s)  Avg Wait (ms)  % Total Call Time  Wait Class
row cache lock                      813        665            818               27.6  Concurrency
gc current multi block request    7,218        155             22                6.4  Cluster
CPU time                                       123                                5.1
log file parallel write           1,542        109             71                4.5  System I/O
undo segment extension          785,439         88              0                3.6  Configuration
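
A sketch of the partitioning scheme (hypothetical names; list partitioning is one way to realize it): the table is partitioned on the client number, so each client's direct-path load touches only its own partition.

-- One partition per client number
CREATE TABLE events_history (
  client_id  NUMBER NOT NULL,
  element_id NUMBER,
  ts         TIMESTAMP,
  value      NUMBER
)
PARTITION BY LIST (client_id) (
  PARTITION p_client_1 VALUES (1),
  PARTITION p_client_2 VALUES (2),
  PARTITION p_client_3 VALUES (3)
);

-- Each client loads straight into its own partition
INSERT /*+ APPEND */ INTO events_history PARTITION (p_client_1)
       (client_id, element_id, ts, value)
  SELECT 1, element_id, ts, value FROM events_temp;
COMMIT;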

Page 31: CERN - Real Life RAC Scalability Experiences


PVSS Tuning (6/6)

Problem investigation:
• Link between the foreground process and the ASM processes
• Difficult to interpret the ASH report and the 10046 trace

Problem identification:
• ASM space allocation is blocking some operations

Changes:
• Space pre-allocation by a background task (sketched below)

Result:
• A stable 150 000 "changes" per second
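
The pre-allocation can be sketched as a scheduled background job that extends the active segment ahead of the writers (hypothetical names, size and frequency):

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'preallocate_eventh_space',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          EXECUTE IMMEDIATE
                            ''ALTER TABLE events_history
                                MODIFY PARTITION p_client_1
                                ALLOCATE EXTENT (SIZE 100M)'';
                        END;',
    repeat_interval => 'FREQ=MINUTELY; INTERVAL=10',
    enabled         => TRUE);
END;
/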

Page 32: CERN - Real Life RAC Scalability Experiences


PVSS Tuning Schema

[Diagram: 150 clients → DB servers → storage. Each client bulk-inserts its changes into a temporary table; a PL/SQL call then performs a direct-path load into the client's partition of the history table:

  insert /*+ APPEND */ into eventh (…)
  PARTITION (1)
  select … from temp

The original path (clients issuing "UPDATE eventlastval SET …", with a trigger on update merging into the events_history table) is shown alongside for comparison.]

Page 33: CERN - Real Life RAC Scalability Experiences


PVSS Tuning Summary

Conclusion:
• From 100 "changes" per second to 150 000 "changes" per second
• 6-node RAC (dual CPU, 4 GB RAM per node), 32 SATA disks with an FCP link to the hosts
• 4 months of effort:
  – Rewriting part of the application, with interface changes (C++ code)
  – Changes to the database code (PL/SQL)
  – Schema changes
  – Numerous work sessions, joint work with other CERN IT groups

Page 34: CERN - Real Life RAC Scalability Experiences


Scalability Conclusions

• ASM / cluster filesystem / NAS allow:
  – a much easier deployment
  – far less complexity and risk of human error

• 10g RAC is much easier to tune

• RAC can boost your application's performance, but it also exposes weak points in the design and magnifies their impact

• Proper application design is the key to almost linear scalability for a "non-read-only" application

Page 35: CERN - Real Life RAC Scalability Experiences


Recommendations

• Understand the extra features

• Take advantage of RAC connection management
  – Use "services" with resource management
  – Load balancing: client side, server side, client + server side …

• Use connection failover mechanisms:
  – "Transparent Application Failover": re-connection; modifications not yet committed are lost (a configuration sketch follows).
  – "Fast Connection Failover" (10g): notification based; has to be implemented on the client side.
  – At a minimum: re-connection.
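
A configuration sketch for TAF on a service (10gR2; the service name is hypothetical): SELECT-type failover lets open queries resume on a surviving instance after re-connection.

BEGIN
  DBMS_SERVICE.MODIFY_SERVICE(
    service_name     => 'ais_oltp',
    failover_method  => DBMS_SERVICE.FAILOVER_METHOD_BASIC,
    failover_type    => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
    failover_retries => 180,
    failover_delay   => 5);
END;
/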

Page 36: CERN - Real Life RAC Scalability Experiences


The Message to You

• RAC can scale even for write intensive applications

• As you know, RAC scalability tuning is challenging

• Create RAC-aware applications from the beginning
  – Application design is key

• The effort pays off:
  – Better application performance
  – A highly available system
  – Flexibility and a lower hardware price

Page 37: CERN - Real Life RAC Scalability Experiences


References

• "Pro Oracle Database 10g RAC on Linux", Julian Dyke and Steve Shaw

• "Connection management in RAC", James Morle

• "RAC awareness SQL", pythian.com

• "ETL 10gR2 loading", Tom Kyte

• CERN IT-DES group web site

Page 38: CERN - Real Life RAC Scalability Experiences


Q&A