CERN DB Services: Status, Activities, Announcements

Replication Technology Evolution for ATLAS Data Workshop, 3rd of June 2014

Marcin Blaszczyk - IT-DB

Recap
• Last workshop: 16th Nov 2010. At that time:
  • We were using 10.2.0.4
  • We were installing new hardware to replace RAC3 & RAC4
    • RAC8 in “Safehost” for standbys
    • RAC9 for integration DBs
  • 11.2 evaluation process
  • 10.2.0.5 upgrade under planning
• Infrastructure for Physics DB Services
  • Quadcore machines with 16GB of RAM
  • FC infrastructure for storage (~2500 disks)

Things have changed…
• Service evolution
  • RAC8 in Safehost for standby installed
    • Performed in Q3 2010
    • To assure geographical separation for DR
    • New standby installations for each production DB
  • 10.2.0.5 upgrade
    • Performed in Q1 2011

Oracle 11gR2
• SW upgrade + HW migration
  • Target version 11.2.0.3
  • Performed in Q1 2012
• HW migration
  • New HW installations (RAC10 & RAC11)
  • 8 cores (16 threads) CPU, 48GB of memory
• Move from ASM to NAS
  • NetApp NAS storage
• Replication technology
  • Usage of Streams replication gradually reduced
  • Usage of Active Data Guard has grown

Offloading with ADG
• Offloading backups to ADG
  • Significantly reduces load on primary
  • Removes sequential I/O of full backup
• Offloading queries to ADG
  • Transactional workload runs on primary
  • Read-only workload can be moved to ADG
  • Examples of workload on our ADGs: ad-hoc queries, analytics and long-running reports, parallel queries, unpredictable workload and test queries
• ORA-1555 (snapshot too old)
  • Sporadic occurrences
  • Oracle bug; to be confirmed if present in 11.2.0.4
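
The split described on this slide (transactional work on the primary, read-only work on the ADG) can be illustrated with a minimal routing sketch. The service names `mydb_primary` and `mydb_adg` and the `Workload` type are hypothetical; the slides do not say how connections are actually routed.

```python
from dataclasses import dataclass

# Hypothetical service names; the slides do not name the real ones.
PRIMARY_SERVICE = "mydb_primary"
ADG_SERVICE = "mydb_adg"


@dataclass(frozen=True)
class Workload:
    name: str
    read_only: bool
    # False for read-only work that should still stay on the primary
    offloadable: bool = True


def choose_service(workload: Workload) -> str:
    """Return the service a workload should connect to."""
    # Anything that writes must run on the primary:
    # an Active Data Guard standby is open read-only.
    if not workload.read_only:
        return PRIMARY_SERVICE
    # Read-only work (ad-hoc queries, reports, test queries) can be offloaded.
    return ADG_SERVICE if workload.offloadable else PRIMARY_SERVICE


if __name__ == "__main__":
    for w in (Workload("order entry", read_only=False),
              Workload("long-running report", read_only=True)):
        print(w.name, "->", choose_service(w))
```

Keeping the decision in one function makes it easy to send a given workload back to the primary, for example if it keeps hitting ORA-1555 on the standby.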

New Architecture with ADG

[Diagram: two configurations, each with redo transport from the primary database in maximum performance mode. 1. Low load ADG: one Active Data Guard for users’ access and for disaster recovery. 2. Busy & critical ADG: one Active Data Guard for disaster recovery and one Active Data Guard for users’ access.]

• Disaster recovery
• Offloading read-only workload

IT-DB Service on 11gR2
• IT-DB service much more stable
  • Workload has been stabilized
  • High loads and node reboots eliminated
• More powerful HW
• Offloading to ADG helps a lot
• 11g clusterware more stable
• Storage model benefited from using NAS
  • Single or multiple disk failures can no longer affect the DB service
• Faster and less vulnerable Streams replication

Preparation for Run 2
• Oracle SW
  • No good solution to fit entire Run 2
  • New software versions: 11.2.0.4 vs 12.1.0.1
• New HW
  • 32 threads CPU, 128/256GB memory
• New storage NetApp model
  • More SSD cache
  • Consolidated storage

Hardware upgrades in Q1 2014
• New servers and storage
  • Servers: more RAM, more CPU
    • 128GB of RAM (48GB on current prod machines)
  • Storage: more SSD cache
    • Newer NetApp model
    • Consolidated storage
• Refresh cycle of OS and OS related
  • Puppet & RHEL 6
• Refresh cycle of our HW
  • New HW for production
  • Current production HW will be moved to standby

Software upgrades in Q1 2014
• Available Oracle releases
  • 11.2.0.4
  • 12.1.0.1
• Evolution: how to balance
  • Stable services
  • Latest releases for bug fixes
  • Newest releases for new features
  • Fit with LHC schedule

DBAs & workload validation
• DBAs can do:
  • Test upgrades of integration and production databases
  • Share experience across user communities
  • Database CAPTURE and REPLAY with RAT testing
    • Capture workload from production and replay it in upgraded DB
    • Useful to catch bugs and regressions
    • Unfortunately it cannot cover the edge cases
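
To make the capture-and-replay cycle concrete, the sketch below lists the PL/SQL calls involved, in order, using Oracle's documented `DBMS_WORKLOAD_CAPTURE` and `DBMS_WORKLOAD_REPLAY` packages. It only builds the statements; it does not connect to a database. The capture name, replay name and directory object are placeholders, and the slides do not show the procedure actually used by IT-DB, so treat this as an assumption-laden outline.

```python
from typing import List, Tuple


def rat_plan(capture_name: str, directory: str, replay_name: str,
             duration_s: int) -> List[Tuple[str, str]]:
    """Return (where to run, PL/SQL block) pairs in execution order.

    `directory` is the name of an Oracle directory object (placeholder here).
    The capture files have to be copied to the test system before step 3.
    """
    prod, test = "production", "upgraded test DB"
    return [
        # 1. Record the production workload for a fixed duration (seconds).
        (prod, "BEGIN DBMS_WORKLOAD_CAPTURE.START_CAPTURE("
               f"name => '{capture_name}', dir => '{directory}', "
               f"duration => {duration_s}); END;"),
        # 2. Stop the capture explicitly if it should end early.
        (prod, "BEGIN DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE; END;"),
        # 3. Preprocess the capture files on the upgraded database.
        (test, "BEGIN DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE("
               f"capture_dir => '{directory}'); END;"),
        # 4. Load the capture metadata for a named replay.
        (test, "BEGIN DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY("
               f"replay_name => '{replay_name}', "
               f"replay_dir => '{directory}'); END;"),
        # 5. Put the database into replay mode (default options).
        (test, "BEGIN DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY; END;"),
        # 6. Start the replay once the wrc replay clients are running.
        (test, "BEGIN DBMS_WORKLOAD_REPLAY.START_REPLAY; END;"),
    ]


if __name__ == "__main__":
    for where, stmt in rat_plan("cap_demo", "RAT_DIR", "rep_demo", 3600):
        print(f"[{where}] {stmt}")
```

The replay reproduces only what was captured, which is why it cannot cover workload that did not occur during the capture window.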

Validation by the users
• Validation by the application owners is very valuable to reduce risk
  • Functional tests
  • Tests with ‘real world’ data sizes
  • Tests with concurrent workload
• The criticality depends
  • On the complexity of the application
  • On how well they can test their SQL

Recent Changes: Q1-Q2 2014
• DB services for Experiments/WLCG
  • Target version 11.2.0.4
  • Exceptions: target 12c
    • ATLARC
    • LHCBR
    • Few more IT-DB services
• Interventions took 2-5 hours of DB downtime
  • Depending on system complexity: standby infrastructure, number of nodes etc.

Upgrade technique - overview

[Diagram in six numbered steps. Recoverable labels: PRIMARY DATABASE RAC and DATA GUARD RAC DATABASE linked by redo transport; Clusterware 11g+ with RDBMS 11.2.0.3; Clusterware 12c+ with RDBMS 11.2.0.3; RW access; RDBMS upgrade; DATABASE downtime; Clusterware 12c+ with RDBMS 11.2.0.4; Upgrade complete!]

Phased approach to 12c
• Some DBs already on 12.1 version
  • ATLARC, LHCBR
  • Smooth upgrade
  • No major issues discovered so far
• Following Oracle SW evolution, depending on
  • Next 12c releases feedback (12.2)
  • Testing status
  • Possibility to schedule upgrades
• Next possible slot for upgrades to 12c 1st patchset
  • Technical stop Q4 2014/Q1 2015?
  • Candidates: offline DBs (ATLR, CMSR, LCGR…)

Monitoring & Security
• Monitoring
  • RacMon
  • EM12c
  • Strmmon
• Support level during LS1
  • Best effort
• Security
  • AuditMon
  • Firewall rules for external access
    • For ADCR in 2013
    • For ATLR in 2014

IT-DB Operations Report

ATLAS databases

• Production DBs: 12 nodes*, ~69 TB of data
  – ATONR: 2 nodes, ~8 TB
  – ADCR: 4 nodes, ~19.5 TB
  – ATLR: 3 nodes, ~20.5 TB
  – ATLARC: 2 nodes, ~17 TB
  – *ATLAS DASHBOARD (1 node of WLCG database), ~4 TB
• Standby DBs: 14 nodes, ~75 TB of data
  – ATONR_ADG: 2 nodes; ATONR_DG: 2 nodes
  – ADCR_ADG: 4 nodes; ADCR_DG: 3 nodes
  – ATLR_DG: 3 nodes
• Integration DBs: 4 nodes, ~18 TB of data
  – INTR: 2 nodes, ~7.5 TB
  – INT8R: 2 nodes, ~9 TB
  – **ATLASINT: 2 nodes, ~2 TB (will be consolidated with INT8R)
• Nearly 165 TB of space, 30 database servers
• 12* databases (11 RAC clusters + 1 dedicated RAC node*)
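
As a quick cross-check of the figures on this slide, the sketch below re-adds the per-database numbers. All values are copied from the slide; the variable names are illustrative only. Per-database sizes are not given for the standbys, so the space total uses the stated tier totals.

```python
# (nodes, size in TB) per production database, as listed on the slide
production = {
    "ATONR": (2, 8.0),
    "ADCR": (4, 19.5),
    "ATLR": (3, 20.5),
    "ATLARC": (2, 17.0),
    "ATLAS DASHBOARD": (1, 4.0),  # one node of the WLCG database
}

# Nodes per standby database
standby_nodes = {
    "ATONR_ADG": 2, "ATONR_DG": 2,
    "ADCR_ADG": 4, "ADCR_DG": 3,
    "ATLR_DG": 3,
}

# (nodes, size in TB) per tier, as stated on the slide
tier_totals = {
    "production": (12, 69),
    "standby": (14, 75),
    "integration": (4, 18),
}

prod_nodes = sum(nodes for nodes, _ in production.values())
prod_tb = sum(tb for _, tb in production.values())
standby_total = sum(standby_nodes.values())
total_servers = sum(nodes for nodes, _ in tier_totals.values())
total_tb = sum(tb for _, tb in tier_totals.values())

print(f"production: {prod_nodes} nodes, {prod_tb} TB")
print(f"standby: {standby_total} nodes")
print(f"all tiers: {total_servers} servers, {total_tb} TB")
```

The per-database production figures add up to the stated 12 nodes and ~69 TB, the standby nodes to 14, and the three tiers to 30 servers and 162 TB, in line with the rounded "nearly 165 TB".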

Replication for ATLAS - current status

Replication for ATLAS - plans
• Replication changes overview
  • PVSS
    • Read-only replica: Active Data Guard
  • COOL
    • Online -> Offline: GoldenGate
    • Offline -> Tier1s: GoldenGate
  • MUON
    • Streams will be stopped once the new ATLAS solution for custom data movement is in place

Conclusions
• Focus on stability for DB services
• Software evolution
  • Critical services have just moved to 11.2.0.4
  • Long perspective: keep testing towards 12c
• HW evolution
• Technology evolution for replication
  • ADG & GG will fully replace Oracle Streams

Acknowledgements
• Work presented here on behalf of:
  • CERN Database Group

Thank you!

[email protected]
