Produced by: D. Jonscher Slide 1
Efficient Storage Management and Disaster Recovery in Multi-TB Data Warehouses
Dr. Dirk Jonscher, Mainz, 29 March 2011
Agenda
1. Introduction Credit Suisse
2. DWH @ Credit Suisse
3. DWH DB and Storage Requirements
4. Platform Architecture
5. Oracle Configuration
6. File System Configuration for Oracle Servers
7. Disaster Recovery
8. Summary
1. Credit Suisse History in Switzerland
1856
155 years ago, Alfred Escher founded Schweizerische Kreditanstalt (SKA) – which later became Credit Suisse – to push forward the expansion of the railway network and the industrialization of Switzerland.
No Swiss statesman had such a profound impact on the country as Alfred Escher; through his innovations, he laid the foundations for a modern Switzerland.
[Images: Alfred Escher (1819-1882); Paradeplatz, Zurich; the monument of Alfred Escher at ETH Zurich, founded in 1854; Zurich main station.]
1. Credit Suisse History – First Boston Corporation

1786
Over 200 years ago, the founder of the Massachusetts Bank – forerunner of First Boston – financed the first US ship to China. The First Boston Corporation was created as the investment banking arm of the First National Bank of Boston in 1932 and went public in 1934.

[Image: State Street, Boston, 1801. On the right stood the American Coffee House, home of the Massachusetts Bank, 1792 – 1809.]
1. Credit Suisse AG Today – Key Facts
Global bank headquartered in Zurich, serving clients in private banking, investment banking and asset management.
Registered shares of Credit Suisse Group AG (CSGN) are listed in Switzerland (SWX) and as American Depositary Shares (CS) in New York (NYSE).
Total number of employees: about 50'000
2. Introduction: CS Application Platforms
[Diagram: CS application platforms – Mainframe TP/Batch (XBS, WS, 80..., ...), JAP (Directnet, Frontnet, ...), Data Warehouse (Basel II, AML, ...) and DAP (Healthwise, CAPS, ...), plus further platforms (e.g. ERP, ECM, Grid, C/C++), all connected via the Integration Infrastructure (managed interfaces for services, events, bulk data, workflow; intra-AP and cross-AP). Runtime platforms: zOS RTP, Solaris RTP, Solaris RTP, Windows.]
2. Introduction: Scope of Application Platforms
Services provided by an Application Platform
– Platform Product Mgmt & Governance: drives product development and release & life-cycle management, adhering to a well-defined governance model
– Platform Operations: operates applications cost-efficiently with standardized processes according to OLAs
– Application Development Support: guides projects through the entire development process and shields projects from low-level infrastructure issues

Infrastructure needed to provide these services
– Technical Components: providers supply high-quality and well-managed technical components that are tested and integrated into readily deployable packages
– Hosting Infrastructure: applications are hosted on shared hardware resources (servers, storage systems, backup, etc.)
– Architecture, Guidelines & Documentation: a defined, standardized architecture based on open standards for various needs, and information to implement applications on the platform
[Diagram: four service pillars – Managed, high-quality Technical Components; Automated, integrated Tool-chain; Hosting on Shared HW Resources; Architecture, Guidelines & Documentation.]
2. Introduction: Characterization DWH AP
Data Warehouse AP
– cost-effective platform for integrating data from multiple internal and external sources and for developing, deploying and operating applications that implement reporting, analysis and data mining functions
– result of the rearchitecture program (RAP) DWH (1998 - 2001)

Scope
– reporting and analysis applications
– data from the last end-of-day processing (no operational/transactional reporting), historized data
– no direct initiation of business transactions

Functions
– standard and ad-hoc reporting
– On-Line Analytical Processing (OLAP)
– data mining only in special areas (Customer Relationship Management (CRM) and anti-money laundering)
2. Introduction: DWH Reference Architecture
[Diagram: DWH reference architecture. Data flows from the Data Sources through a file-based Landing Zone into the DWH Data Universe (Staging Area, Subject Matter Areas, Reusable Measures & Dimensions Areas; integration, historization; reusable selection, aggregation, calculation), then into Data Marts (selection, aggregation, calculation), Analysis Services (reporting, OLAP, data mining) and the Presentation Front End (GUI via web/app servers). Metadata Management spans the layers Data Integration, Data Enrichment and Analysis; storage types are relational databases, multidimensional databases and files. The legend distinguishes ETL logic from non-ETL logic.]
2. Introduction: Overview DWH-Tools (AR5)
[Diagram: DWH tool overview (AR5) by layer (Data Integration, Data Enrichment, Analysis), running on UNIX (Sun/Solaris RTP Solaris) and Windows. Data sources (usually host: DB2, IMS, applications) deliver files via Connect:Direct. Extraction, transformation, loading: PowerCenter 8.6 into Oracle 11g (DWH DU). Reusable selection, aggregation, calculation: PowerCenter 8.6 (+PL/SQL) into Oracle 11g (RMDA). Selection, aggregation, calculation: PowerCenter 8.6 into Oracle 11g (DM) or MSAS (OLAP). Reporting, OLAP, data mining: DRP (BO XI R3), Clementine 15 and CS applications (JAP), accessed via GUI clients (Internet Explorer 6.0, Exceed). Cross-cutting: Metadata Management (MDMS), Security Administration (Control-SA 3.3, RAT) and job scheduling (Control-M).]
3. DWH DB and Storage Requirements (2010 - 2013)
Performance and Scalability
– good performance for very large DB instances (up to 100 TB) and up to 10 M (complex) SQL statements per day
– throughput up to 5 GB/s (DWH Data Universe)
– full backup of a 100 TB instance in less than 12 hours

Storage Management
– automated and standardized storage management covering initial setup, growth, data migration and clean-up; simple and efficient processes to manage growth
– consolidation of spare capacity in one layer
– independent migration/switch-over of DB instances possible
– storage-internal copy of data for UAT/IT refresh possible

Disaster Recovery Solution
– the solution needs to be based on a synchronous data replication approach (DWH ETL jobs do not run in a single transaction, and inconsistencies between the batch control view and the Oracle DBMS must be prevented) and must also be usable for "normal" server outages
– no automatic fail-over needed, but the site switch needs to be highly automated (scripting)
3. KPIs DWH AP
DWH KPIs DB Backends
– 2 M9000 and 24 M5000 servers (+ dozens of smaller servers for the BI platform)
– about 600 TB storage capacity used (1 PB attached, 6 PB (virtually) mapped)
– about 100 TB in production database instances (full copies in UAT and IT); growth rate: 60% per year (~35 TB per month)
– about 100 applications with 16'000 users, ~2000 feeder files & ~5000 ETL jobs per day

Summary Hardware Technology AR5
– server type (DB servers): M5000 and M9000
– storage subsystems: Hitachi USP-V, 450 GB FC disks, large disk pools (>384 disks), RAID 5 (7/1) & thin provisioning
– storage SAN: Brocade 4/8 Gb/s, 4 ports on M5000 and 16 ports on M9000
– backup SAN: ditto, 8 ports on all servers (shared pool of 16 drives per data center for the DWH AP)
– backup drives: IBM TS1130 with 1 TB cartridges (native write performance: ~160 MB/s)

Typical Server KPIs
– LUNs: up to 2000 per server and 1600 per DB instance (overall about 20'000)
– file systems: between 50 and 150 per server (overall about 2000)

Throughput of Current Platform Release
– data migration: 5-6 TB/h (M9000)
– backup performance: ~1 TB/h per tape drive; test for inc0 on the DWH Data Universe: 7.5 TB/h on 8 drives (M9000)
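The backup figures above are mutually consistent, and they bound the number of tape drives needed to meet the 12-hour full-backup requirement for a 100 TB instance. A back-of-the-envelope check (illustrative arithmetic only, not a CS sizing tool):

```python
import math

INC0_RATE_TB_H = 7.5   # measured inc0 backup rate on the DWH Data Universe
DRIVES_USED = 8        # tape drives used in that test (M9000)

# Per-drive rate: ~0.94 TB/h, consistent with the "~1 TB/h per tape drive" figure.
per_drive_tb_h = INC0_RATE_TB_H / DRIVES_USED

# Drives needed to back up a 100 TB instance within the 12-hour requirement:
required_rate_tb_h = 100 / 12                    # ~8.3 TB/h aggregate
drives_needed = math.ceil(required_rate_tb_h / per_drive_tb_h)

print(round(per_drive_tb_h, 2), drives_needed)  # 0.94 9
```

So roughly half of a data center's shared pool of 16 drives is enough to meet the requirement.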
4. Physical Architecture (High-Level)
[Diagram: high-level physical architecture. Source systems deliver feeder files via file transfer into the Staging Server (StS), which runs PowerCenter (incl. IDQ), hosts the file-based landing zone and also hosts SAS & Mantas. PowerCenter load jobs feed the DWH DU Server (DUS): the DWH Data Universe – Staging Area (StA), Subject Matter Area(s) (SMA), Reusable Measures & Dimensions Areas (RMDA) and MDM – is stored in one big Oracle DB instance. Further PowerCenter load jobs feed the DWH Data Mart Servers (DMS): there are multiple data mart servers, and multiple data marts (DM, xDM##) share an Oracle DB instance.]
4.1 Staging Server
Architecture
– usage of Solaris zones to split PowerCenter processing across data centers: DWH DU loads in one data center, data mart loads in the other data center
– data (feeder files and logs) are mirrored (via Volume Manager); in case of a failure/disaster, switch the Solaris zone of the failed server to the remote site (manual process)

[Diagram: DDU zone (PROD DDU) in Data Center 1 and DM zone (PROD DM) in Data Center 2, each attached to an enterprise storage system; mirroring via VM.]

M5000 large:
– 6 dual-port Emulex cards for SAN: 4 x storage, 8 x backup
– Ethernet interfaces: 1 quad-port 1 Gb/s card and 1 dual-port 10 Gb/s card
4.2 DWH DU Server
Architecture
– use a single-domain M9000-32 on the production site
– split the UAT server (also an M9000-32) into 2 dynamic domains: domain 1: stand-by environment (full I/O connectivity already configured); domain 2: UAT environment
– in case of a production server failure, the stand-by domain on the PT/A server is used as production: boards must be reconfigured from the PT/A domain into the stand-by domain

[Diagram: PROD DUS in Data Center 1 and PTA DUS (domains do1/do2, incl. stand-by) in Data Center 2, each attached to an enterprise storage system; mirroring via VM.]

M9000-32:
– 12 dual-port Emulex cards for SAN: 16 x storage, 8 x backup
– Ethernet interfaces: 1 quad-port 1 Gb/s card and 1 dual-port 10 Gb/s card
4.3 Data Mart Servers
Architecture
– use single-domain M5000 servers, distributed over both data centers
– 2 production servers in different data centers form a so-called "DM group": these "partner" servers host one (or more) production DB instances and the stand-by DB instances of their "partner" in the remote data center (e.g. DMS A: DB instance X & stand-by of Y; DMS B: DB instance Y & stand-by of X); idle resources are minimized
– in case of a production server failure, the "partner" server takes over the additional load

[Diagram: DM group with PROD DMS A in Data Center 1 and PROD DMS B in Data Center 2, each attached to an enterprise storage system; mirroring via VM.]

M5000 large:
– 6 dual-port Emulex cards for SAN: 4 x storage, 8 x backup
– Ethernet interfaces: 2 quad-port 1 Gb/s cards
5. Oracle Configuration
General Setup
– single instance, big SGAs (20-64 GB)
– based on Veritas Storage Foundation: volume manager (VxVM), data files (VxFS incl. Oracle Disk Manager), multi-pathing (DMP)
– TEMP tablespaces on raw devices
– REDO tablespaces on file system
– each DB instance has a dedicated set of LUNs: required for the storage-internal copy of data for UAT and IT refresh
– each instance has its own file systems: DB instances can easily be relocated (independently of each other)

Data Replication
– host-based mirroring of data via VxVM across both CH data centers: dirty region log & fast mirror resynchronization
– site tagging: ensures that mirroring is indeed across both data centers and a consistent split (in case of a rolling disaster)
– active reading on both sides: a permanent test that the data copy is consistent and usable
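The invariant that site tagging enforces can be stated as a simple check: every mirrored volume must have at least one plex (mirror copy) in each data center. A minimal sketch, assuming a hypothetical description of volumes and the site tags of their plexes (not the VxVM API):

```python
# Hypothetical data model: volume name -> list of site tags, one per plex.
def mirroring_spans_both_sites(volumes, sites=("DC1", "DC2")):
    """True only if every volume has a plex in every listed site."""
    return all(set(sites) <= set(plex_sites) for plex_sites in volumes.values())

layout_ok = {"redo01": ["DC1", "DC2"], "data01": ["DC1", "DC2"]}
layout_bad = {"redo01": ["DC1", "DC2"], "data01": ["DC1", "DC1"]}  # both plexes in one site

print(mirroring_spans_both_sites(layout_ok))   # True
print(mirroring_spans_both_sites(layout_bad))  # False
```

Without such a constraint, an allocator could legally place both mirrors of a volume in the same data center, silently defeating the DR design.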
6. File System Configuration for Oracle Servers
Veritas File System
– up to 50 file systems per DB instance; DWH DU: 40 for r/w tablespaces and 10 for r/o tablespaces
– each file system is striped over 16 LUNs with big striping units (42 MB = thin provisioning unit of HDS)
– prefetched I/O: multiblock read/write for sequential I/O (feeder files, PowerCenter cache, etc.), 5 x 1 MB units
– Oracle file size up to 64 GB (no "big files"): flexible management of file systems
– autoextend unit on Oracle files is 126 MB (3 x 42 MB)

Server-Side Configuration
– between 250 and 500 TB mapped to each DB server (covers the maximum growth until the next AP release)
– simple storage capacity management (autoextend): only the thin provisioning pool needs to be monitored (+ additional storage subsystems if needed)

Application Setup
– initial configuration based on the expected maximum growth for one year: determines the initial number of data files per tablespace
– in the past: initial configuration of tablespaces (fixed size) and subsequent extension; reclaiming unused capacity in tablespaces was rather labor-intensive
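The sizing rules above fit together arithmetically: the autoextend unit is an exact multiple of the 42 MB thin-provisioning unit, and the expected one-year growth of a tablespace determines its initial number of 64 GB data files. A sketch of that arithmetic (the helper name is hypothetical, not a CS tool):

```python
import math

THIN_PROV_UNIT_MB = 42                        # HDS thin provisioning unit
AUTOEXTEND_UNIT_MB = 3 * THIN_PROV_UNIT_MB    # 126 MB, aligned to the thin-prov unit
MAX_FILE_GB = 64                              # no Oracle "big files"

def initial_data_files(expected_growth_gb):
    """Initial number of data files for a tablespace, sized for one year of growth."""
    return math.ceil(expected_growth_gb / MAX_FILE_GB)

print(AUTOEXTEND_UNIT_MB)        # 126
print(initial_data_files(1000))  # a tablespace growing ~1 TB/year needs 16 files
```

Aligning the autoextend unit to the thin-provisioning unit means every file extension consumes whole provisioning units, so no allocated-but-unused slack accumulates in the pool.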
7. Disaster Recovery: Switching of Zones
1. Shut-Down (not "needed" in a disaster case or if the server is broken)
– stop zone
– unmount file system(s) (of this zone)
– deport disk groups
one call of a BCP administration script (takes about 10 min)

2. Restart
– import disk groups
– mount file system(s)
– restart zone
one call of a BCP administration script (takes about 10 min)

Comment
– automated redirect of all connection requests to this zone via Global Site Selector (GSS): ICMP (ping interval: 120 s); DNS caching is also only 120 s on all DWH AP servers

[Diagram: Solaris zones A-E distributed over Data Center 1 and Data Center 2, each data center with an enterprise storage system; mirroring via VM.]
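The two 120 s intervals in the GSS comment above bound how long clients can keep connecting to the old site after a switch. A back-of-the-envelope estimate (illustrative only, not a measured CS figure):

```python
GSS_PING_INTERVAL_S = 120   # GSS probes the zone via ICMP every 120 s
DNS_TTL_S = 120             # DWH AP servers cache DNS answers for 120 s

# Worst case: the zone dies right after a successful GSS probe, and a client
# resolved the old address just before GSS updated the DNS answer.
worst_case_redirect_s = GSS_PING_INTERVAL_S + DNS_TTL_S
print(worst_case_redirect_s / 60)  # 4.0 minutes
```

Four minutes of stale redirects is negligible next to the ~10-minute BCP script runtimes, which is presumably why no tighter probing was needed.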
7. Switching a DB Instance to another DB Server
[Diagram: DB Server Group C with one DB server per data center, each attached to an enterprise storage system; mirroring via VM. Works exactly the same way for any other DB server in both data centers.]

1. Shut-Down (not "needed" in a disaster case or if the server is broken)
– stop DB instance & listener
– unmount file systems (of this instance)
– deport disk groups
one call of a BCP administration script (takes about 30 min)

2. Restart
– import disk groups
– mount file systems
– restart DB instance & listener
one call of a BCP administration script (takes about 10 min)

Comments
– long-running sessions can delay the shut-down
– typically recovery is needed (long update transactions have an impact on the restart time)
– automated redirect of DB connects via Global Site Selector (GSS; monitoring of the listener port); each DB instance has its own DNS entry (!)
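The step sequence above can be sketched as a small plan builder (step names paraphrase the slide; this is an illustration, not the actual BCP script): in a disaster case the shut-down half is skipped, because the failed server is gone and VxVM can force-import the disk groups on the partner side.

```python
# Paraphrased from the slide: each half corresponds to one BCP script call.
SHUTDOWN_STEPS = ["stop instance & listener", "unmount file systems", "deport disk groups"]
RESTART_STEPS = ["import disk groups", "mount file systems", "restart instance & listener"]

def switch_plan(disaster=False):
    """Ordered steps for a DB-instance switch; shut-down is skipped in a disaster."""
    return ([] if disaster else SHUTDOWN_STEPS) + RESTART_STEPS

print(len(switch_plan()))      # 6 steps for a planned switch
print(switch_plan(True))       # disaster case: the restart half only
```

The ordering matters: the disk groups must be deported (or the old host must be dead) before they can be imported on the partner, which is what makes the two-script structure safe to automate.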
8. Summary and Outlook
Summary
– excellent performance and throughput: the platform can scale as needed until 2014
– thin provisioning works very well; overall storage utilization is considerably improved, and spare capacity is indeed concentrated in a single layer; careful monitoring of the thin provisioning pool is required
– sharing of large disk groups across different test levels does not cause any production issues
– tiering-in-the-box is much easier to manage than tiered storage
– snap technologies (e.g. Hitachi's ShadowImage) work very well (after a few hiccups): considerably reduced time to refresh test environments (the copy process in the background takes more than 24 hours, though)
– host-based mirroring does not cause performance issues (due to the distribution of read accesses across both data centers, overall throughput is impacted less than expected: about 5-10%)
– DR procedures work very well (components are now switched between both data centers every 3 months); DR procedures will only work if they are used on a regular basis

Next Steps
– further improvements of storage management: automatic reclaim of thin provisioning units (if no longer used); automated restriping when disk groups are extended
– dynamic tiering (next generation of USP-V)
– tests with flash technology (on the server and/or storage side)