revealing the secret behind a stress-free database migration
TRANSCRIPT
Revealing the secret behind a stress-free database migration
Arald den BraberPatrick van Tongeren
About us
Avans University of Applied Sciences
Arald den Braber
System Administrator / Developer / [email protected]
Patrick van Tongeren
Functional [email protected]
About our institution
• 3 locations in the Netherlands
• 28.000 students
• 2600 employees
• 20 schools
• Blackboard since 2001
• 7500 active courses
• 3 system Administrators
• 20 e-Learning coaches
• Selfhosted
Bruxelles
Amsterdam
FH56
Our current Blackboard infrastructure
Storage(CIFS)
Load Balancer
http://bb.avans.nl
Private network
Firewall
DZ37 DZ38DZ36DZ35DZ60(BEP)
HTTPS traffic
Trafic to/from database, data storage and between nodes (ActiveMQ)
Database cluster
FH53
What we are going to learn today?
• Options for implementing high availability
• “By failing to prepare, you are preparing to fail.” ― Benjamin Franklin Therefor: good preparations are the key to a successful migration
• Tuning the database server
• A few things you have to know about Blackboards configuration
Goal of our migration
• Replace old hardware from mid 2010
• Change the database passwords
• And……
Goal of our migration
• Risk analysis audit learned us that we had one single point of failure in our Blackboard infrastructure
Our disaster recovery plan and high availability goes something like this…
Our old Blackboard infrastructure
Storage(CIFS)
Load Balancer
http://bb.avans.nl
Private network
Firewall
DZ37 DZ38DZ36DZ35DZ60(BEP)
HTTPS traffic
Trafic to/from database, data storage and between nodes (ActiveMQ)
Database server
FH20 !!
Database server
FH20
How to select your best high availability solution
• Depends on your needs– Required availability– Recovery time (RTO)– Data loss tolerance (RPO)– Cost
• SQL Server 2012 has several options– Database mirroring (obsolete in future versions)– Replication (not designed for high availability)– Backup and restore– Logshipping– AlwaysOn Failover Cluster Instances (FCI)– AlwaysOn Availability Groups
Backup and restore
• Appropriate for disaster recovery, but not for high availability
• Must always be present
• Pros– Simple– Already present
• Cons– Can take long after disaster before up and running
Database backups
Log backups
Database server Backup share
Logshipping
• Uses the transaction log backup
• User-configurable interval
• Pros– Simple– Does not require Enterprise edition (potentially cheaper to implement)– Logshipping operates at the database level
• Cons– Data loss from last backup– Manual failover– No easy way to failback to the primary server
Log shipping Log shipping
Primary database server
Secondary database server
Backup share
• Shared-storage solution
• Pros– Connect to a virtual name– Allows for OS and SQL upgrades
with minimal user impact
• Cons– Failover takes place at a SQL instance level (not database level)– Dependent upon Windows Clustering – can be complex to setup
AlwaysOn Failover Cluster Instances (FCI)
Clients
Databasestorage
ActiveDatabaseServer 1
PassiveDatabaseServer 2
Windows Server Failover Cluster
Primary Data Center Disaster RecoveryData Center
Primary Secondary Secondary
Availability Group
Synchronous/Asynchronous
Fileshare Witness
Synchronous
AlwaysOn Availability Groups
• Supports a failover environment for a discrete set of user databases
• Pros– Connect to a virtual
name– Mirrors at a database
level (not whole instance)– Supports asynchronous (high performance) and synchronous
(high security) communication options
• Cons– Storage– Only available on SQL Server Enterprise Edition– Dependent upon Windows Clustering – can be complex to setup
Summary
High Availability and Disaster Recovery
SQL Server Solution
Potential Data Loss
(RPO)
Potential Recovery Time
(RTO)
Automatic Failover
Readable Secondaries
AlwaysOn Availability Group synchronous-commit
Zero Seconds Yes 0 - 2
AlwaysOn Availability Groupasynchronous-commit
Seconds Minutes No 0 - 4
AlwaysOn Failover Cluster Instance NA Seconds-to-minutes
Yes NA
Log Shipping Minutes Minutes-to-hours
No Not duringa restore
Backup, Copy, Restore Hours Hours-to-days
No Not duringa restore
Our goal
• Primary goals– Redundancy of the database– Reduce downtime to a maximum of 1 hour
• Secondary goals– Reduce overhead for the primary server– Relationship Costs versus benefit
(Don’t invest too much in something we wish we never have to use)
Our solution
AlwaysOn Availability Groups• Servers
– One physical server acting as primary role (384 GB memory) – One virtual server acting as secondary role (8 Gb memory)
• Configuration– Asynchronous commits– No automatic failover– Readable secondary– Backup secondary
Failover
• Steps to perform– Add resources to our secondary server (restart)– Manually Failover (wizard)
• Estimated duration– Normally within 15 minutes
• Advice– Test failover several times– Familiar with steps for failover – Do a failover in production once a year
• Planned failover– Maintenance– Change asynchronous to synchronous commits first
Database best practice
• Memory– Set a fixed maximum memory usage
• MAXDOP setting– To limit the number of processors to use in parallel plan execution
• Patching– Hotfixes for AlwaysOn
• Use backup compression– Reduce the amount of storage– Decrease backup and restore times
Migration
• Preparation– Make a roadmap– Use scripts
• Test– Use test servers– Execute the roadmap several
times– Know the timeframe
• Production– Last test execution on new production environment
Reduced complexity at migration moment
• The complexity of the actual migration was reduced to these simple steps:– “Refresh” data in the new database server– Duplicate data to standby server– Update the Blackboard configuration
• Using a script• Update the bb-config.properties file
• The servers, SQL server configuration and cluster setup where all prepared in advance and ready to be used
DZ35, 36, 37, 38 and 60
FH20
FH53
FH56
Old server
New database cluster
The migration
The actual migration
• As simple as taking all steps in the roadmap
• Performed at home
• Communication using a Whatsapp group
• On schedule
• No surprises
A funny note Be aware – configuration stored at various locations
• While testing we ran into the situation in which– The new database server was logging authentication failures from the
test application server– The test application server successfully started connecting to the old
database server
App server
Old database server
New database server
Login failedPartly continue using
Mmmm If the new server doesn’t want me I will just
continue to use my old partner
Log file
This was caused by Blackboard ‘s hidden configuration
• bbadmin database – BB_INSTANCE: Database server name & passwords
• CMS database– XY_FILE_SYSTEMS: Database server name & passwords
How to avoid this?
• Make sure the application server can’t see the old database
• This can be done by– Tweaking name resolving on the app server so the old database server
name resolves to 127.0.0.1 (localhost). We used this solution during the test phase of our project
– Take the databases offline on the old server after making the final backup
A few final words
• Good preparations help to reduce complexity and as a result stress during the actual moment of a migration
• The roadmap we created during our project can also be used when you need to clone your environment.
Migration summary
• Making a roadmap and testing has been the key to our success. It taught us:– Which steps we had to take– How long these steps take input for the final roadmap– About the secret locations of the Blackboard configuration
• Testing also helps you not to forget important steps
Questions
?? ? ?
?
? ?