microsoft sql azure - scaling out with sql azure whitepaper

Scaling out with SQL Azure

AuthorDinakar Nethi

Technical ReviewersCihan Biyikoglu, Michael Thomassy

PublishedJune 2010

SummarySQL Azure Database is a cloud database service from Microsoft. SQL Azure provides web-facing database functionality as a utility service. Cloud-based database solutions such as SQL Azure can provide many benefits, including rapid provisioning, cost-effective scalability, high availability, and reduced management overhead. This paper provides an overview on some scale out strategies, challenges with scaling out on-premise and how you can benefit with scaling out with SQL Azure.

Copyright

This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2010 Microsoft Corporation. All rights reserved.

Microsoft, ADO.NET Data Services, Cloud Services, Live Services, .NET Services, SharePoint Services, SQL Azure, SQL Azure Database, SQL Server, SQL Server Express, Sync Framework, Visual Studio, Windows Live, and Windows Server are trademarks of the Microsoft group of companies.

All other trademarks are property of their respective owners.

2

Scaling out with SQL Azure

Partitioning is a technique of splitting up data into smaller subsets across multiple databases for better manageability, availability and scalability. As customers’ data needs grow, they have the option to either scale up or scale out.

The primary advantage of scaling out vs. scaling up is that you can get more work done by parallelizing the workload over distributed servers and improving throughput. Scaling up always has a ceiling as opposed to Scale out with virtually no limits. Depending on how the application is architected, scale out techniques require minimal downtime where you can provision servers while managing the availability of servers through a configuration file. While hot add memory and processor options exist in high end systems, typically they do end up taking downtime as a precaution. Scale out techniques are being successfully applied in several applications such as eBay, Flickr, YouTube, Facebook etc.

Common Partitioning Techniques

Partitioning can be achieved in several ways, the two major categories being Horizontal Partitioning and Vertical Partitioning. While the benefits of partitioning seem well understood, successfully implementing a partitioning technique and achieving the partitioning benefits depends on the nature of workload of businesses and a very carefully planned and tested partitioning key.

Vertical Partitioning: In this technique, data is split across tables with fewer columns through the process of Normalization with infrequently used columns in a separate table. A table is split into multiple smaller tables with fewer columns and can be linked back together by primary key and foreign key relationships. Another way of vertical partitioning is to put a single table/entity into its own database. For example, some tables may be so large that putting them to a different database makes sense. For instance, taking the entire customer table into another database is a form of vertical partitioning. Typically, rows from all tables using unique rowid(s) are re-constituted to form an entire row. For example, if a customer has multiple shipping addresses in his or her profile, the table can be partitioned by putting the primary address in one table and secondary addresses in another table. Most queries will directly be done against the primary address table and very few requiring the application to display all the addresses.

3

PrimaryAddressCustomerIdAddressIdCustomer

CustomerID

SecondaryAddressesCustomerIdAddressId

Horizontal Partitioning: In this technique, data is distributed across tables by a key, similar to Database Partitioning in SQL Server 2005. Typically, there is a master table that maps the data to the partition. Depending on the query that the user submits, the application queries this master table to find out which partition has the required data and routes the query to the particular database. Typically, horizontal partitioning splits data within an instance of a database server. Sometimes this approach is taken further where these partitions are distributed across multiple instances. Each of these partitions can be in a separate database on a physically separate server. The database schema is similar across all partitions.

For example, if you consider a typical online store, all inventories belonging to a particular country can be put into one database. Similarly, sales, customer information belonging can be distributed based on the country, country being the partitioning key here. Another partitioning key in the above scenario could be an alphabetical list of countries. Further partitioning can also be done by using a state as a partitioning key. A combination of criteria can also be considered for a partitioning key, depending on the type of application and the nature of workload.

Typically, data is normalized for better performance. In horizontal partitioning, logical groups of data are stored together. There are some key considerations to achieve the benefits of horizontal database partitioning.

Identifying an efficient partitioning key(s): The real benefits of partitioning – cost savings, faster and efficient data retrieval and better manageability can only be achieved if the partitioning key(s) are chosen properly. This can be different from application to application depending on the nature of business and the type of workload. A deep understanding of the application and data is thus very essential. A natural partitioning key is more beneficial than partitioning based on numerical range of values. For example, if you have a sales application partitioning sales information based on

4

geography can be more efficient than partitioning based on SalesId. Partitioning for Sales data based on date can also be more meaningful since sales data is usually active for current time periods. Older Sales data tends to be more read only and used for reporting purposes.

Partition management: After the partitioning keys have been identified, the next major challenge lies in designing the application layer to manage the partitions. The application needs to have a proper understanding of the partitions and their data distribution. As business grows and data capacity requirements grow, and new partitions are allocated, the application should be able to quickly and seamlessly be able to manage switching in and out partitions with minimal to no downtimes. Depending on how the partitioning keys are chosen, sometimes adding new partitions may involve reallocation of data which may require a downtime.

Avoid cross-database joins: Enforcing referential integrity and joining data across instances are inefficient in partitioning. Hence, the application should be designed in such a way that querying data across partitions is avoided. In cases where data from multiple partitions is required, the application should be smart enough to process the query asynchronously querying the data from each partition separately and aggregating the data in the application layer.

Database partitioning can be very rewarding for businesses with highly variable workloads. Businesses usually evaluate their peak usage patterns and add some room for growth when planning for their annual hardware budgets. This means a lot of unused hardware for highly elastic workloads. With database partitioning, the databases can be distributed over multiple commodity hardware which is usually cheaper than the high end multi-processor machines. As usages increases or decreases, new servers can be provisioned or removed without heavy infrastructure cost investments.

High Availability: With data partitioned across so many databases over multiple instances, having a high availability solution is almost a must.

Administration/Manageability: If you consider scaling out across multiple servers in an on-premise scenario, there can be several challenges with managing all the servers for security updates, server patching, setting up a disaster-recovery plan for each server etc.

5

In this scenario, data is split based on calendar year. As new databases are provisioned, older databases can either be switched out or completely removed without major disruptions to the application.

Application Code

Year=2000

Year=2010

Year=2011

Benefits of Scaling out with SQL Azure

SQL Azure is a cloud based service offering relational database capabilities as part of the Azure platform. SQL Azure is based on Microsoft SQL Server and offers a subset of the features available on on-premise SQL Server. It provides a highly available, scalable multi-tenant database service hosted By Microsoft.

Currently, SQL Azures is offered in two editions – a Web edition and a Business edition. The latest offerings and the database size limits with the offerings is available at Pricing Overview. You can use the SQL Azure portal to create one SQL Azure server, per subscription. The SQL Azure portal provides a user interface that you can use to provision servers and databases.

A SQL Azure server is not a physical server like an on-premise instance of SQL Server. Instead, it’s a logical grouping of user databases. Each SQL Azure server

6

http://www.microsoft.com/windowsazure/pricing/#sql

comes with a read only master database. The master database maintains a list of databases and keeps track of logins, and their permission levels. The master database also contains usage metrics.

Automatic HA: As you scale across instances, ensuring high availability for each of those databases on multiple instances can become quite challenging. The advantage of scaling out with SQL Azure is that each of the databases provisioned comes with automatic failover and high availability. SQL Azure creates three copies of each user database when the database is initially provisioned – either via portal or by CREATE DATABASE command. Each of these copies is created on physically separate servers within the same geo-location as selected on the Azure portal. The SQL Azure service handles the data replication between the copies and ensures they are always in sync at any point of time. In case of hardware failure on one of the nodes, SQL Azure provides automatic failover to one of the redundant copies and this failover is transparent to your application.

Administration with SQL Azure: Scaling out in an on-premise scenario can create considerable management overhead for all the instances of SQL Server involved. SQL Azure abstracts the logical administration from physical administration. Microsoft handles all physical server level tasks such as server patching, hard drives, storage etc. while customers handle logical administration such as administering the databases, logins, users and optimizing the queries and indexes. DBAs only need to manage schema creation, index tuning, query optimization and security administration.

Friction free provisioning: When the need for scalability arises the time it takes to setup the additional databases and prepare them for being used by the applications becomes extremely critical. Because SQL Azure handles all physical server level administration, provisioning a highly available database with SQL Azure is very simple. Applications that need to create a new database to be included in their partition can do so very easily. The database can be created in a matter of seconds and schema and any lookup tables can be updated quickly and available to users. Similarly, any databases that need to be dropped can be done in just few seconds and the master lookup mapping can be updated so applications have a real-time knowledge of available partitions and their mappings.

Applications that need tens or hundreds of databases for a short period of time due to a seasonal/anticipated increase in demand can do so seamlessly and these additional databases can be de-provisioned when the usage drops. This kind of scale out capability with SQL Azure greatly benefits customers from having to purchase high end hardware to sustain these occasional peaks in demand and helps them save infrastructure costs from under-utilized hardware that they would otherwise incur when scaling out on-premise.

Billing with SQL Azure

7

The pricing structure for SQL Azure is available at Azure Pricing. When scaling out with SQL Azure customers can accurately calculate the costs they would incur and also very accurately estimate the cost savings when they need to scale down. Customers can also get real-time cost estimates of the number of databases they have and their usage charges from the portal with very little effort.

Conclusion

Although database partitioning can help improve performance, scalability and costs, it can be a challenging effort to implement successful partitioning scheme. Some applications have a natural partitioning scheme that the applications can take advantage of while some applications may have to be re-architected to become scalable, although the benefits may be well worth it. SQL Azure provides the platform where applications can scale out from one or two databases to tens or hundreds of databases very seamlessly without incurring heavy infrastructure costs. Customers also do not lose any infrastructure investments if they need to scale down due to changing usage patterns.

8

http://www.microsoft.com/windowsazure/pricing/

References:

SQL Azure Portal SQL Azure Developer Center SQL Azure Team Blog

9

http://blogs.msdn.com/SDS

http://msdn.microsoft.com/en-us/sqlserver/dataservices/default.aspx

http://www.microsoft.com/azure/sql.mspx

microsoft sql azure - scaling out with sql azure whitepaper

Technology