
Aaron Bosley, Architect, Solutions Engineering
Marc Mercuri, Sr. Director, Applied Incubation

Architecting FailSafe Data Services

ARC303

Presented in 2014

Session Objectives And Takeaways

Session Objective(s):
• Describe what failsafe data services are composed of as well as their capabilities
• Recognize and describe the failsafe architectural attributes and patterns as they relate to data services

FailSafe in 15 Minutes

Resiliency

Netflix is currently unavailable. Try again later.

Microsoft Confidential – Internal Use Only


Show Me the Model!


on resiliency

Cache

and the Sharded Database

SCALE ^

Data Service Design Basics

In the beginning there was…

A million different complex implementations of Remote Procedure Calls (RPC):
• XML-RPC
• SOAP
• Java RMI
• JAX-RPC
• CORBA

REST: Guiding Principles

• Simple
• Discoverable
• Meaningful
• Navigable

But there are some issues:
• No convention for URL/URI structure
• Level of detail varies greatly across implementations
• Discoverability quality varies dramatically
• Retrieval and search support is weak

Deliver Using Open Standards

OData
• URL Conventions
• Structured Data
• Formats (JSON/Atom)
• Descriptive Operations (PUT, PATCH, etc.)

Demo

Accessing data from a Data Service with OData

Example HTTP Request

GET http://services.odata.org/OData/OData.svc/Products HTTP/1.1
Accept: application/json
User-Agent: Fiddler
Host: services.odata.org

HTTP Verb

API End Point

Data Format for Response
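The request above can be reproduced from code. The following is a minimal Python sketch, not part of the original demo, that builds (but does not send) the same GET request with the standard library. The helper name build_odata_request is hypothetical, and the $top query option is an OData system query option added here only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Service root taken from the example request on the slide.
SERVICE_ROOT = "http://services.odata.org/OData/OData.svc"


def build_odata_request(entity_set, **query_options):
    """Build (but do not send) a GET request for an OData entity set.

    Hypothetical helper: keyword arguments become "$"-prefixed OData
    system query options, for example top=2 becomes $top=2.
    """
    url = f"{SERVICE_ROOT}/{entity_set}"
    if query_options:
        options = {f"${name}": value for name, value in query_options.items()}
        # safe="$" keeps the "$" prefix readable instead of encoding it as %24.
        url += "?" + urlencode(options, safe="$")
    # The Accept header asks the service for a JSON response.
    return Request(url, headers={"Accept": "application/json"}, method="GET")


request = build_odata_request("Products", top=2)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the resulting object to urllib.request.urlopen would issue the call; that step is left out so the sketch runs without network access.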

Example Response

HTTP/1.1 200 OK
Cache-Control: no-transform, public, max-age=300, s-maxage=600
Content-Length: 750
Content-Type: application/json;odata=minimalmetadata;streaming=true;charset=utf-8
ETag: "686897696a7c876b7e"
Server: Microsoft-IIS/8.0
X-Content-Type-Options: nosniff
DataServiceVersion: 3.0;
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Set-Cookie: ARRAffinity=4b98c8c7832a0f11db30cc5be0c0d64bbd90359f8a87b799460af4623a8a0aaf;Path=/;Domain=services.odata.org
Set-Cookie: WAWebSiteSID=2754cab394374e0c890fdcc569e94616; Path=/; HttpOnly
Date: Fri, 24 Jan 2014 02:52:04 GMT

<!--OData Payload-->

HTTP Code

Caching Capabilities

Response Format

OData Custom Header Tag
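The caching capabilities called out in the response are carried by the Cache-Control header. As a rough Python sketch (the function name parse_cache_control is hypothetical, and the parser ignores quoted directive values), a client or intermediary could read the directives like this:

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict.

    Directives without an argument (such as "public") map to True.
    Simplified sketch: quoted-string arguments are not handled.
    """
    directives = {}
    for part in value.split(","):
        name, _, argument = part.strip().partition("=")
        if not name:
            continue
        if argument.isdigit():
            directives[name.lower()] = int(argument)
        else:
            directives[name.lower()] = argument or True
    return directives


# Header value copied from the example response.
cache_control = parse_cache_control("no-transform, public, max-age=300, s-maxage=600")
print(cache_control)
```

In this response, max-age=300 lets any cache reuse the payload for five minutes, while s-maxage=600 gives shared caches such as proxies a ten-minute lifetime.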

Modern Authentication Protocols

Protocols: WS-Fed, SAML 2.0, OpenID Connect, OAuth 2.0
Clients: Browser, Native app, Server app
Resources: Web application, Web service, API

Standard, HTTP-based protocols for maximum platform reach

Challenges with OAuth’s evolution

Challenges
• Spec identifies OAuth as a framework rather than a protocol
• 60+ optional aspects
• Commercial implementations delivered at different spec levels
• Potential for man-in-the-middle attacks in some scenarios
• Vendor-specific compensation approaches to deal with gaps / concerns

The Result
• OAuth provider portability can be a challenge
• Understanding of different vendor implementation nuances required
• Custom code can be required when supporting different vendors

Service API Design

The Basic L0 Approach

Task               Operation   URI
Create an Order    POST        http://api.contoso.com/CreateOrder?OrderID=1
Approve an Order   POST        http://api.contoso.com/ApproveOrder?OrderID=1
Delete the Order   POST        http://api.contoso.com/DeleteOrder?Order=1
Cancel the Order   POST        http://api.contoso.com/CancelOrder?Order=1

The Mature Approach

Task               Operation   URI
Create an Order    POST        http://api.contoso.com/Order/1?OrderName="Contoso Reorder"
Approve the Order  PUT         http://api.contoso.com/Order/1/approval
Delete the Order   DELETE      http://api.contoso.com/Order/1
Cancel the Order   PUT         http://api.contoso.com/Order/1/cancellation
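To make the difference concrete, here is a small Python sketch of how a service might dispatch the mature, resource-oriented URIs: the HTTP verb plus the resource path select the action, so no action name needs to be tunneled through the URL. The route table and the route function are illustrative assumptions, not code from the session.

```python
import re

# (HTTP verb, path pattern, action) triples mirroring the mature approach.
ROUTES = [
    ("POST", r"^/Order/(?P<order_id>\d+)$", "create"),
    ("PUT", r"^/Order/(?P<order_id>\d+)/approval$", "approve"),
    ("DELETE", r"^/Order/(?P<order_id>\d+)$", "delete"),
    ("PUT", r"^/Order/(?P<order_id>\d+)/cancellation$", "cancel"),
]


def route(method, path):
    """Return (action, order_id) for a request, or None if nothing matches."""
    for verb, pattern, action in ROUTES:
        match = re.match(pattern, path)
        if verb == method and match:
            return action, int(match.group("order_id"))
    return None


print(route("PUT", "/Order/1/approval"))
print(route("DELETE", "/Order/1"))
print(route("GET", "/Order/1"))
```

Because the same path /Order/1 means different things under POST and DELETE, intermediaries and clients can reason about the call from the verb alone, which the L0 style (everything is a POST) does not allow.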

The World Is Changing!

• Enterprises used to provide data services to a Small Number of Known Users (SNKU)
• The new paradigm for the enterprise is serving a Large Number of Unknown Users (LNUU)
• Enterprises need to understand who their customers are and design the data service to support them

Service Design Considerations
• Scenario and Persona Based Modeling
• Focus on Resources, Not URLs
• Versioning, evolvable API design
• Information Hiding & Security
• HTTP Features: headers, HTTP codes
• Network Bandwidth Management

Things to Avoid
• Verb Tunneling
• Exposing data at only one level
• Poor/Incomplete Error Handling

An example of what NOT to do

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Mon, 06 Jan 2014 15:22:04 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=[omitted]; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.ycombinator.com; HttpOnly
Last-Modified: Mon, 06 Jan 2014 13:14:48 GMT
Vary: Accept-Encoding
Expires: Thu, 04 Jan 2024 13:14:48 GMT
Cache-Control: max-age=315352364
Cache-Control: public
CF-RAY: [omitted]

<html>
  <head>
    <link rel="stylesheet" type="text/css" href="/news.css">
    <link rel="shortcut icon" href="/favicon.ico">
    <title>Hacker News</title>
  </head>
  <body>
    <center>
      <table border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">
        <tr>
          <td bgcolor="#ff6600">
            <table border="0" cellpadding="0" cellspacing="0" width="100%" style="padding:2px">
              <tr>
                <td style="width:18px;padding-right:4px">
                  <a href="http://ycombinator.com">
                    <img src="/y18.gif" width="18" height="18" style="border:1px #ffffff solid;" />
                  </a>
                </td>
                <td style="line-height:12pt; height:10px;">
                  <span class="pagetop">
                    <b><a href="/news">Hacker News</a></b>
                  </span>
                </td>
              </tr>
            </table>
          </td>
        </tr>
        <tr style="height:10px"></tr>
        <tr>
          <td> Sorry for the downtime. We hope to be back soon. </td>
        </tr>
        <tr>
          <td>
            <img src="s.gif" height="10" width="0" />
            <table width="100%" cellspacing="0" cellpadding="1">
              <tr>
                <td bgcolor="#ff6600"></td>
              </tr>
            </table>
            <br />
          </td>
        </tr>
      </table>
    </center>
  </body>
</html>
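The response above reports an outage with a 200 OK status, an HTML body, and a cache lifetime of roughly ten years. One possible alternative, sketched here in Python under the assumption that the outage is temporary, is to answer with 503 Service Unavailable, a Retry-After header, a machine-readable body, and caching disabled. The function name outage_response and the JSON field names are hypothetical.

```python
import json


def outage_response(retry_after_seconds):
    """Return (status_line, headers, body) describing a temporary outage."""
    body = json.dumps({
        "error": "service_unavailable",
        "message": "Sorry for the downtime. We hope to be back soon.",
    })
    headers = {
        "Content-Type": "application/json",
        # Tells well-behaved clients when it is reasonable to try again.
        "Retry-After": str(retry_after_seconds),
        # An error page should never be cached as if it were real content.
        "Cache-Control": "no-store",
        "Content-Length": str(len(body.encode("utf-8"))),
    }
    return "HTTP/1.1 503 Service Unavailable", headers, body


status_line, headers, body = outage_response(retry_after_seconds=120)
print(status_line)
for name, value in headers.items():
    print(f"{name}: {value}")
print(body)
```

With an accurate status code, API clients can tell an outage from a successful response without parsing the body.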

SharePoint Online

3 Different Topologies
• Outbound
• Inbound
• Bi-Directional

Key Drivers
• Identity Federation
• BCS Connectivity Requirements
• SharePoint workloads used

Demo

Accessing a Data Service in SharePoint Online

Implementation

Transient Faults

Handle Transient Faults

Retry logic
• Fast retries often fail again. Exponential back-off is useful.
• Error codes should provide insight.

Idempotency
• Allows restart of requests which may have partially or fully succeeded.

Protect calls with timeouts on outbound requests
• Don’t retry on timeouts.
• Queue work for slow retry later.
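The retry guidance above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the session's implementation: TransientError is a hypothetical marker exception for faults worth retrying, the delays and attempt count are arbitrary, and the wrapped operation is assumed to be idempotent so that repeating it is safe. Timeouts are deliberately not retried; they propagate so the caller can queue the work for a slower retry later.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying TransientError with exponential back-off.

    Any other exception, including TimeoutError, is not retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # All the retries failed.
            # Double the delay on every attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter keeps clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


attempts = []


def flaky():
    """Fails twice with a transient fault, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("try again")
    return "ok"


delays = []
# Record the delays instead of sleeping so the demo finishes instantly.
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, "after", len(attempts), "attempts; delays:", delays)
```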

Graceful Degradation

Compensating behavior
Last resort
Alternate path
Omission

• Example: Serve stale data from cache or switch to read-only mode.
• Please try again
• Never show:
• Example: Return a message saying that the transaction is being processed and that a charge confirmation will follow by e-mail.
• Example: Browsing items might not include inventory count.
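The "serve stale data from cache" example can be illustrated with a short Python sketch. It assumes a simple in-process cache; the class name StaleOnErrorCache and its return convention (value, is_stale) are hypothetical choices made for this illustration.

```python
import time


class StaleOnErrorCache:
    """Cache that falls back to an expired entry when the loader fails."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, stored_at)

    def get(self, key, loader):
        """Return (value, is_stale) for key, calling loader() when needed."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0], False  # Fresh hit, no backend call.
        try:
            value = loader()
        except Exception:
            if entry is not None:
                return entry[0], True  # Degrade: serve the stale copy.
            raise  # Nothing cached, so the caller must handle the failure.
        self._entries[key] = (value, now)
        return value, False


def failing_loader():
    raise ConnectionError("inventory service is down")


now = [0.0]  # Fake clock so the demo does not need to wait.
cache = StaleOnErrorCache(ttl_seconds=60, clock=lambda: now[0])
fresh = cache.get("sku-1", lambda: {"in_stock": 7})
now[0] = 120.0  # The entry has expired and the backend is now failing.
stale = cache.get("sku-1", failing_loader)
print(fresh, stale)
```

Returning the is_stale flag lets the caller decide whether to show the data as-is, label it as possibly out of date, or omit it.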

Transient Fault Handling Application Block

using System;
using Microsoft.Practices.TransientFaultHandling;
using Microsoft.Practices.EnterpriseLibrary.Common.Configuration;
using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling;
...
// Get an instance of the RetryManager class.
var retryManager = EnterpriseLibraryContainer.Current.GetInstance<RetryManager>();

// Create a retry policy that uses a retry strategy from the configuration.
var retryPolicy = retryManager.GetRetryPolicy
    <StorageTransientErrorDetectionStrategy>("Incremental Retry Strategy");

// Receive notifications about retries.
retryPolicy.Retrying += (sender, args) =>
{
    // Log details of the retry.
    var msg = String.Format("Retry - Count:{0}, Delay:{1}, Exception:{2}",
        args.CurrentRetryCount, args.Delay, args.LastException);

    // Pass msg to your logging handler of choice. And choose it wisely!
};

try
{
    // Do some work that may result in a transient fault.
    // The lambda calls a method that uses Windows Azure storage and which may
    // throw a transient exception; it returns the result so that it can be
    // assigned to blobs.
    var blobs = retryPolicy.ExecuteAction(
        () => this.container.ListBlobs());
}
catch (Exception)
{
    // All the retries failed.
}

<RetryPolicyConfiguration
    defaultRetryStrategy="Fixed Interval Retry Strategy"
    defaultSqlConnectionRetryStrategy="Backoff Retry Strategy"
    defaultSqlCommandRetryStrategy="Incremental Retry Strategy"
    defaultAzureStorageRetryStrategy="Fixed Interval Retry Strategy"
    defaultAzureServiceBusRetryStrategy="Fixed Interval Retry Strategy">
  <incremental name="Incremental Retry Strategy" retryIncrement="00:00:01"
      retryInterval="00:00:01" maxRetryCount="10" />
  <fixedInterval name="Fixed Interval Retry Strategy" retryInterval="00:00:01"
      maxRetryCount="10" />
  <exponentialBackoff name="Backoff Retry Strategy" minBackoff="00:00:01"
      maxBackoff="00:00:30" deltaBackoff="00:00:10" maxRetryCount="10"
      fastFirstRetry="false" />
</RetryPolicyConfiguration>

Circuit Breakers

Circuit Breaker Pattern

Used in conjunction with timeouts
Always alert
Used to combat slow responses

• Counter-based action
• Often activates admission control with metering to allow recovery
• Often activates alternative pathway
• Mitigations should have monitored counters too
• Instrument all calls with timers

Circuit Breaker Pattern: Fallbacks

Custom Fallback
• Client library can provide an invokable callback method
• Can also use locally available data on API server (cookie or cache) to generate a fallback response

Fail Fast
• When data is required and there’s no good fallback
• Negative UX impact, but keeps API healthy

Fail Silent
• Return a null value. Useful if the data is optional
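A counter-based circuit breaker with the three fallback styles can be sketched as follows. This is a minimal, single-threaded Python illustration, not the P&P sample referenced in the session; the class and parameter names (CircuitBreaker, failure_threshold, reset_timeout) are assumptions, and a production version would also need timers around each call, locking, and alerting.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0      # Consecutive failures seen so far.
        self._opened_at = None  # Time at which the circuit tripped.

    @property
    def is_open(self):
        """True while the circuit is tripped and the reset timeout is running."""
        if self._opened_at is None:
            return False
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, operation, fallback=None):
        """Run operation() unless the circuit is open.

        fallback=None fails fast; fallback=lambda: None fails silent;
        any other callable acts as a custom fallback.
        """
        if self.is_open:
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # Trip (or re-trip) the circuit.
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and resets the counter.
        self._failures = 0
        self._opened_at = None
        return result


now = [0.0]  # Fake clock for the demo.
calls = []


def broken():
    calls.append(1)
    raise ConnectionError("backend down")


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0,
                         clock=lambda: now[0])
for _ in range(5):
    breaker.call(broken, fallback=lambda: None)  # Fail silent.
print("backend calls:", len(calls), "circuit open:", breaker.is_open)
```

Once the reset timeout elapses, the next call is allowed through as a trial: success closes the circuit, and another failure opens it again.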

Circuit Breaker - Solution

Example

P&P Circuit Breaker Sample Code


Throttling

Throttling – Considerations
• Design the strategy early on
• Perform quickly
• Return a specific error code
• Can be used together with auto-scaling
• Consider aggressive auto-scaling if demands grow very quickly

Throttling – When to use
• Ensure a system meets its SLA
• Handle burst activity
• Prevent a single tenant from monopolizing
• Help cost-optimize a system by limiting the maximum resource levels
• Combine with auto-scaling
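As a rough illustration of these considerations, the Python sketch below implements a fixed-window throttle with a per-caller limit. It is an assumption-laden example, not a library API: the class name FixedWindowThrottle and its check method are hypothetical, the state is kept in memory on a single node, and 429 Too Many Requests with a Retry-After header is used as the specific error code.

```python
import time


class FixedWindowThrottle:
    """Counts requests per caller within a fixed time window."""

    def __init__(self, limit_for, window_seconds, clock=time.monotonic):
        self._limit_for = limit_for  # Function: caller id -> allowed requests.
        self._window = window_seconds
        self._clock = clock
        self._counters = {}  # caller id -> (window_start, request_count)

    def check(self, caller_id):
        """Return (status_code, headers) for one incoming request."""
        now = self._clock()
        start, count = self._counters.get(caller_id, (now, 0))
        if now - start >= self._window:
            start, count = now, 0  # The previous window expired; start over.
        if count >= self._limit_for(caller_id):
            # Reject quickly and tell the caller when the window resets.
            retry_after = max(1, int(start + self._window - now))
            return 429, {"Retry-After": str(retry_after)}
        self._counters[caller_id] = (start, count + 1)
        return 200, {}


now = [0.0]  # Fake clock for the demo.
throttle = FixedWindowThrottle(
    # One trusted tenant gets a higher quota; everyone else gets 2 per minute.
    limit_for=lambda caller: 5 if caller == "10.0.0.1" else 2,
    window_seconds=60,
    clock=lambda: now[0],
)
results = [throttle.check("10.0.0.9") for _ in range(3)]
print(results)
```

The check is a dictionary lookup and a comparison, in keeping with "perform quickly"; sharing the counters across several servers would need an external store, which this sketch does not cover.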

WebApiContrib Throttling Handler

// Allow 60 requests per hour for all users
config.MessageHandlers.Add(new ThrottlingHandler(
    new InMemoryThrottleStore(),
    id => 60,
    TimeSpan.FromHours(1)));

// Allow 5000 requests per hour for a given IP and 60 for everyone else
config.MessageHandlers.Add(new ThrottlingHandler(
    new InMemoryThrottleStore(),
    id =>
    {
        if (id == "10.0.0.1")
        {
            return 5000;
        }
        return 60;
    },
    TimeSpan.FromHours(1)));

Source: http://blog.maartenballiauw.be/post/2013/05/28/Throttling-ASPNET-Web-API-calls.aspx

WebApiContrib Throttling Handler

public class MyThrottlingHandler : ThrottlingHandler
{
    // ...
    protected override string GetUserIdentifier(HttpRequestMessage request)
    {
        // your user id generation logic here
    }
}

Override to tailor to your needs

Additional Resources
• Designing Evolvable Web APIs with ASP.NET, Glenn Block, et al
• Roy Fielding's Dissertation on REST
• Richardson Maturity Model, Martin Fowler
• RESTful Web APIs, Leonard Richardson, et al


© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.