distributed systems lab 13

Upload: ferenc-bondar

Post on 04-Apr-2018


  • 7/29/2019 Distributed Systems Lab 13

    1/26

    1

    Distributed Systems Techs

    13. Google, Amazon and Public WSs

    January 11, 2010


    Google

    One of the most popular search engines around because it provides a superior number of hits.

    The search engine is also good at providing valid information through the use of indexing and filtering, so long as you specify the search criteria clearly.

    Given the number of ways that Google Advanced Search (http://www.google.com/advanced_search) helps you look for information, providing clear direction can be overwhelming to some.

    The flexibility provided by the interface is part of the reason many power users prefer Google.


    Google Web Services

    A means of accessing Google without going to the Web site and performing a search manually.

    This Web service provides essential services by helping you automate the search process and presenting data in the form that you need, rather than in the form that Google thinks you need.

    Clients request information based on any of a number of search criteria.

    Google WS returns the information in a standardized format.

    A Google WS application can make it easy to add a professional search service to your site, making it a lot more attractive to anyone who visits.


    Example of received data

    12k

    ... some text highlight) more text ...

    True

    http://www.mwt.net/ DataCon Services


    Request limitations of Google WS

    According to the license agreement, you can't make more than 1,000 requests per day, at least not without special permission.

    The request limitation ensures the Google servers won't become overloaded, but it also means you must provide some type of monitoring in your application to prevent abuse of the licensing terms.

    If you violate the licensing terms, Google WS simply denies your request.
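
    The monitoring mentioned above could be as simple as a per-day counter that the application consults before each call. The following is a minimal sketch under that assumption; the class name `DailyQuota` and its methods are hypothetical and are not part of any Google API.

```python
from datetime import date


class DailyQuota:
    """Counts requests per calendar day and refuses calls past the limit."""

    def __init__(self, limit=1000, today=date.today):
        self.limit = limit
        self._today = today      # injectable clock (assumption, eases testing)
        self._day = today()
        self._used = 0

    def try_acquire(self):
        """Return True and count the request if quota remains, else False."""
        now = self._today()
        if now != self._day:     # a new day has started: reset the counter
            self._day = now
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

    An application would call `try_acquire()` before every search request and skip or postpone the request when it returns False, so the service never has to deny it.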


    Amazon

    Allows businesses to "rent" computing power, data storage and bandwidth on its vast network platform.

    Amazon Web Services (AWS) includes:
    1. Simple Storage Service (S3),
    2. Elastic Compute Cloud (EC2),
    3. Simple Queue Service (SQS),
    4. Flexible Payments Service (FPS), and
    5. SimpleDB
    to build web-scale business applications.

    Offers a new paradigm for IT infrastructure: use what you need, as you need it, and pay as you go.


    Infrastructure in the Cloud

    The Web is full of opportunities for companies both large and small, but the smaller companies face a difficult problem: infrastructure.

    Web applications that are popular and have thousands of users require significant infrastructure to provide the high performance and smooth experience that users demand.

    Industrial-strength infrastructure is very expensive to buy and maintain, so smaller companies with fewer users are often forced to do without.

    Amazon offers a solution to this dilemma in the form of infrastructure WSs: these services allow application developers to avoid altogether the burden of buying and maintaining physical infrastructure by making it possible to rent virtual infrastructure instead.


    Amazon Simple Storage Service (S3)

    http://www.aws.amazon.com/s3

    Offers secure online storage space for any kind of data, providing an alternative to building, maintaining, and backing up storage systems.

    It makes your data accessible to any other applications or individuals you allow, from anywhere on the Web.

    There are no limits on how much data you can store in the service, how long you can store it, or how much bandwidth you can use to transfer or publish it.

    S3 is a scalable, distributed system that stores your information reliably across multiple Amazon data centers, and it is able to serve it quickly to massive audiences.

    The S3 storage application programming interface (API) makes no assumptions about the nature of the data you are storing.


    Amazon Elastic Compute Cloud (EC2)

    http://www.aws.amazon.com/ec2

    Makes it possible to run multiple virtual Linux servers on demand, providing as many computers as you need to process your data or run your web application without having to purchase or rent physical machines.

    Gives full control over each server, with root access to the OS, a configurable firewall to manage network access, and the freedom to install any software you please.

    Once you have set up an EC2 server the way you like it, you can save it permanently as a server image. You can then launch new servers from this image to create virtual machines that are preconfigured and ready to do your bidding.

    The service offers an API to start and stop server instances, apply access and networking permissions, and manage your server images.

    You manage each individual server using standard Linux tools over a secure shell session.


    Amazon Simple Queue Service (SQS)

    http://www.aws.amazon.com/sqs

    Delivers short messages between any computers or systems with access to the Internet, allowing the components of your distributed web applications to communicate reliably without you having to build or maintain your own messaging system.

    You can send an unlimited number of messages via an unlimited number of message queues, and you can configure the performance characteristics and access permissions for each queue.

    The service uses a message locking and timeout mechanism that helps prevent messages from being delivered more than once, while still ensuring they will be delivered despite any component failures or network dropouts.

    Your messages are stored redundantly across multiple servers and data centers.

    The service's API allows you to send and receive messages, and to control their full life cycle.
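
    The locking and timeout mechanism described above can be illustrated with a small in-memory model. This is only a sketch of the idea, not the SQS API: the class `VisibilityQueue`, its methods, and the injectable clock are all assumptions made for illustration. A received message is hidden for a fixed period; if the consumer does not delete it in time, it becomes visible again and can be redelivered.

```python
import time
import uuid


class VisibilityQueue:
    """Toy queue: a received message is locked until its timeout expires."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self._clock = clock
        self._messages = {}  # message id -> [body, time it becomes visible]

    def send(self, body):
        msg_id = str(uuid.uuid4())
        self._messages[msg_id] = [body, self._clock()]  # visible immediately
        return msg_id

    def receive(self):
        """Return (id, body) of one visible message and lock it, or None."""
        now = self._clock()
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout  # lock the message
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Remove a processed message; returns False if it was unknown."""
        return self._messages.pop(msg_id, None) is not None
```

    A consumer that crashes after `receive()` never calls `delete()`, so the message reappears once the timeout passes; a consumer that finishes in time deletes it and it is not delivered again.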


    Amazon Flexible Payments Service (FPS)

    http://www.aws.amazon.com/fps

    Transfers money between individuals or companies that have Amazon Payments accounts, allowing you to build applications that provide an online store or that implement a marketplace between customers and third-party vendors.

    With FPS you can make payments from traditional sources, such as credit cards and bank accounts, or from sources internal to Amazon Payments accounts that have lower fees and are designed to make micro-payment transactions feasible.

    All transactions need to be authorized by everyone involved in the transaction.

    The parties involved can impose detailed constraints on transactions, such as how and when transactions can be performed, how much money can be transferred, and who can send and receive the funds.

    Customers interact with your FPS application through an Amazon Payments gateway using their Amazon.com account.

    Because the transactions are mediated by Amazon, your customers are not required to provide you with their personal banking information, and you do not have the burden of securely storing this highly sensitive information.


    Amazon SimpleDB (SimpleDB)

    http://www.aws.amazon.com/sdb

    Stores small pieces of textual information in a simple database structure that is easy to manage, modify and search.

    If your application relies on a relatively simple database, this service can replace your traditional relational database (RDBMS) server, leaving you with one less piece of infrastructure to purchase and maintain.

    SimpleDB is designed to minimize the complexity and administrative overhead involved in managing your data.

    It does not require a pre-defined schema, so you can alter the structure and content of your database whenever you need to.

    It indexes every piece of information you store so all your queries run quickly.

    It stores your data securely, redundantly and safely within Amazon's network of data centers.
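
    To make the "no schema, everything indexed" idea concrete, here is a minimal in-memory sketch. It is not the SimpleDB API: the class `TinyDomain` and its methods are hypothetical, and it simplifies by allowing only one text value per attribute. Every attribute written is added to an index, so lookups by any attribute avoid scanning all items.

```python
from collections import defaultdict


class TinyDomain:
    """Schema-free item store that indexes every attribute value."""

    def __init__(self):
        self._items = defaultdict(dict)  # item name -> {attribute: value}
        self._index = defaultdict(set)   # (attribute, value) -> item names

    def put(self, item_name, **attributes):
        """Add or overwrite attributes; items need not share attributes."""
        item = self._items[item_name]
        for attr, value in attributes.items():
            if attr in item:  # remove the stale index entry first
                self._index[(attr, item[attr])].discard(item_name)
            item[attr] = value
            self._index[(attr, value)].add(item_name)

    def find(self, attr, value):
        """Return the sorted names of items whose attribute equals value."""
        return sorted(self._index.get((attr, value), ()))
```

    Two items with entirely different attributes can live side by side, and a new attribute can be added to any item at any time without declaring it first.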


    Characteristics of the 5 Amazon WS

    They are pay-as-you-go, meaning you pay predictable fees based on how much or how little you use the service.

    There are no initial costs to join, no long-term subscription payments, and the usage fees are low.

    The services are highly scalable, performing equally well in modest or massively demanding usage scenarios.
      This means that the applications built on them can be similarly scalable and are able to grow rapidly at short notice without hitting limits imposed by insufficient infrastructure.

    All the services are designed to be highly reliable and fault-tolerant:
      the services and data resources are distributed across multiple servers and data centers within Amazon's infrastructure, and
      they are managed by a company with significant experience and investments in the operation of a global web business.

    To use AWS you first need to register for an account and provide a credit card to be billed for your service usage.


    APIs: REST for S3 and SQS

    AWS infrastructure services are made available through three separate APIs: REST, Query, and SOAP.

    REST interfaces offered by AWS use only the standard components of HTTP request messages to represent the API action that is being performed.

    These components include:

    1. HTTP method: describes the action the request will perform
    2. Universal Resource Identifier (URI): path and query elements that indicate the resource on which the action will be performed
    3. Request Headers: pieces of metadata that provide more information about the request itself or the requester
    4. Request Body: the data on which the service will perform an action
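
    The four components listed above can be seen in a request object built with Python's standard library. This is a sketch only: the bucket name `example-bucket` and object name `notes.txt` are made up, the request is constructed but never sent, and the authentication headers a real S3 request needs are omitted.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Build (but do not send) a REST-style request that would store an object.
req = Request(
    url="http://s3.amazonaws.com/example-bucket/notes.txt",  # hypothetical
    data=b"hello",                                # 4. request body
    headers={"Content-Type": "text/plain"},       # 3. request headers
    method="PUT",                                 # 1. HTTP method
)

components = {
    "method": req.get_method(),
    "uri_path": urlsplit(req.full_url).path,      # 2. URI names the resource
    "headers": dict(req.header_items()),
    "body": req.data,
}
```

    Everything the service needs to know about the operation is carried by these standard HTTP parts; no extra message format sits on top.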


    APIs: Query interfaces for EC2, SQS, FPS & SimpleDB

    Also use the standard components of the HTTP protocol to represent API actions; however, these interfaces use them in a different way.

    Query requests rely on parameters, simple name and value pairs, to express both the action the service will perform and the data the action will be performed on.

    When you are using a Query interface, the HTTP envelope serves merely as a way of delivering these parameters to the service.

    To perform an operation with a Query interface, you can express the parameters in the URI of a GET request, or in the body of a POST request.

    The method component of the HTTP request merely indicates where in the message the parameters are expressed, while the URI may or may not indicate a resource to act upon.

    Query interfaces can be considered REST-like because, although they do things differently, they still use only standard HTTP message components to perform operations.
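
    The GET-versus-POST point above can be shown by encoding one set of parameters both ways. This is a sketch under assumptions: the endpoint `https://queue.example.com/` is a placeholder, the parameter names and values are illustrative, the requests are not sent, and request signing is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://queue.example.com/"  # hypothetical endpoint

# Name/value pairs carry both the action and its data.
params = {"Action": "SendMessage", "MessageBody": "hello world"}
encoded = urlencode(params)

# Variant 1: parameters travel in the URI of a GET request.
get_req = Request(ENDPOINT + "?" + encoded, method="GET")

# Variant 2: the same parameters travel in the body of a POST request.
post_req = Request(
    ENDPOINT,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
```

    Both requests describe the same operation; the HTTP method only tells the service where to look for the parameters.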


    APIs: SOAP interfaces for all 5 WS

    Use XML documents to express the action that will be performed and the data that will be acted upon.

    These SOAP XML documents are constructed as another layer on top of the underlying HTTP request, such that all the information about the operation is moved out of the HTTP message and encapsulated in the SOAP message instead.

    For operations performed with a SOAP interface, the HTTP components of the request message are nearly irrelevant: all that is important is the XML document sent to the service as the body of the request.

    The valid structure and content of SOAP messages are defined in a WSDL document that describes the operations the service can perform, and the structure of the input and output data documents the service understands.

    To create a client program for a SOAP interface, you will typically use a third-party tool to interpret the WSDL document and generate the client stub code necessary to interact with the service.

    The approach used in the SOAP interfaces is very different from that used by the REST and Query interfaces.
      Operations expressed in SOAP messages are completely divorced from the underlying HTTP message used to transmit the request, and the HTTP message components, such as method and URI, reveal nothing about the operation being performed.
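
    To show how the operation lives inside the XML body rather than in the HTTP method or URI, the sketch below builds a SOAP 1.1 envelope by hand. In practice a WSDL-generated stub would do this; the helper `build_envelope` is hypothetical, the operation and field names are illustrative assumptions, and the authentication fields a real request needs are omitted.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def build_envelope(operation, fields):
    """Wrap an operation element and its fields in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{S3_NS}}}{operation}")
    for name, value in fields.items():
        ET.SubElement(op, f"{{{S3_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")


# The access key below is a placeholder, not a real credential.
xml_doc = build_envelope("ListAllMyBuckets", {"AWSAccessKeyId": "EXAMPLE-KEY-ID"})
```

    Whatever the operation, the resulting document would be sent the same way over HTTP, which is why the method and URI say nothing about what is being requested.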


    Example of XML doc returned by S3 WS using SOAP

    Listing of our data storage buckets (element values only):

    Owner ID: 1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b

    Owner display name: jamesmurty

    Bucket: oreilly-aws, created 2007-09-14T08:20:49.000Z

    Bucket: my-bucket, created 2007-09-24T08:39:30.000Z
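
    The values on this slide (an owner identifier, a display name, and two bucket names with creation timestamps) can be read out of such a document with a standard XML parser. The sketch below assumes a particular element layout and namespace for the listing; the XML markup itself did not survive in this transcript, so the `SAMPLE` document is a reconstruction for illustration and the SOAP envelope around it is left out.

```python
import xml.etree.ElementTree as ET

# Assumed layout of the listing; only the values come from the slide.
SAMPLE = """<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner>
    <ID>1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b</ID>
    <DisplayName>jamesmurty</DisplayName>
  </Owner>
  <Buckets>
    <Bucket>
      <Name>oreilly-aws</Name>
      <CreationDate>2007-09-14T08:20:49.000Z</CreationDate>
    </Bucket>
    <Bucket>
      <Name>my-bucket</Name>
      <CreationDate>2007-09-24T08:39:30.000Z</CreationDate>
    </Bucket>
  </Buckets>
</ListAllMyBucketsResult>"""

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_buckets(xml_text):
    """Return (owner display name, [(bucket name, creation date), ...])."""
    root = ET.fromstring(xml_text)
    owner = root.findtext("s3:Owner/s3:DisplayName", namespaces=NS)
    buckets = [
        (b.findtext("s3:Name", namespaces=NS),
         b.findtext("s3:CreationDate", namespaces=NS))
        for b in root.iterfind("s3:Buckets/s3:Bucket", NS)
    ]
    return owner, buckets
```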


    S3

    Data model is very simple, comprising only two kinds of storage resource: objects and buckets.
      Objects store data and metadata, and
      Buckets are containers that can hold an unlimited number of objects.

    Provides access control mechanisms that allow you to keep your information private or make it public and accessible to anyone on the Internet.
      Access control settings are configured using a list of rules that describe who will be granted access to a resource and the kinds of access that will be permitted.
      Access control settings can be applied to both bucket and object resources.

    Resources are identified using standard URIs, such as http://s3.amazonaws.com/bucket-name/object-name.

    Allows resources to be accessed using alternative domain names, e.g. http://www.mysite.com/object-name.

    The data is stored redundantly within this architecture, spread across multiple physical servers and across multiple data centers in different locations.

    Drawbacks
      S3 objects cannot be manipulated like standard files
      Changes take time to propagate
      S3 requests will fail occasionally
      S3's IP addresses may change over time


    REST interface of S3

    Acting on S3 resources with HTTP methods

    Resource | GET | HEAD | PUT | DELETE | POST

    S3 Service | List your buckets | - | - | - | -

    Object | Retrieve the object's data and metadata | Retrieve the object's metadata | Create or replace the object | Delete the object | Create or replace the object

    Bucket | List the bucket's objects | - | Create the bucket | Delete the bucket | -

    Access Control List - ACL (for a Bucket or Object resource) | Retrieve ACL settings | - | Apply new ACL settings | - | -
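
    The table above is essentially a lookup from (resource, HTTP method) to an action, which can be written down directly as a nested dictionary. The names `S3_REST_ACTIONS` and `describe` are hypothetical; the entries simply restate the table, with unsupported combinations left out.

```python
# (resource kind -> HTTP method -> action), transcribed from the table.
S3_REST_ACTIONS = {
    "service": {"GET": "List your buckets"},
    "object": {
        "GET": "Retrieve the object's data and metadata",
        "HEAD": "Retrieve the object's metadata",
        "PUT": "Create or replace the object",
        "DELETE": "Delete the object",
        "POST": "Create or replace the object",
    },
    "bucket": {
        "GET": "List the bucket's objects",
        "PUT": "Create the bucket",
        "DELETE": "Delete the bucket",
    },
    "acl": {
        "GET": "Retrieve ACL settings",
        "PUT": "Apply new ACL settings",
    },
}


def describe(resource, method):
    """Return the action for a resource/method pair, or None if unsupported."""
    return S3_REST_ACTIONS.get(resource, {}).get(method.upper())
```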


    S3 Applications

    Can use it as a basic online file repository for backing up files, for web site hosting, as the basis for a network-mounted filesystem, or as a distribution network.

    Share Large Files.
      Use the service as a repository for sharing files that are too large to include in an email.
      There are a number of online services already available to do this job, but many charge monthly subscription fees if you need to share very large files; with S3 you can do this yourself at little cost.
      To share a file, you will need to upload the file to S3 and send a URI link to the S3 object in an email.
      Because your files may contain private information, a signed URI link to the object is generated so that only the people who receive the link from you can access it.
      An advantage of using a signed URI is that you can choose how long the link will remain valid.

    S3 Filesystem, e.g. with ElasticDrive.
      S3 offers an unlimited data store on top of which other filesystem interface abstractions can be built.
      Some of these tools are designed to make S3 storage resources accessible to existing network-based tools that do not recognize S3, for example as an FTP or a Web-based Distributed Authoring and Versioning (WebDAV) service.
      Others aim to make the storage space in S3 available as a lower-level filesystem resource.

    Mediated Access to S3, e.g. with JetS3t.
      S3 is an effective platform for sharing information when its simple access control mechanisms meet your needs.
      Some scenarios are difficult to achieve with ACL settings alone, such as if you wish to make your S3 storage available to your customers or colleagues to use when they do not have their own AWS account.
      In such cases you may need to provide your own intermediate service to mediate access to your S3 storage.
      The JetS3t Java library provides tools to mediate third-party access to your S3 storage.
      These tools include a client-side application for interacting with S3 to upload and download files, and a server-side Gatekeeper component that decides whether the client, or user, should be authorized to perform these operations.
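
    The signed, time-limited link described under Share Large Files can be illustrated with a generic HMAC scheme. This is a sketch of the general technique, not the exact S3 signing algorithm: the secret key, the host `files.example.com`, the string being signed, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"example-secret-key"  # hypothetical key known only to the owner


def _signature(path, expires):
    """HMAC over the resource path and expiry time (illustrative format)."""
    message = f"GET\n{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_link(base_url, path, valid_for_seconds, now=None):
    """Return a link that carries its own expiry time and signature."""
    now = time.time() if now is None else now
    expires = int(now) + valid_for_seconds
    query = urlencode({"Expires": expires, "Signature": _signature(path, expires)})
    return f"{base_url}{path}?{query}"


def is_valid(url, now=None):
    """Accept the link only if it is unexpired and the signature matches."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["Expires"][0])
        signature = query["Signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(parts.path, expires)
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

    Because the expiry time is part of the signed message, a recipient cannot extend the link's lifetime or point it at a different object without invalidating the signature.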


    EC2 key components

    1. Instances
       Are the VMs that run in the EC2 environment and perform computing tasks that would typically be done by physical servers.
       Based on a Xen-compatible Linux kernel.

    2. Environment
       Instances run in the EC2 environment, which provides configurable access control, contextual data, and other information that instances need to do their work.

    3. Amazon Machine Images (AMIs)
       Are files that capture a complete snapshot of an EC2 instance at a point in time, including its software, configuration, and potentially even its data.
       Serve as the boot disk for the instances you launch.


    EC2 instance types

    Resource | Small | Large | Extra Large

    Platform | 32-bit x86 | 64-bit x86 | 64-bit x86

    CPU rating | 1 ECU (1 virtual core) | 4 ECUs (2 virtual cores, 2 ECUs each) | 8 ECUs (4 virtual cores of 2 ECUs each)

    Memory (RAM) | 1.7 GB | 7.5 GB | 15 GB

    Storage (ephemeral) | 150 GB | 840 GB (two 420 GB partitions) | 1680 GB (four 420 GB partitions)

    Storage (root partition) | 10 GB | 10 GB | 10 GB

    I/O Performance | Moderate | High | High

    Instance Type Name | m1.small | m1.large | m1.xlarge


    EC2 applications

    Use the virtual servers provided by EC2 to do most things a physical server can do, from hosting web sites or applications to creating clusters of servers for on-demand processing of large data sets.

    Dynamic DNS.
      How do you make your instance accessible via a user-friendly domain name that your users can remember?
      With standard servers: purchase a domain name and configure the DNS settings for that domain to refer to your server's IP address. This approach is only really workable if your server has a static IP address that does not change over time.
      EC2 does not allow network addresses to be statically assigned to instances. When you start an EC2 instance, the VM is assigned IP and DNS addresses that will only refer to the instance for as long as it is running.
      Use a dynamic DNS service to associate your domain name with your EC2 instance instead of standard DNS.
      Dynamic DNS services are designed for situations in which a server's address changes every so often, and they will propagate address changes to the public much more quickly than standard DNS.

    On-Demand VPN Server with OpenVPN.
      Advantage of EC2: you can start and stop server instances as you need them and only pay for the time the server is running.
      This capability is most often useful for increasing and decreasing the number of servers you have running in response to changing demands on a web application.
      How do you set up an EC2 instance to run a Virtual Private Network (VPN) server that you can use to secure your network traffic when you access the Internet over an untrusted network?
      It is becoming increasingly common for people to access the Internet through public access points, such as WiFi hotspots, wired networks provided by hotels, or the internal networks of companies you may be visiting.
      The best way to protect your data when using an untrusted network is to use a VPN to encrypt it.
      Open-source VPN server OpenVPN (http://openvpn.net/): configure the server to use a secret key such that only you, the owner of the key, can connect to it.
      Once the instance is configured, it will allow a client computer to connect over a secure channel, and it will relay all network traffic to the public Internet on behalf of the client.
      Whenever you need to access the Internet over an untrusted network, you can fire up this instance and create your own personal VPN to protect your network traffic.


    Publicly available WS - 1

    Blogging services: MSN Spaces, Akismet, TypePad, FeedBurner, FeedBlitz, Weblogs.com, Technorati etc.

    Bookmark services: del.icio.us, Simpy, Blogmarks, Ma.gnolia etc.

    Financial services: Blinksale, StrikeIron Historical Stock Quotes, Dun and Bradstreet Credit Check, Netaccounts etc.

    Mapping services: Google Maps, Yahoo! Maps, ArcWeb, FeedMaps BlogMap, Microsoft MapPoint, MapQuest's OpenAPI, Map24 AJAX, Microsoft's Virtual Earth etc.

    Music/Video services: SeeqPod, Rhapsody, Last.fm, YouTube, Dave.TV etc.


    Publicly available WS - 2

    News/Weather services: NewsCloud, NewsIsFree, NewsGator, BBC, WeatherBug etc.

    Photo services: Flickr, SmugMug, Pixagogo, Faces.com, Snipshot etc.

    Reference services: RealIEDA Reverse Phone Lookup, ISBNdb, Urban Dictionary, SRC Demographics, StrikeIron US Census, StrikeIron Residential Lookup

    Search services: Google AJAX Search API, Yahoo! Search, Windows Live Search etc.

    Shopping services: Amazon, DataUnison eBay Research, UPC Database, eBay, CNET


    Publicly available WS - 3

    English Standard Version Bible Lookup: read Bible online

    Amnesty International: freedom of expression

    411Sync: keyword searches through mobile technologies

    Windows Live Custom Domains: manage user base

    Sunlight Labs: clerical information (e.g. phones)

    Food Candy: social networking system for gourmands

    Facebook: social networking system for online contacts

    etc.