Data Parallel Application Development and Performance with Windows Azure
Advisor: Professor Gagan Agrawal
Presented by: Yu Zhang
Agenda
• Introduction to Windows Azure
• Parallel Model in Azure
• Implementation with Queue
• Implementation with WCF
• Experimental Evaluation
• Conclusion
Motivation
Emergence of cloud computing
• Windows Azure
• Amazon EC2
• Google App Engine
Main targets of clouds
• Changing the way hardware and software are provisioned, so that capacity is available on demand
• Hosting web services
• Interest from the scientific community
Goals
Show that developing data parallel applications on Azure is feasible:
• How can parallel applications be developed on Azure?
• What is the resulting performance?
Specific aims
• Simulate MPI reduce and all-reduce on Azure
• Build data parallel applications
Introduction to Windows Azure
The same facilities that a desktop OS provides, but on a set of connected servers:
• Abstract execution environment
• Shared file system
• Resource allocation
• Programming environments
Utility computing
• 24/7 operation
• Pay for what you use
• Simpler, transparent administration
What Is Windows Azure?
It is an operating system for the cloud.
It is designed for utility computing.
It has four primary features:
• Write your apps (developer experience)
• Host your apps (compute)
• Manage your apps (service management)
• Store your data (storage)
Windows Azure Components
Windows Azure PaaS stack (layer: component):
Applications: Windows Azure Service Model
Runtimes: .NET 3.5/4, ASP.NET, PHP
Operating system: Windows Server 2008/R2-compatible OS
Virtualization: Windows Azure Hypervisor
Server: Microsoft blades
Database: SQL Azure
Storage: Windows Azure Storage (Blob, Queue, Table)
Networking: Windows Azure-configured networking
The Windows Azure Service Model
A Windows Azure application is called a “service”. It consists of:
• Definition information
• Configuration information
• At least one “role”
The service definition is in ServiceDefinition.csdef. It defines the aspects of a service that cannot be changed without redeployment:
• Types of roles and static role configuration
• The set of configuration settings for a role
• The contract with the environment in which the code runs
The service configuration is in ServiceConfiguration.cscfg. It defines values for properties that can be updated dynamically for a running deployment:
• Values of configuration parameters
• Number of running instances
The Windows Azure Service Model Role Content
Definition:
• Role name
• Role type
• VM size (e.g., small, medium)
• Network endpoints
Code:
• Web/Worker role: hosted DLL and other executables
• VM role: VHD
Configuration:
• Number of instances
• Number of update and fault domains
Desktop And Related Azure Concepts
Desktop | Windows Azure
EXE | Service package
Application configuration | Service configuration
Manifest | Service definition
DLL (Windows Forms library, Windows service) | Service role (web role, worker role)
Local data stores | Internet data stores
Web Role
[Diagram: requests from the public Internet pass through a load balancer to web role instances, which use the storage services.]
The web role handles requests from the Internet:
• IIS7 hosted web core
• Hosts ASP.NET
• XML-based configuration of IIS7
• Integrated managed pipeline
• Supports SSL
Worker Role
• No inbound network connections
• Can read requests from a queue in storage or through Windows Communication Foundation (WCF)
[Diagram: a web role and several worker roles connected through the storage service.]
Windows Azure Storage Abstractions
• Blobs: provide a simple interface for storing named files along with metadata for each file
• Tables: provide structured storage. A table is a set of entities, each of which contains a set of properties
• Queues: provide reliable storage and delivery of messages for an application
Windows Azure Queues
• Queues are highly scalable and available, and they provide reliable message delivery
• Simple, asynchronous work dispatch
• A storage account can create any number of queues
• 8 KB message size limit and a default expiry of 7 days
• Programming semantics ensure that a message is processed at least once:
  • Get a message to make it invisible to other consumers
  • Delete the message to remove it from the queue
Queue Tips
• Messages larger than 8 KB: store the data in a blob or table, and put the name of the blob or the key of the table entity in the message
• VisibilityTimeout: a message that has been retrieved but not deleted reappears in the queue after VisibilityTimeout (default 30 sec)
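The at-least-once semantics and the VisibilityTimeout behavior can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions. The `sim_queue` type, its functions, and the integer clock are hypothetical stand-ins, not the Azure Storage API. The sketch only models one behavior: a retrieved message stays invisible until it is deleted or its timeout expires.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_QUEUE_CAP 16

/* Hypothetical in-memory model of a storage queue (not the Azure API). */
typedef struct {
    int  payload[SIM_QUEUE_CAP];
    long visible_at[SIM_QUEUE_CAP];  /* time at which the message is visible again */
    bool deleted[SIM_QUEUE_CAP];
    int  count;
} sim_queue;

/* Put: append a message that is visible immediately. */
void sim_put(sim_queue *q, int payload) {
    assert(q->count < SIM_QUEUE_CAP);
    q->payload[q->count] = payload;
    q->visible_at[q->count] = 0;
    q->deleted[q->count] = false;
    q->count++;
}

/* Get: return the index of the first visible message and hide it for
   `visibility_timeout` time units; return -1 if nothing is visible. */
int sim_get(sim_queue *q, long now, long visibility_timeout) {
    for (int i = 0; i < q->count; i++) {
        if (!q->deleted[i] && q->visible_at[i] <= now) {
            q->visible_at[i] = now + visibility_timeout;
            return i;
        }
    }
    return -1;
}

/* Delete: permanently remove a message once it has been processed. */
void sim_delete(sim_queue *q, int index) {
    assert(index >= 0 && index < q->count);
    q->deleted[index] = true;
}
```

A consumer might crash between get and delete. In that case the message is not lost: once the timeout expires, it becomes visible to another consumer. This is why processing should be idempotent.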
Queue Usage Example
[Figure: producers P1 and P2 add numbered messages to a queue; consumers C1 and C2 retrieve and process them.]
MPI programming model
Communicating sequential processes:
• Each process runs in its own local address space.
• Processes exchange data and synchronize via message passing. (Usually, but not always, all processes execute the same code.)
• Locality must be managed to achieve good performance; message passing does this explicitly.
Azure Parallel Programming Model
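[Figure: a load balancer (LB) and IIS in front of the web role VMs; the web role and worker role VMs communicate via Queue or WCF.]
• The web role hosts the IIS service to accept outside requests
• The web role distributes the workload to the worker roles
• The worker roles run their computations simultaneously
• Communication between roles: Queue or WCF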
Simulation of MPI_Reduce in Azure
MPI_Reduce(inbuf, outbuf, count, type, op, root, comm)
• inbuf: address of the input buffer
• outbuf: address of the output buffer
• count: number of elements in the input buffer
• type: datatype of the input buffer elements
• op: reduction operation
• root: rank of the root process
• comm: communicator
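using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

// Web role (root): wait for one partial result per worker queue, then reduce
var pending = new List<CloudQueue> { queue1, queue2, queue3 /* ..., one per worker */ };
while (pending.Count > 0)
  for (int i = pending.Count - 1; i >= 0; i--)
  {
    var msg = pending[i].GetMessage();
    if (msg == null) continue;
    DoWork(msg);                    // accumulate this worker's partial result
    pending[i].DeleteMessage(msg);  // delete only after processing
    pending.RemoveAt(i);
  }
Compute();                          // combine the partial results

// Worker role: compute locally and send the partial result to the root
public class WorkerRole : RoleEntryPoint
{
  public override void Run()
  {
    string partial = DoWork();
    queue.AddMessage(new CloudQueueMessage(partial));
  }
}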
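Without the storage calls, the pattern above is a gather followed by a reduction at the root. The C sketch below models it locally: each worker's partial result stands in for the message it adds to its queue. The function name `simulated_reduce_sum` is an illustrative assumption, as is the use of element-wise summation as the operation (like `MPI_SUM`).

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise sum over the partial results of n_workers workers, as the
   root (web role) computes it after receiving one message per worker.
   partials[w] points to worker w's buffer of `count` elements. */
void simulated_reduce_sum(const double *const *partials, size_t n_workers,
                          size_t count, double *outbuf) {
    assert(n_workers > 0);
    for (size_t i = 0; i < count; i++) {
        double acc = 0.0;
        for (size_t w = 0; w < n_workers; w++)
            acc += partials[w][i];   /* combine worker w's contribution */
        outbuf[i] = acc;
    }
}
```

Only the root ends up holding the result, which matches MPI_Reduce. Every worker contributes exactly once.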
Simulation of MPI_Allreduce in Azure
MPI_Allreduce(inbuf, outbuf, count, type, op, comm)
• inbuf: address of the input buffer
• outbuf: address of the output buffer
• count: number of elements in the input buffer
• type: datatype of the input buffer elements
• op: reduction operation
• comm: communicator
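using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

// Web role: gather and reduce as for MPI_Reduce, then broadcast the result
var pending = new List<CloudQueue> { queue1, queue2, queue3 /* ..., one per worker */ };
while (pending.Count > 0)
  for (int i = pending.Count - 1; i >= 0; i--)
  {
    var msg = pending[i].GetMessage();
    if (msg == null) continue;
    DoWork(msg);                    // accumulate this worker's partial result
    pending[i].DeleteMessage(msg);
    pending.RemoveAt(i);
  }
string result = Compute();
// Broadcast on separate result queues (hypothetical names) so that
// workers never read their own partial results back
resultQueue1.AddMessage(new CloudQueueMessage(result));
resultQueue2.AddMessage(new CloudQueueMessage(result));
/* ..., one per worker */

// Worker role: contribute a partial result, then wait for the global result
public class WorkerRole : RoleEntryPoint
{
  public override void Run()
  {
    queue.AddMessage(new CloudQueueMessage(DoWork()));
    CloudQueueMessage msg;
    while ((msg = resultQueue.GetMessage()) == null)
      Thread.Sleep(1000);           // poll until the broadcast arrives
    UseResult(msg.AsString);        // hypothetical: consume the global result
    resultQueue.DeleteMessage(msg);
  }
}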
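MPI_Allreduce is a reduction whose result is delivered to every process. The web role's broadcast step above achieves the same effect. Below is a local C sketch. It assumes summation as the operation and uses the hypothetical name `simulated_allreduce_sum`: the root combines all partial results, then copies the result back into each worker's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* bufs[w] holds worker w's `count` partial values on entry and the global
   element-wise sum on return, for every w. `scratch` is a caller-provided
   buffer of `count` elements that plays the role of the root's result. */
void simulated_allreduce_sum(double **bufs, size_t n_workers, size_t count,
                             double *scratch) {
    assert(n_workers > 0);
    /* Reduce at the root (the web role in the Azure version). */
    for (size_t i = 0; i < count; i++) {
        double acc = 0.0;
        for (size_t w = 0; w < n_workers; w++)
            acc += bufs[w][i];
        scratch[i] = acc;
    }
    /* Broadcast: every worker receives a copy of the same result. */
    for (size_t w = 0; w < n_workers; w++)
        memcpy(bufs[w], scratch, count * sizeof *scratch);
}
```

Splitting the operation into a reduce followed by a broadcast mirrors the two queue phases in the Azure version. MPI libraries may instead use tree- or ring-based algorithms for the same operation.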
Matrix Multiplication
• Each worker role reads all of matrix B.
• Matrix A is partitioned into n row blocks, where n is the number of worker roles.
• Each worker role gets one block of matrix A. For an N×N matrix, each worker role has two data sets: matrix B and one block of matrix A, say A_k (1 ≤ k ≤ n).
• Each worker role computes A_k × B and adds the result to its queue.
• The web role performs the reduce operation to obtain the final result.
[Figure: Matrix A and Matrix B]
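The decomposition can be sketched locally in C under a few assumptions. Matrices are stored row-major, and A is split into contiguous row blocks, with the last block absorbing any remainder when N is not divisible by n. Worker k's block A_k × B is exactly a set of rows of the product, so the web role's final step only has to place each block in position. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rows [first, last) of A handled by worker k out of n_workers. */
static void row_block(size_t N, size_t n_workers, size_t k,
                      size_t *first, size_t *last) {
    size_t rows = N / n_workers;
    *first = k * rows;
    *last = (k == n_workers - 1) ? N : *first + rows;
}

/* Worker k: compute rows [first, last) of C = A * B (all N x N, row-major).
   Writing directly into C stands in for sending the block to the web role. */
void worker_multiply(const double *A, const double *B, double *C,
                     size_t N, size_t n_workers, size_t k) {
    size_t first, last;
    assert(k < n_workers && n_workers <= N);
    row_block(N, n_workers, k, &first, &last);
    for (size_t i = first; i < last; i++)
        for (size_t j = 0; j < N; j++) {
            double acc = 0.0;
            for (size_t t = 0; t < N; t++)
                acc += A[i * N + t] * B[t * N + j];
            C[i * N + j] = acc;
        }
}
```

Each worker needs all of B but only its own rows of A, which matches the data sets described above.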
K-Means
1. The web role calculates the initial means.
2. Broadcast the k centroids to all worker roles.
3. Each worker role computes the distance from each local document vector to the centroids.
4. Assign each point to its closest centroid and compute the local MSE (mean squared error).
5. Perform a reduction to obtain the global centroids and the global MSE value.
6. The web role broadcasts the new centroids to all worker roles; repeat until no points move.
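One iteration of steps 3 to 5 can be sketched in C as follows. To keep the sketch short it assumes one-dimensional points, whereas the algorithm above works on document vectors. Each worker produces per-cluster partial sums, counts, and a squared-error total. The reduction adds these, and the web role divides to get the new centroids and the global MSE. The struct and function names are hypothetical, and `MAX_K` is an assumed compile-time bound.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_K 8

/* Partial result from one worker (the message payload in the Azure version). */
typedef struct {
    double sum[MAX_K];   /* sum of the points assigned to each centroid */
    size_t count[MAX_K]; /* number of points assigned to each centroid */
    double sse;          /* sum of squared distances to the assigned centroid */
} kmeans_partial;

/* Steps 3-4 on one worker: assign each local point to its closest centroid. */
kmeans_partial kmeans_local(const double *pts, size_t n, const double *cent, size_t k) {
    kmeans_partial p = {0};
    assert(k > 0 && k <= MAX_K);
    for (size_t i = 0; i < n; i++) {
        size_t best = 0;
        double best_d = (pts[i] - cent[0]) * (pts[i] - cent[0]);
        for (size_t c = 1; c < k; c++) {
            double d = (pts[i] - cent[c]) * (pts[i] - cent[c]);
            if (d < best_d) { best_d = d; best = c; }
        }
        p.sum[best] += pts[i];
        p.count[best]++;
        p.sse += best_d;
    }
    return p;
}

/* Step 5 at the web role: reduce the partials into new centroids and return
   the global MSE. A centroid with no assigned points keeps its old value. */
double kmeans_reduce(const kmeans_partial *parts, size_t n_workers, double *cent, size_t k) {
    double sse = 0.0;
    size_t total = 0;
    for (size_t c = 0; c < k; c++) {
        double s = 0.0;
        size_t cnt = 0;
        for (size_t w = 0; w < n_workers; w++) {
            s += parts[w].sum[c];
            cnt += parts[w].count[c];
        }
        if (cnt > 0) cent[c] = s / (double)cnt;
        total += cnt;
    }
    for (size_t w = 0; w < n_workers; w++) sse += parts[w].sse;
    return total > 0 ? sse / (double)total : 0.0;
}
```

The MSE here is measured against the centroids that were broadcast at the start of the iteration. Checking the "no points move" condition would also require each worker to report how many of its points changed cluster; that bookkeeping is omitted.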
KNN
1. The web role is the master; the N worker roles are slaves.
2. The master divides the training samples into N subsets and distributes one subset to each worker role.
3. Each worker role computes the distance measures independently and stores the computed measures in a local array.
4. When a worker role finishes its distance calculation, it sends a message to the web role indicating the end of processing.
5. The web role notes the end of processing for that sender and acquires the computed measures by reduction.
6. After the web role has collected the distance measures from all worker roles, it performs the following steps:
• Sort all distance measures in ascending order
• Select the top k measures
• Count the number of samples of each class among the top k measures
• Assign the input element to the class with the highest count among the top k measures
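The master/worker flow above can be sketched locally in C under a few assumptions: one-dimensional samples, integer class labels below a fixed bound, and the hypothetical names `knn_worker` and `knn_classify`. Each worker fills its slice of a shared distance array (step 3). The master then sorts all gathered measures, keeps the k nearest, and votes (step 6).

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CLASSES 16

typedef struct {
    double dist;  /* distance from the query to one training sample */
    int label;    /* class of that training sample */
} knn_measure;

/* Step 3 on one worker: distances from the query to its subset of samples. */
void knn_worker(const double *x, const int *labels, size_t n, double query,
                knn_measure *out) {
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - query;
        out[i].dist = d < 0 ? -d : d;
        out[i].label = labels[i];
    }
}

static int by_distance(const void *a, const void *b) {
    double da = ((const knn_measure *)a)->dist;
    double db = ((const knn_measure *)b)->dist;
    return (da > db) - (da < db);
}

/* Step 6 at the web role: sort the gathered measures, take the k nearest,
   and return the majority class (ties go to the smallest label). */
int knn_classify(knn_measure *all, size_t n, size_t k) {
    int votes[MAX_CLASSES] = {0};
    assert(k > 0 && k <= n);
    qsort(all, n, sizeof *all, by_distance);
    for (size_t i = 0; i < k; i++) {
        assert(all[i].label >= 0 && all[i].label < MAX_CLASSES);
        votes[all[i].label]++;
    }
    int best = 0;
    for (int c = 1; c < MAX_CLASSES; c++)
        if (votes[c] > votes[best]) best = c;
    return best;
}
```

The sort must come before the top-k selection, which is why the order of the sub-steps in step 6 matters.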
An Optimized Solution: WCF
What is Windows Communication Foundation (WCF)? WCF is Microsoft's implementation of industry standards. It provides a communication subsystem that enables applications to communicate within one machine (across process boundaries) or across multiple machines. WCF is a core component of the .NET Framework 3.0 and later versions. These are included with Windows Vista and Windows 7, as well as later versions of Windows Server. The WCF API unifies ASMX Web Services, .NET Remoting, distributed transactions, and messaging into a single service-oriented programming model, and it is fundamental to the .NET Framework.
[Figure: ASMX, WSE, .NET Remoting, COM+ (Enterprise Services), and MSMQ unified into WCF]
WCF: Address, Binding, Contract
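[Figure: a client and a service exchange messages through endpoints.]
• Address: where?
• Binding: how?
• Contract: what?
An endpoint combines an Address, a Binding, and a Contract (the "ABC" of WCF). WCF services are deployed, discovered, and consumed as endpoints.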
WCF : Endpoint Contract
All services expose a contract. WCF defines five types of contracts, including:
• Service contract: exposes the service.
• Operation contract: exposes the service members.
• Data contract: describes service parameters.
<!-- configuration file for the MM service -->
<configuration xmlns="http://schemas.microsoft.com/.NetConfiguration/v2.0">
  <system.serviceModel>
    <services>
      <!-- service element references the service type -->
      <service name="MM">
        <!-- endpoint element defines the ABCs of the endpoint -->
        <endpoint address="net.tcp://localhost/MM/Ep1"
                  binding="netTcpBinding"
                  contract="IMM"/>
      </service>
    </services>
  </system.serviceModel>
</configuration>
Address
An address uniquely identifies a service. It provides the transport protocol, the name of the target machine (host), and the port if applicable. It is expressed as an explicit path or URI:
[transport]://[machine][:optional port]
e.g., http://localhost:8081/Service or net.tcp://localhost:8082/Service
Binding
Bindings provide "canned" choices of transport protocol, message encoding, communication pattern, reliability, and security policies, that is, the WCF features required to support the design goals of the service. Some common bindings include:
• BasicHttpBinding
• NetTcpBinding
• WSHttpBinding
WCF in Azure
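Worker role: define the contract, implement it, and host it on a TCP endpoint
[ServiceContract]
public interface IService
{
  [OperationContract]
  string compute();
}
public class Service : IService        // implementation (name assumed)
{
  public string compute() { return DoWork(); }
}
// ServiceHost takes the implementation type, not the interface
ServiceHost sh = new ServiceHost(typeof(Service));
// listen on the role's internal endpoint ("WcfEndpoint" is an assumed name from the service definition)
var ep = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["WcfEndpoint"].IPEndpoint;
// AddServiceEndpoint creates a ServiceEndpoint and adds it to the ServiceDescription
sh.AddServiceEndpoint(
  typeof(IService),                      // contract type
  new NetTcpBinding(SecurityMode.None),  // built-in binding; must match the client
  "net.tcp://" + ep + "/IService/Ep1");  // the endpoint's address
sh.Open();
Web role: create a channel to the worker role's endpoint and call the service
NetTcpBinding b = new NetTcpBinding(SecurityMode.None);
var factory = new ChannelFactory<WorkerRole.IService>(b);
var channel = factory.CreateChannel(GetEndpoint());
channel.compute();   // call the service hosted on the worker role
To allow messages of up to 10 MB, the binding is configured with:
maxBufferSize="10485760" maxReceivedMessageSize="10485760"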
From Objects to Services
1980s, Object-Oriented: polymorphism, encapsulation, subclassing
1990s, Component-Based: interface-based, dynamic loading, runtime metadata
2000s, Service-Oriented: message-based, schema + contract, binding via policy
Implementations compared:
• C/C++ with MPI
• Queue with Azure
• WCF with Azure
Experimental Evaluation
Execution time (sec) on 8, 4, and 2 processors. The panels follow slide order: Matrix Multiplication, K-means, KNN.

Matrix Multiplication
Processors | MPI | Queue | WCF
8 | 0.0993 | 8.8726 | 4.4533
4 | 0.1656 | 13.9872 | 6.349
2 | 0.4723 | 20.6536 | 11.5783

K-means
Processors | MPI | Queue | WCF
8 | 0.1023 | 2.8902 | 1.9234
4 | 0.2512 | 4.1224 | 3.4267
2 | 0.5420 | 7.6238 | 5.5263

KNN
Processors | MPI | Queue | WCF
8 | 0.4272 | 1.0623 | 0.8976
4 | 1.2567 | 2.3457 | 1.5214
2 | 2.0233 | 5.2356 | 4.1218
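As a worked comparison, the ratios below use only the 8-processor times reported above, taking the three panels in slide order (Matrix Multiplication, K-means, KNN). The first row divides the Queue time by the WCF time. The second divides the WCF time by the MPI time.

```latex
\begin{aligned}
\text{Queue}/\text{WCF}: &\quad \tfrac{8.8726}{4.4533} \approx 1.99,\quad \tfrac{2.8902}{1.9234} \approx 1.50,\quad \tfrac{1.0623}{0.8976} \approx 1.18\\
\text{WCF}/\text{MPI}: &\quad \tfrac{4.4533}{0.0993} \approx 44.8,\quad \tfrac{1.9234}{0.1023} \approx 18.8,\quad \tfrac{0.8976}{0.4272} \approx 2.10
\end{aligned}
```

At 8 processors, WCF roughly halves the Queue time for matrix multiplication. Both Azure versions remain behind MPI, by more than an order of magnitude in two of the three panels.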
Queue Performance
• Read: fastest 31 ms, slowest 203 ms
• Write: fastest 31 ms, slowest 234 ms
• Delete: fastest 0 ms, slowest 593 ms
The queue is simply a reliable method of delivering messages between processes.
Azure vs. Traditional Cluster
Hardware
System | CPU | RAM | Bandwidth
Glenn | 2.7 GHz | 8 GB | 20 Gbps
Azure | 1.6 GHz | 2 GB | 10 Gbps
Operating system: Glenn (the traditional cluster) runs Linux, whose lightweight kernel can make full use of the hardware resources.
Programming language: C is only one level of abstraction away from machine language, whereas C# running on the .NET Framework is at least three levels of abstraction away from assembler.
Conclusion
• MPI applications can harness the advantages of cloud computing.
• Applications running on the cloud can achieve high efficiency by simulating MPI parallelization on the Windows Azure platform.
• We introduced different inter-role communication methods for parallel execution. These can be considered a prototype of an Azure MPI library, which will most likely be developed and used in the near future.