internship project (lasindu) wso2
WSO2 Stratos - Tenant CPU usage metering
Lasindu Charith Vidana Pathiranage
University of Moratuwa
Outline
Requirements
Implementation
Demo
Problems
Things Learnt
Q & A
Current implementation for tenant usage metering and billing in Stratos
Number of users
Disk Storage
Bandwidth usage
CPU Usage ???
What are the available CPU usage metering methods ?
Per Hour
Per Instance
Per User
Per Tenant Request ??
Fixed Rate
Java ThreadMXBean to the rescue: the management interface for the thread system of the JVM.
A Java virtual machine implementation may support measuring the CPU time for the current thread or for any thread.
getCurrentThreadCpuTime() returns the total CPU time for the current thread in nanoseconds.
It does not account for thread sleep/idle time.
Sample Code Usage
if (metering is enabled) {    // check the system property (carbon.xml)
    if (tenant != carbon.super && context is services or webapps) {
        startCpuTime = threadMXBean.getCurrentThreadCpuTime()    // get thread CPU time
    }
}
// Executable code
if (metering is enabled) {
    if (tenant != carbon.super && context is services or webapps) {
        endCpuTime = threadMXBean.getCurrentThreadCpuTime()
        threadCpuTime = (endCpuTime - startCpuTime) / 1000000    // nanoseconds to milliseconds
        if (threadCpuTime > 0) add cpuStatisticsEntry to queue
    }
}
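The pseudocode above can be sketched as plain Java using only the JDK. This is a minimal illustration, not the actual Stratos code: the class RequestCpuMeter, the CpuUsageEntry type, the handle method and the way the metering flag is passed in are all assumptions made for the example, and the services/webapps context check is left out for brevity.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper: measures the CPU time a request consumes on the current thread.
final class RequestCpuMeter {

    /** One measurement: the tenant and the CPU milliseconds its request used. */
    static final class CpuUsageEntry {
        final String tenantDomain;
        final long cpuMillis;

        CpuUsageEntry(String tenantDomain, long cpuMillis) {
            this.tenantDomain = tenantDomain;
            this.cpuMillis = cpuMillis;
        }
    }

    private static final ThreadMXBean THREAD_MX_BEAN = ManagementFactory.getThreadMXBean();

    /** Queue of measurements waiting to be picked up (assumed structure). */
    static final Queue<CpuUsageEntry> ENTRIES = new ConcurrentLinkedQueue<>();

    /** Runs the request on the calling thread and records its CPU time if metering applies. */
    static void handle(String tenantDomain, boolean meteringEnabled, Runnable request) {
        boolean measure = meteringEnabled
                && !"carbon.super".equals(tenantDomain)          // skip the super tenant
                && THREAD_MX_BEAN.isCurrentThreadCpuTimeSupported();

        long startCpuTime = measure ? THREAD_MX_BEAN.getCurrentThreadCpuTime() : 0L;

        request.run(); // the executable code being metered

        if (measure) {
            long endCpuTime = THREAD_MX_BEAN.getCurrentThreadCpuTime();
            long threadCpuTime = (endCpuTime - startCpuTime) / 1_000_000L; // ns to ms
            if (threadCpuTime > 0) {
                ENTRIES.add(new CpuUsageEntry(tenantDomain, threadCpuTime));
            }
        }
    }
}
```

A caller would wrap each request, for example RequestCpuMeter.handle("example.com", true, () -> service.invoke()), where service.invoke() stands for whatever work the request performs. Requests that use less than one millisecond of CPU produce no entry, matching the threadCpuTime > 0 check in the pseudocode.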
CPU time of requests passing through Tomcat to ODE
[Diagram. Labelled components: Request, Response, Thread Pool, Tomcat Valves, Tomcat Servlet Transport, Axis2, BPEL Component, Apache ODE, SimpleScheduler, Invoke(Job), Internal Thread Pool]
ESB CPU Time
[Diagram. Labelled components: Consumer, Service Provider, Synapse, Proxy Service, In Sequence, Out Sequence, ClassMediator, ClassMediator Endpoint, ServerWorker, ClientWorker]
How it works?
Thread Execution Component: captures the thread CPU time per request.
CpuUsageStatisticsContainer: holds the CpuUsageStatisticsEntries queue.
CpuUsageStatistics Retrieval Component: retrieves the entries and sends them to the existing Stratos Usage Agent Component.
Existing Stratos Usage Agent Component: publishes the usage to BAM (to be used for billing and throttling).
Usage <tenantID, measurement, value>: BandwidthUsage, DatabaseUsage, CPUUsage
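The retrieve step of this flow could look roughly like the sketch below. Only the names CpuUsageStatisticsContainer, CpuUsageStatisticsEntry and the Usage triple <tenantID, measurement, value> come from the slides; the fields, the method names and the wrapper class CpuUsagePipelineSketch are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class CpuUsagePipelineSketch {

    /** A single per-request measurement (fields are assumed). */
    static final class CpuUsageStatisticsEntry {
        final int tenantId;
        final long cpuMillis;

        CpuUsageStatisticsEntry(int tenantId, long cpuMillis) {
            this.tenantId = tenantId;
            this.cpuMillis = cpuMillis;
        }
    }

    /** The <tenantID, measurement, value> triple handed to the usage agent. */
    static final class Usage {
        final int tenantId;
        final String measurement;
        final long value;

        Usage(int tenantId, String measurement, long value) {
            this.tenantId = tenantId;
            this.measurement = measurement;
            this.value = value;
        }
    }

    /** Thread-safe queue filled by request threads and drained by the retriever. */
    static final class CpuUsageStatisticsContainer {
        private final Queue<CpuUsageStatisticsEntry> entries = new ConcurrentLinkedQueue<>();

        void add(CpuUsageStatisticsEntry entry) {
            entries.add(entry);
        }

        CpuUsageStatisticsEntry poll() {
            return entries.poll();
        }
    }

    /** Retrieve step: drain the container and convert each entry to a CPUUsage record. */
    static List<Usage> retrieve(CpuUsageStatisticsContainer container) {
        List<Usage> usages = new ArrayList<>();
        CpuUsageStatisticsEntry entry;
        while ((entry = container.poll()) != null) {
            usages.add(new Usage(entry.tenantId, "CPUUsage", entry.cpuMillis));
        }
        return usages;
    }
}
```

A concurrent queue is used here because request threads add entries while a separate retriever drains them; whether the real container is built this way is not stated in the slides.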
Overall Architecture
[Architecture diagram. Labelled components:]
Stratos Usage Agent: PublisherUtils, Data Persister, Data Retriever
Tomcat Ext: CarbonStuckThreadDetection Valve, TransportStatisticsContainer
synapse-nhttp-transport: ServerWorker, ClientWorker, EsbCpuUsageStatisticsContainer
ode: SimpleScheduler, BpsCpuUsageStatisticsContainer
New Agent Component (extensible): BPS Usage Agent Component (Data Retriever), ESB Usage Agent Component (Data Retriever)
Cpu Time Capturing Component: ThreadMXBean, CpuUsageStatisticsContainer
Why different?
Not every request goes through the Tomcat servlet transport (e.g. ESB uses nhttp requests).
Some products use their own internal thread pools and thread execution mechanisms (e.g. BPS uses Apache ODE and ESB uses Apache Synapse).
BAM script execution is handled by a separate JVM.
Solution
Capture the CPU time specifically for the products that have the above constraints.
A separate component retrieves the product-specific CPU usage statistics and sends them to the Stratos Usage Agent component.
The CPU statistics should be added to the same Usage Agent instance, which is possible once it is registered as an OSGi service.
How to use tenant CPU usage Statistics
Metered CPU Statistics will be summarized in BAM.
Data will be used for billing and throttling.
Tenants will be throttled and billed at the end of the month according to their CPU usage.
Summarized Data in BAM using a Hive Script
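The Hive script itself is not reproduced in these slides. As a rough Java illustration of the kind of per-tenant summarization described above, the sketch below adds up CPU milliseconds per tenant. The class TenantCpuSummary, the CpuUsage record and the sample values are assumptions, not part of BAM or Stratos.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TenantCpuSummary {

    /** One metered request: tenant ID and CPU milliseconds (assumed shape). */
    static final class CpuUsage {
        final int tenantId;
        final long cpuMillis;

        CpuUsage(int tenantId, long cpuMillis) {
            this.tenantId = tenantId;
            this.cpuMillis = cpuMillis;
        }
    }

    /** Sums the CPU milliseconds of all records, grouped by tenant ID. */
    static Map<Integer, Long> totalPerTenant(List<CpuUsage> records) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (CpuUsage record : records) {
            totals.merge(record.tenantId, record.cpuMillis, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<CpuUsage> records = Arrays.asList(
                new CpuUsage(1, 10), new CpuUsage(2, 5), new CpuUsage(1, 7));
        System.out.println(totalPerTenant(records)); // prints {1=17, 2=5}
    }
}
```

A monthly total per tenant of this kind is what a billing or throttling rule would then be evaluated against.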
How they all fit in?
[Diagram. Labelled components: Client, Server, Throttling Agent, Usage Agent, Mediator Agent (optional), Metering Data Store; the client sends a request to the server.]
Demo
CPU time for a
Sample Web-service
BPEL Process
ESB Proxy Service
Remotely debug for correctness
Summarize data at BAM side
Problems
Products are different
Thread handling is done differently in some products. I had to remotely debug the Apache code that each product depends on (ODE/Synapse/Hive/Hadoop), find the thread execution part, and capture the CPU time of each request there.
Usually tenant information is not associated with each request/response in the Apache code. In certain cases I had to pass the tenant domain/ID as a parameter to the invoke method of the particular component, or set it as a property, so that I could tell which request came from which tenant.
Problems Continued ..
Retrieving data from different dependencies
Direct dependencies on ODE/Synapse cannot be added to the Stratos usage agent component, since they are not used in every WSO2 product. Except for Tomcat.ext, I had to write a new component for each product where CPU time is captured, to handle the data retrieval/persistence tasks.
I had to register UsageDataPersistenceManager in the usage agent as an OSGi service, so that the ESB/BPS components can add their CPU usage data to the same instance that is used by the org.wso2.carbon.usage.agent component's persistence queue.
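The idea of sharing one persistence manager instance can be illustrated without OSGi. In the sketch below a plain map stands in for the OSGi service registry, so none of this is the OSGi API or the real usage agent code: ToyServiceRegistry, ProductUsageAgent, addCpuUsage and the queue's string format are all hypothetical, and only the name UsageDataPersistenceManager is taken from the slides.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

final class SharedPersistenceSketch {

    /** Stand-in for the usage agent's persistence manager (method and queue are assumed). */
    static final class UsageDataPersistenceManager {
        final Queue<String> persistenceQueue = new ConcurrentLinkedQueue<>();

        void addCpuUsage(int tenantId, long cpuMillis) {
            persistenceQueue.add(tenantId + ":CPUUsage:" + cpuMillis);
        }
    }

    /** Toy registry: a plain map playing the role of the OSGi service registry. */
    static final class ToyServiceRegistry {
        private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

        <T> void register(Class<T> type, T service) {
            services.put(type, service);
        }

        <T> T lookup(Class<T> type) {
            return type.cast(services.get(type));
        }
    }

    /** Hypothetical product-side component (for example one for ESB, one for BPS). */
    static final class ProductUsageAgent {
        private final UsageDataPersistenceManager manager;

        ProductUsageAgent(ToyServiceRegistry registry) {
            // Look the shared instance up instead of creating a private one.
            this.manager = registry.lookup(UsageDataPersistenceManager.class);
        }

        void report(int tenantId, long cpuMillis) {
            manager.addCpuUsage(tenantId, cpuMillis);
        }
    }
}
```

Because both product components look up the registered instance rather than constructing their own, everything they report lands in a single queue, and the usage agent needs no compile-time dependency on ODE or Synapse.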
Problems Continued ..
Accurate CPU Usage data ..??
Request execution live (elapsed) time and CPU time are very close values, but CPU time is less than the live time.
Thread sleep time is not captured as CPU time.
ThreadMXBean reports thread CPU time as a cumulative total for the thread, so for each request I always had to take the difference between the reading at the end and the reading at the start.
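The points above can be checked with a small standalone program. The sketch below is an illustration only (the class CpuVersusElapsedTime and its measure method are made up for it): it takes the difference of two cumulative CPU readings around a task that mostly sleeps, and compares the result with the elapsed time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuVersusElapsedTime {

    /**
     * Runs the task on the calling thread and returns {cpuNanos, elapsedNanos}.
     * cpuNanos is -1 if the JVM does not support current-thread CPU time.
     */
    static long[] measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean supported = bean.isCurrentThreadCpuTimeSupported();

        // The CPU reading is cumulative for the thread, so remember the starting value.
        long cpuStart = supported ? bean.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();

        task.run();

        long elapsedNanos = System.nanoTime() - wallStart;
        long cpuNanos = supported ? bean.getCurrentThreadCpuTime() - cpuStart : -1L;
        return new long[] {cpuNanos, elapsedNanos};
    }

    public static void main(String[] args) {
        long[] result = measure(() -> {
            try {
                Thread.sleep(200); // sleeping consumes elapsed time but almost no CPU time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("cpu ns = " + result[0] + ", elapsed ns = " + result[1]);
    }
}
```

For a task that sleeps, the CPU figure should come out far below the elapsed figure; for a CPU-bound task the two would be close, with CPU time the smaller of the two.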
Problems Continued ..
Performance Hit ...??
EnableMetering is set to 'false' by default in carbon.xml. The CPU time measuring code is executed only if metering is enabled.
Tested Tomcat.ext with metering enabled: no noticeable change in SoapUI for a burst of web service calls.
Tested several types of ESB proxy services with and without the code using Apache JMeter; there is no sign of a change in TPS.
Performance Comparison with Apache JMeter
ESB Echo Proxy Service, 1000 samples. Number of threads: 100, ramp-up period: 5 s, loop count: 10. Times in milliseconds.

Without Code
Average  Median  90% Line  Min  Max  Error  Throughput
3        3       6         2    26   0.0%   199.7/sec
3        3       6         2    18   0.0%   199.4/sec
3        2       6         2    34   0.0%   199.6/sec
2        2       6         2    32   0.0%   199.3/sec
3        2       6         2    22   0.0%   199.7/sec

With Code
Average  Median  90% Line  Min  Max  Error  Throughput
4        3       7         2    38   0.0%   199.1/sec
3        3       6         2    21   0.0%   199.3/sec
3        3       6         2    38   0.0%   199.6/sec
3        3       6         2    21   0.0%   199.5/sec
2        3       6         2    25   0.0%   199.0/sec
Performance Comparison with Apache JMeter
ESB Echo Proxy Service, 1000 samples. Number of threads: 50, ramp-up period: 5 s, loop count: 20. Times in milliseconds.

Without Code
Average  Median  90% Line  Min  Max  Error  Throughput
3        3       4         2    19   0.0%   199.0/sec
3        3       4         2    13   0.0%   198.8/sec
3        3       4         2    21   0.0%   197.5/sec
2        3       4         2    12   0.0%   199.0/sec
2        3       3         2    19   0.0%   199.3/sec

With Code
Average  Median  90% Line  Min  Max  Error  Throughput
4        3       7         2    29   0.0%   196.9/sec
4        3       7         2    22   0.0%   199.3/sec
4        3       7         2    21   0.0%   199.5/sec
3        3       6         2    22   0.0%   198.1/sec
2        3       3         2    17   0.0%   199.2/sec
Performance Comparison with Apache JMeter
ESB Proxy Service (Class mediator), 1000 samples. Number of threads: 100, ramp-up period: 1 s, loop count: 10.

Throughput
Without Code  With Code
760.5/sec     766.8/sec
762.2/sec     749.1/sec
754.1/sec     746.8/sec
745.2/sec     746.3/sec
751.9/sec     751.3/sec
763.4/sec     748.5/sec
764.5/sec     757.6/sec
753.6/sec     749.6/sec
Also checked several types of proxy services at the same time; the total throughput with and without the code seems to be quite even.
Problems Continued ..
Product/version problems
Different products used different versions of the same component.
Several changes were made to dependent components while the project was in progress.
Automation Hackathon
Worked with the GREG team for almost 2 months.
Wrote a lot of test cases and ported old tests to the Clarity framework.
Learnt about GREG LCs, RXTs, APIs, URIs, Handlers, Permissions etc.
Learnt to write Axis2 clients to test CRUD operation support and the Discovery Proxy for GREG.
Automated several support patches.
Technical Knowledge
Above them all ...
Obviously learnt a load of technical things.
How to take important architectural decisions, and the flexibility of the Carbon architecture.
How to communicate ideas with others and get the necessary help.
Was able to get help from a lot of people and work on several products: Carbon, AS, ESB, BPS, GREG, BAM, DSS, Stratos etc.
Learnt best practices in software engineering and coding conventions.
Above them all ...
How to test software, automate the functionality and how QA functions.
How to use mailing lists effectively.
How to manage time and meet deadlines.
How a company functions and how it prepares for a release.
Got to know a bunch of good friends/people.
Enjoyed every minute of it.
Questions ..?
References
http://maharachchi.blogspot.com/2011/08/metering-throttling-and-billing-in.html
http://sanjeewamalalgoda.blogspot.com/2011/08/wso2-stratos-usage-and-throttling_22.html
http://wso2.org/library/articles/2011/11/usage-metering-cloud-environment-using-wso2-stratos
http://docs.oracle.com/javase/6/docs/api/java/lang/management/ThreadMXBean.html
http://jmeter.apache.org
Thank you !!