
Benchmarking in Virtual Desktops for End-to-End Performance Traceability

Trung Nguyen, Prasad Calyam, Ronny Bazan Antequera
University of Missouri-Columbia

Email: {tqnxbb, rcb553}@mail.missouri.edu; [email protected]

Abstract—There are proven benefits in terms of cost and convenience in delivering thin-client based virtual desktops, versus the use of traditional physical computers for end-user computing purposes. In this paper, we present novel extensions in terms of user interface and methodology to our previously developed VDBench benchmarking toolkit for virtual desktop environments that uses principles of slow-motion benchmarking. We focus on automation aspects of benchmarking, and describe how we extend end-to-end performance traceability for different desktop applications such as Internet Explorer, Media Player and Excel spreadsheets. Our approach avoids invasive modification of thin-client systems, and allows emulation of user behavior with realistic workloads. Our user interface design is aimed at managing workflows between the benchmarking client and server, for easy instrumentation and generation of comprehensive performance reports in complex environment setups. In a validation study, we deploy the enhanced VDBench toolkit in a real-world virtual desktop testbed that hosts applications rendering 3D visualizations of disaster scenarios for scene understanding and situational awareness. Through the benchmarking results, we show how the toolkit provides user QoE assessments involving reliable display of video events under different network health conditions and computation resource configurations.

Keywords—Virtual Desktop Benchmarking, User Quality of Experience (QoE), End-to-End Performance, Cloud Computing

I. INTRODUCTION

Thanks to the growing capabilities of computer hardware in terms of computation and storage capacity, as well as network connectivity, cloud computing has become the inevitable trend in computing. The primary goal of cloud computing is to deliver compute environments to customers as services over the Internet. These compute environments can vary from a virtual machine running productivity software to a cluster of powerful physical servers performing complex parallel computing-based simulations [1]. When virtual machines or desktops are delivered as a service to remote users, we define this service as the virtual desktop cloud, to which users remotely connect using remote desktop, SSH, or any of the other supported protocols built on top of TCP, such as VNC and RDP, or UDP, such as PCoIP.

In the virtual desktop cloud shown in Fig. 1, there is a collection of data centers equipped with rack-mounted servers. They are located at different geographic locations and are connected with each other via high-speed networks. Each server runs several virtual machines managed by a hypervisor program. In order to use a virtual desktop in the cloud, remote users first have to connect to a web portal or broker server for authentication purposes. Subsequently, they have to use a virtual desktop view client application such as VMware View Client or RealVNC installed on the client to connect to the pre-allocated virtual machine. Once they have logged into the virtual desktop, they can use popular pre-installed applications such as Spreadsheet Calculator, Internet Browser, Media Player and Interactive Visualization.

Fig. 1: Virtual Desktop Cloud System

The virtual desktop cloud brings many benefits compared to traditional desktops, among them mobility, cost efficiency and security. However, despite these benefits, virtual desktop delivery through a cloud platform presents two main challenges in terms of scale and assurance of Quality of Experience (QoE). More specifically, the challenges are: How to scale system resources, consisting of CPU and memory, up or down based on user needs? How to reclaim idle resources to better serve other users and reallocate them again when needed, in order to optimize resource allocation and save costs? How to ensure that QoE, or user-perceived performance, is good or at least acceptable for each customer with optimized resources and cost efficiency, so as to keep them interested in using the services provided by the cloud?

In this paper, we aim to address a few of these challenges for which benchmarking in virtual desktops is critical. The main purpose of benchmarking is to measure user-perceived performance, or QoE, in that environment. Based on measurement results, service providers can validate and ensure that good user QoE will be delivered to their customer groups. In addition, they can avoid over-provisioning, i.e., allocating more compute resources than customers need without any QoE improvement, and thereby reduce computation and power costs. Another benefit of virtual desktop benchmarking is that administrators can use benchmark results to select an appropriate system and connection or protocol configuration to meet specific user group needs. Last but not least, if the system and connection configuration cannot be modified, benchmarking can validate whether or not a given infrastructure configuration is working as expected.

We present novel extensions in terms of user interface and methodology to our previously developed VDBench benchmarking toolkit [2] for virtual desktop environments, which uses principles of slow-motion benchmarking. The extensions focus on benchmarking automation and extend end-to-end performance traceability for desktop applications such as Internet Explorer, Media Player and Excel, while avoiding invasive modification of thin-client systems and emulating user behavior with realistic workloads. We validate the enhanced toolkit in a real-world virtual desktop testbed that hosts applications rendering 3D visualizations of disaster scenarios, and show how it provides user QoE assessments under different network health conditions and compute resource configurations.

The remainder of the paper is organized as follows: Section II discusses related work. In Section III, we describe our previously developed VDBench toolkit. Section IV describes the extensions developed for improving end-to-end traceability. Section V presents a validation study of the effectiveness of our extensions with a 3D computer vision application. Section VI concludes the paper.

II. RELATED WORK

Early benchmarking approaches for measuring user-perceived performance have been investigated in [3]. The first approach for benchmarking in virtual desktops is to use conventional benchmarks that have been used to benchmark traditional desktop systems. However, this approach can only validate the system or server performance in the cloud; it cannot measure or validate the user QoE at the clients. Another approach is client instrumentation. This methodology involves direct instrumentation of the thin-client for logging detailed input and output display events on the client. Unfortunately, this approach is not easy to realize in practice since many thin-client systems are quite complex and proprietary [4]. Even if logging is possible, the benchmark results still do not accurately reflect user QoE, due to the invasive modification of the client system and the inability to correlate client-side user events with server-side resource performance events.

Network monitoring is another promising approach for benchmarking in virtual desktops. Unlike client instrumentation, which needs to modify the benchmarked system, this approach relies on the analysis of network traces. Here, measurements are taken of the latency of benchmark application operations taking place on the cloud system relative to the corresponding operations occurring at the clients, and of the bandwidth consumed during the operations. Network monitoring is able to measure user QoE related performance more accurately. However, it still has some limitations: first, it cannot quantify the amount of display updates discarded at the highest available bandwidth levels; second, it cannot determine whether the cloud platform transmits the overall visual display data reliably.

Fortunately, the slow-motion benchmarking methodology [3] has been developed to resolve these downsides of network monitoring and to obtain more accurate measurements of user-perceived performance in virtual desktops. Here, network packet traces are used to monitor the latency and data transferred between the client and the server, and the application benchmarks are altered by introducing delays between the separate visual components of the benchmark, such as web pages or video frames. The delays ensure that the display updates for each component are fully completed on the client before the server begins processing the next one. Other notable works on virtual desktop benchmarking are presented in [6] - [10]. Our earlier work on the VDBench toolkit [2] [5] built upon the slow-motion benchmarking concept, and resulted in a framework for the automation of virtual desktop benchmarking.
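To make the slow-motion measurement idea concrete, consider a rough formalization in our own notation (not taken verbatim from [3]). Suppose the trace attributes the display updates of visual component $i$ ($i = 1, \dots, N$) to packets observed between times $t_i^{\text{start}}$ and $t_i^{\text{end}}$, carrying $D_i$ bytes in total. Then the per-component latency, the total benchmark latency and the average bandwidth can be estimated as

$$L_i = t_i^{\text{end}} - t_i^{\text{start}}, \qquad L = \sum_{i=1}^{N} L_i, \qquad B = \frac{\sum_{i=1}^{N} D_i}{\sum_{i=1}^{N} L_i},$$

where the inserted inter-component delays guarantee that the packets of component $i$ do not overlap with those of component $i+1$, so each $L_i$ is well defined.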

III. VDBENCH OVERVIEW

In this section, we present details of our VDBench toolkit [2], previously developed as an automation tool for benchmarking performance by mimicking user tasks in virtual desktops.

The methodology used by VDBench involves the following steps: (i) creating realistic workflows in order to generate synthetic system loads; (ii) running those synthetic system loads under network health impairments that affect user-perceived 'interactive response times' (e.g., application launch time, web-page download time); (iii) allowing correlation of thin-client user events with server-side resource performance events by virtue of 'marker packets', as depicted in Fig. 2; (iv) using the marker packets in the analysis of network traces to measure and compare thin-client protocols in terms of transmission times, bandwidth utilization and video quality; and (v) generating resource (CPU, memory, network bandwidth) utilization profiles of different user applications and user groups.
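The following Java sketch is our own simplified illustration of steps (iii) and (iv), not the toolkit's actual code: each benchmark task is wrapped between two easily recognizable UDP 'marker' datagrams so that the task's thin-client traffic can later be isolated in the packet capture. The port number, payload format, destination address and the fixed slow-motion pause are assumptions made only for this example.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal illustration of the marker-packet idea: a small, easily recognizable
// datagram is emitted before and after each benchmark task so the task's
// thin-client traffic can be delimited in the packet trace.
public class MarkerSender {
    private static final int MARKER_PORT = 55555;   // hypothetical marker port

    static void sendMarker(String label) throws Exception {
        byte[] payload = ("VDBENCH-MARKER:" + label + ":" + System.currentTimeMillis())
                .getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName("192.0.2.10"), MARKER_PORT)); // example client address
        }
    }

    // Wrap any benchmark task between a start and a stop marker.
    static void runInstrumented(String taskName, Runnable task) throws Exception {
        sendMarker(taskName + ":START");
        task.run();                       // e.g., open IE, load a page, play a video clip
        Thread.sleep(2000);               // slow-motion pause so display updates finish
        sendMarker(taskName + ":STOP");
    }

    public static void main(String[] args) throws Exception {
        runInstrumented("IE_OPEN", () -> { /* launch the scripted application task here */ });
    }
}

Because the markers are ordinary packets, they appear in the same capture as the thin-client protocol traffic and can be used to delimit each task during trace analysis.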

Fig. 2: Marker packets inserted into each task or visual component during benchmarking

Based on the above methodology, the VDBench toolkit can be used for benchmarking the performance of: (a) popular user applications (Spreadsheet Calculator, Internet Browser, Media Player, Interactive Visualization), (b) TCP/UDP based thin-client protocols (RDP, RGS, PCoIP), and (c) the remote user experience (interactive response times, perceived video quality), under a variety of system load and network health conditions. The VDBench toolkit is built with Java technology so that it can run across platforms; for example, it is able to run on Windows or Unix-like systems.

Before running the VDBench toolkit for benchmarking, a testbed first has to be designed and configured. For research purposes, a testbed is composed of a physical server running a number of virtual desktop instances, an LDAP server for authenticating users, a thin-client for connecting to the benchmark server, and a network emulator running the Netem tool [11] for modifying network health conditions. All components are connected to each other via a local area network, as shown in Fig. 3.

Fig. 3: VDBench System Components

The VDBench toolkit consists of two main components: the VDBench Client Tool and the VDBench Server Tool. The VDBench Client Tool runs on a thin-client and connects to the VDBench Server to receive benchmark results. On the other side, the VDBench Server Tool runs on the VDBench Server, waits for requests from clients, and then starts benchmarking tests on a pre-configured virtual desktop. Since the VDBench toolkit is designed to benchmark virtual desktops accessed from thin-clients with limited compute resources such as CPU and memory, the VDBench Server Tool takes on the burden of processing data and calculating benchmark results before sending them back to the client; the client only has to run a network packet capture tool such as Wireshark [13]. Moreover, every time the server tool receives a request from a client, it runs a remote command on the pre-configured virtual desktop. The command then launches benchmark applications according to a pre-configured benchmarking test sequencer that executes script files shared over the local network.
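The division of labor just described can be pictured with the following minimal Java sketch; it is an illustrative simplification, not the actual VDBench implementation, and the control port, request format and remote-execution command are assumptions.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of the client/server workflow: the server accepts a benchmark request,
// launches the scripted workload on the pre-configured virtual desktop, and
// replies when results are ready, so the thin-client only captures packets.
public class BenchServerSketch {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9000)) {          // hypothetical control port
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String request = in.readLine();                   // e.g., "RUN IE_BENCHMARK"
                    if (request == null) continue;                    // client closed without a request
                    // Placeholder for launching the benchmark sequencer/scripts remotely.
                    Process p = new ProcessBuilder("remote-exec.cmd", request).start();
                    p.waitFor();
                    out.println("RESULTS_READY");                     // client then fetches the report
                }
            }
        }
    }
}

In the real toolkit, the server-side processing also includes parsing the captured traces and computing the benchmark metrics before results are returned to the thin-client.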

The earlier version of the tool had a few limitations in terms of deployment and usability that are addressed by the new extensions described in this paper. The first set of extensions improves the configurability of the tool through a redesigned Graphical User Interface (GUI) for both the client and server sides; it also improves the extensibility of the tool with new scripting models and support modules for adding new benchmark applications. The second set of extensions relates to measurement data collection, test report generation and measurement results analysis documentation.

IV. EXTENSIONS FOR END-TO-END TRACEABILITY

In this section, we will describe details of our VDBench extensions for making virtual desktop benchmarking more configurable and usable.

A. Graphical User Interface Redesign

We redesigned the GUI of the VDBench toolkit to allow users to easily manage system settings graphically, instead of using a configuration file or manually modifying and recompiling the toolkit's source code. With the new GUIs for both the client and server sides, benchmarking client users and system administrators are also able to track down outputs and errors more easily via GUI elements. Second, we have redesigned the output display in the GUI to allow users to more easily initiate tests and view results, along with log and error tracking information. Test results can also be exported to a .csv file for offline analysis. Other improvements include an option that lets users set the number of times each benchmark application is run, so that the average result is calculated more accurately.
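As a small illustration of the repeated-run option, the helper below (our own sketch, not toolkit code) reports the mean of the measured values together with their standard deviation, in the spirit of the "value +/- deviation" entries shown later in Tables I - III; the exact statistic computed by the toolkit is not prescribed here.

import java.util.List;

// Given the measured values from repeated runs of one benchmark task,
// report mean +/- standard deviation.
public class RunAverager {
    static String summarize(List<Double> runs) {
        double mean = runs.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        double var = runs.stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .average().orElse(0.0);
        return String.format("%.2f +/- %.2f", mean, Math.sqrt(var));
    }

    public static void main(String[] args) {
        // e.g., five measured Internet Explorer open times in seconds
        System.out.println(summarize(List.of(1.10, 1.25, 1.18, 1.22, 1.15)));
    }
}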

The main goal of scripting in VDBench is to simulate user behavior for benchmarking automation within virtual desktops. To realize the scripting capability, the VDBench toolkit uses AutoIt [12], a BASIC-like scripting language for automating Windows GUI events. By using AutoIt and configuration files for specific benchmark applications such as Matlab, Excel and IBM SPSS, we can create system loads that closely mimic real user behavior for effective measurement of user-perceived performance.

In the previous version, there was only one main script file handling all of the benchmark applications, and whenever one or more applications needed to be added to the system, it was not easy to modify the main script file and add new code to it. In order to make the VDBench toolkit scalable and maintainable, we have proposed, redesigned and implemented a new module-oriented model, as shown in Fig. 4. More specifically, we have separated the constants and variables used in the main script file into different files, and created a utility file containing all shared functions for easier code maintenance. Moreover, with this model, tool configuration information is read from a config.ini file rather than manually altered in the main script file. Whenever system administrators want to update environment configurations and modify benchmarking profiles, they can do so with much less effort under the new model.
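For illustration only, the snippet below shows one way such a config.ini could be consumed; the key names and default values are hypothetical, and in the actual toolkit the file is read by the AutoIt scripts rather than by Java code. The example is kept in Java, the toolkit's implementation language, and assumes a flat key=value file that java.util.Properties can parse.

import java.io.FileInputStream;
import java.util.Properties;

// Keep environment settings in config.ini instead of hard-coding them in the
// main script: load the file once and pull out the (assumed) keys.
public class ConfigLoader {
    public static void main(String[] args) throws Exception {
        Properties cfg = new Properties();
        try (FileInputStream in = new FileInputStream("config.ini")) {
            cfg.load(in);
        }
        // Example (assumed) keys: target virtual desktop, applications to run,
        // and repetitions per benchmark task.
        String desktopHost = cfg.getProperty("desktop.host", "vd-pool-01.example.org");
        String apps        = cfg.getProperty("benchmark.apps", "excel,ie,mediaplayer");
        int runs           = Integer.parseInt(cfg.getProperty("benchmark.runs", "5"));
        System.out.printf("Benchmarking %s on %s, %d runs per task%n", apps, desktopHost, runs);
    }
}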

Fig. 4: New VDBench Scripting Model

Further, we have developed a USB-based client-tool installation so that the toolkit can be more easily deployed in a new network system environment. The testbed setup with the USB-based client installation is shown in Fig. 5. Since we use thin-clients to connect to the virtual desktop cloud, network monitoring tools such as Wireshark [13] and other software such as the Java Runtime Environment (JRE) needed to run the VDBench Client Tool cannot be installed on the thin-clients, due to resource limitations and proprietary hardware restrictions. To overcome those limitations, we use a flash drive that can be plugged into the thin-client so that the Wireshark tool and the JRE are installed in a portable manner. By running the VDBench tool from a flash drive, we not only resolve the problems caused by the intrinsic limitations of thin-clients, but can also install a security token or key on each flash drive we use for benchmarking. As a result, benchmarking work on different thin-client systems becomes both more secure and more convenient.

Fig. 5: Testbed for collecting benchmark results

B. Data Collection and Analysis

Herein, we explain how we have improved the data collection and analysis capabilities in the new VDBench toolkit. After completion of the benchmarking tests, the results are saved into a unique timestamped .csv file identified by the date when the results were created, as shown in Fig. 6. Moreover, since .csv files can be opened in Excel, users can easily organize and summarize benchmark results and create diagrams based on that data, as shown in Fig. 7. Wireshark can also be used to affirm the correctness of the benchmark results obtained by the VDBench toolkit. Since the VDBench Client Tool uses Wireshark to monitor network traces during benchmarking, users can take advantage of the I/O Graph tool of Wireshark to view the benchmark traces. Fig. 8 shows the benchmark traces for an Internet Explorer open task. Since a task's loading time is delimited by start and stop marker packets (red dots), to view the benchmark traces of a task users need to obtain the start and stop markers (from the log file or marker file), which can then be used as filters in the I/O Graph utility of Wireshark.
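A minimal sketch of the report-writing step is shown below; the file-name pattern and column layout are illustrative assumptions, not the toolkit's actual report format, and the sample rows simply echo the kinds of metrics reported later in Table III.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Save benchmark results into a uniquely named, timestamped .csv file that can
// be opened directly in Excel for offline analysis.
public class CsvReportWriter {
    public static void main(String[] args) throws IOException {
        String stamp = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss"));
        Path report = Path.of("vdbench_results_" + stamp + ".csv");

        List<String> rows = List.of(
                "task,metric,value,unit",
                "IE_OPEN,open_time,1.19,s",           // sample values in the spirit of Table III
                "IE_TEXT_PAGE,page_load_time,2.2,s");
        Files.write(report, rows);                    // one row per line
        System.out.println("Wrote " + report);
    }
}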

Next, we describe how our VDBench extensions can help determine whether user QoE is objectively poor, fair or good under various network health conditions. For this, we use a threshold approach that maps objective user QoE, represented by a pre-populated matrix of benchmark results, to subjective user QoE obtained from virtual desktop users [14]. Consequently, the tool is capable of indicating whether QoE from the user perspective is poor, fair or good under a variety of network health conditions. We performed benchmarking with the extended toolkit in two scenarios: ideal and degraded network conditions. In this context, we use the I/O Graph of the Wireshark tool to confirm the correctness of the benchmark results obtained by the toolkit, as shown in Fig. 9 and Fig. 10. We are interested in crisp signatures of benchmark tasks between marker packets, as orchestrated in slow-motion benchmarking, under good network health conditions, and prolonged traces for the same tasks under non-ideal network conditions.

In our experiments, we collected baseline benchmark results, or matrices, under an ideal network health condition with no network delay and no packet loss, the two dominant factors that affect user QoE. These measurements act as baseline objective user QoE data for the benchmark applications, as described in Tables I - III. Based on these baseline measurements, the user QoE for each corresponding benchmark application can be determined.

Fig. 6: Benchmark results in .csv file opened with Excel

Fig. 7: Benchmark results and graphs under different network conditions

Fig. 8: I/O Graph of Wireshark view of benchmark traces and marker packets

Fig. 9: Benchmarking Under Ideal Network Condition (No delay, No packet loss)

Starting from the baseline objective user QoE data, we collected benchmark matrices and corresponding subjective user QoE opinions under a variety of network health conditions to determine a low threshold and a high threshold for each value in those matrices. We therefore have a pair of thresholds, i.e., {threshold_low, threshold_high}, for each task of a specific benchmarked application. Based on this pair, our enhanced toolkit is able to indicate the objective QoE status of metrics such as open time, page loading time or bandwidth consumed in terms of poor, fair or good user-perceived performance, as shown in Fig. 11. As a result, the benchmarking results become easier to comprehend, both at a high level and in terms of detailed metrics, for VDBench toolkit users (i.e., virtual desktop users and infrastructure/service providers).
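A minimal sketch of this classification step, under the assumption that lower metric values are better (as for open time or page load time), is shown below; the threshold values used in the example are made up for illustration and are not taken from our measurements.

// Classify a measured metric value against its per-task {threshold_low,
// threshold_high} pair: below threshold_low is good, above threshold_high is
// poor, and anything in between is fair.
public class QoeClassifier {
    enum Qoe { GOOD, FAIR, POOR }

    static Qoe classify(double value, double thresholdLow, double thresholdHigh) {
        if (value <= thresholdLow)  return Qoe.GOOD;
        if (value >= thresholdHigh) return Qoe.POOR;
        return Qoe.FAIR;
    }

    public static void main(String[] args) {
        // e.g., a text-page load measured at 3.1 s against hypothetical
        // thresholds of 2.5 s (low) and 4.0 s (high)
        System.out.println(classify(3.1, 2.5, 4.0));   // prints FAIR
    }
}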

V. EXTENSIONS VALIDATION STUDY

In this section, we use the new extended VDBench toolkit in a real-world virtual desktop testbed to validate user QoE in a virtual desktop cloud that hosts a LIDAR-based application for rendering 3D visualizations of disaster response scenarios for scene understanding and situational awareness [16]. The application context is that, in disaster scenarios, many 2D videos from many perspectives are collected by civilians and surveillance cameras at a disaster site. Those videos can be extremely useful for officials and emergency responders who need to ascertain the extent of the damage. However, handling such large collections of videos and performing immersive surveillance can be quite a challenge. Viewing those 2D videos from perspectives within a 3D model can provide better observation and understanding of what is happening at the disaster site.

Fig. 10: Benchmarking Under Degraded Network Condition (200 ms delay, 3% packet loss)

Fig. 11: Mapping Objective QoE and Subjective QoE based on threshold_low and threshold_high

Fig. 12: Dynamic interactive 3D scene visualization system for disaster scenarios

Fig. 12 gives an overview of the 3D model system for disaster scenarios. All 2D videos collected at a disaster site are fed into, or stored within, one or more data centers of a cloud platform. Subsequently, this data is processed to construct 3D models or 3D videos. Eventually, those 3D videos are delivered to responders as a service over the Internet. The responders can then use their own personal devices such as smart phones, tablets or laptops to view the models, which in turn helps them understand the situation at the disaster site and rapidly implement effective countermeasures.


Open Time (s)    Video Quality    Playback Time (s)    Data Transferred (Mb)
0.52 +/- 0.3     14.5 +/- 0.1     31 +/- 0.02          451 +/- 0.9

TABLE I: Baseline Objective User QoE for Media Player

Open Time (s)    Render Time (s)
1.02 +/- 0.07    20.81 +/- 0.04

TABLE II: Baseline Objective User QoE for Excel

IE Task                       Page Load Time (s)   Bandwidth Consumed (Mb)   Data Transferred (Mb)
Open IE                       1.19 +/- 0.1         N/A                       N/A
Text Page                     2.2 +/- 0.2          0.04 +/- 0.01             0.16 +/- 0.02
Low Resolution Image Page     0.2 +/- 0.005        0.1 +/- 0.03              0.05 +/- 0.005
High Resolution Image Page    0.4 +/- 0.005        0.05 +/- 0.01             0.05 +/- 0.005

TABLE III: Baseline Objective User QoE for Internet Explorer

Fig. 13: Bandwidth consumed by Excel, IE and 3D under different network conditions

However, the QoE delivered to responders through the system should be validated to determine whether or not the system works well; this is where benchmarking comes into play. Using our extended VDBench toolkit, we validated the QoE of the 3D model delivered through the system. More specifically, we compared popular applications such as Internet Explorer and Excel with 3D video playback, as shown in Fig. 13. As expected, when the 3D video playback application was being consumed by thin-client users in a satisfactory and usable manner, it was correspondingly shown by VDBench to be consuming the most network bandwidth compared to the other applications.

VI. CONCLUSION

In this paper, we described how end-to-end performance traceability can be achieved through benchmarking in virtual desktop environments. We described several extensions to our previously developed VDBench toolkit, and showed how they improve its configurability and usability for different network settings and application types. We validated our extensions using a 3D video playback application that is useful in disaster response scenarios in a cloud environment.

Our future work is to integrate machine learning approaches provided by the Weka Java library into VDBench so that we can build more dynamic user QoE models for the assessment of virtual desktop application performance and user QoE.

ACKNOWLEDGEMENT

This material is based upon work supported by the National Science Foundation under award number CNS-1205658 and by VMware. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or VMware.

REFERENCES

[1] "Apache Virtual Computing Lab" - https://vcl.apache.org/
[2] A. Berryman, P. Calyam, M. Honigford, A. M. Lai, "VDBench: A Benchmarking Toolkit for Thin-client based Virtual Desktop Environments", Proc. of IEEE CloudCom, 2010.
[3] J. Nieh, S. Yang, N. Novik, "Measuring Thin-client Performance using Slow-motion Benchmarking", ACM Transactions on Computer Systems, Vol. 21, No. 1, pp. 87-115, 2003.
[4] "Wyse Thin Client Products" - http://www.wyse.com
[5] Y. Xu, P. Calyam, D. Welling, S. Mohan, A. Berryman, R. Ramnath, "Human-centric Composite Quality Modeling and Assessment for Virtual Desktop Clouds", ZTE Communications Journal - Special Issue on QoE Modeling and Applications for Multimedia Systems, 2013.
[6] J. Rhee, A. Kochut, K. Beaty, "DeskBench: Flexible Virtual Desktop Benchmarking Toolkit", Proc. of IEEE/IFIP Integrated Management (IM), 2009.
[7] M. Dusi, S. Napolitano, S. Niccolini, S. Longo, "A Closer Look at Thin Client Connections: Statistical Application Identification for QoE Detection", IEEE Communications Magazine, Vol. 50, No. 11, pp. 195-202, 2012.
[8] P. Casas, M. Seufert, S. Egger, R. Schatz, "Quality of Experience in Remote Virtual Desktop Services", Proc. of QCMan, 2013.
[9] N. Tolia, D. Andersen, M. Satyanarayanan, "Quantifying Interactive User Experience on Thin Clients", IEEE Computer, Vol. 39, No. 3, 2006.
[10] B. Staehle, A. Binzenhofer, D. Schlosser, B. Boder, "Quantifying the Influence of Network Conditions on the Service Quality Experienced by a Thin Client User", Proc. of Measuring, Modeling and Evaluation of Computer and Communication Systems (MMB), 2008.
[11] "The Linux Foundation Netem" - http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
[12] "AutoIt Windows GUI Scripting Framework" - http://www.autoitscript.com
[13] "Wireshark Network Protocol Analyzer for Unix and Windows" - http://www.wireshark.org
[14] M. Fiedler, T. Hossfeld, P. Tran-Gia, "A Generic Quantitative Relationship between Quality of Experience and Quality of Service", IEEE Network, Vol. 24, No. 2, pp. 36-41, 2010.
[15] M. Dusi, S. Napolitano, S. Niccolini, S. Longo, "A Closer Look at Thin-client Connections: Statistical Application Identification for QoE Detection", IEEE Communications Magazine, Vol. 50, pp. 195-202, 2012.
[16] G. Bui, P. Calyam, B. Morago, R. Bazan, T. Nguyen, Y. Duan, "LIDAR-based Virtual Environment Study for Disaster Response Scenarios", Proc. of IEEE/IFIP Integrated Management (IM), 2015.