DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

WebRTC Quality Control in Contextual Communication Systems

WEI WANG

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE




WebRTC Quality Control in Contextual Communication Systems

WEI WANG

Communication Systems
Date: July 26, 2018
Examiner: Prof. Gerald Q. Maguire Jr.
Supervisor: Assoc. Prof. Anders Västberg & Stefan Hellkvist
School of Electrical Engineering and Computer Science


Abstract

Audio and video communication is a universal task with a long history of technologies. Recent examples of these technologies include Skype video calling, Apple's FaceTime, and Google Hangouts. Today, these services offer everyday users the ability to have an interactive conference with both audio and video streams. However, many of these solutions depend on extra plugins or applications installed on the user's personal computer or mobile device. Some of them are also subject to licensing, which introduces a huge barrier for developers and restrains new companies from entering this area. The aim of Web Real-Time Communications (WebRTC) is to provide direct access to multimedia streams in the browser, thus making it possible to create rich media applications using web technology without the need for plugins or developers needing to pay technology license fees.

Ericsson develops solutions for communication targeting professional and business users. With the increasing possibilities to gather data (via cloud-based applications) about the quality experienced by users in their video conferences, new demands are placed on the infrastructure to handle this data. Additionally, there is a question of how these statistics should be utilized to automatically control the quality of service (QoS) in WebRTC communication systems.

The thesis project deployed a WebRTC quality control service with methods of data processing and modeling to assess the perceived video quality of an ongoing session and, further, to produce appropriate actions to remedy poor quality. Lastly, after evaluation on the Ericsson contextual test platform, the project verified that two of the stats-parameters used for assessing QoS (network delay and packet loss percentage) have a negative effect on the perceived video quality, but to different degrees. Moreover, the available bandwidth turned out to be an important factor, which should be added as an additional stats-parameter to improve the performance of a WebRTC quality control service.

Keywords

WebRTC communication system, QoS assessment, Quality control, Data analysis, Modeling


Sammanfattning

Ljud- och videokommunikation är en universell uppgift med en lång historia av teknik. Exempel på dessa teknologier är Skype-videosamtal, Apples FaceTime och Google Hangouts. Idag erbjuder dessa tjänster vardagliga användare möjligheten att ha en interaktiv konferens med både ljud- och videoströmmar. Men många av dessa lösningar beror på extra plugins eller applikationer som installeras på användarens personliga dator eller mobila enhet. Vissa av dem är också föremål för licensiering, vilket inför ett stort hinder för utvecklare och hindrar nya företag från att komma in i detta område. Syftet med Web Real-Time Communications (WebRTC) är att ge direkt åtkomst till multimediaströmmar i webbläsaren, vilket gör det möjligt att skapa rich media-applikationer med webbteknik utan behov av plugins och utan att utvecklare behöver betala licensavgifter för teknik.

Ericsson utvecklar lösningar för kommunikation riktade mot professionella och företagsanvändare. Med de ökande möjligheterna att samla data (via molnbaserade applikationer) om kvaliteten som användarna upplever i sina videokonferenser ställs nya krav på infrastrukturen för att hantera dessa data. Dessutom är det en fråga om hur statistiken ska användas för att automatiskt kontrollera kvaliteten på tjänsten (QoS) i WebRTC-kommunikationssystem.

Avhandlingsprojektet tillämpade en WebRTC-kvalitetskontrolltjänst med metoder för databehandling och modellering för att bedöma upplevd videokvalitet i den pågående sessionen och vidare producera lämpliga åtgärder för att avhjälpa dålig kvalitet. Slutligen, efter utvärdering på Ericssons kontextuella testplattform, verifierade projektet att två av statistikparametrarna (nätverksfördröjning och paketförlustprocent) för bedömning av QoS har en negativ effekt på upplevd videokvalitet, men med olika grad av inflytande. Dessutom visade sig den tillgängliga bandbredden vara en viktig faktor, som bör läggas till som en extra statistikparameter för att förbättra prestandan för en WebRTC-kvalitetskontrolltjänst.

Nyckelord

WebRTC-kommunikationssystem, QoS-bedömning, Kvalitetskontroll, Dataanalys, Modellering


Acknowledgements

The degree project was performed at Ericsson in Kista, Sweden.

I would like to thank my supervisor, Stefan Hellkvist, who instructed me in how to start the degree project and provided support continually. I also thank my colleague, Ken Dai, who offered me the use of the test platform as well as help in various respects. Moreover, I appreciate my manager, Peter Hammarlund, who helped with the application for the thesis work and with follow-on support.

Importantly, I am honored to have participated in the Ericsson EC3 team, and I am thankful for all of the help from all of the group members: Patrik Oldsberg, Per Boussard, Morteza Araby, etc. Finally, I would like to give particular thanks to my examiner, Prof. Gerald Q. Maguire Jr., who gave me suggestions and feedback on this thesis.

Stockholm, July 2018
Wei Wang


Contents

Abstract
Sammanfattning
Acknowledgements
Contents
List of Figures
List of Tables
List of abbreviations

1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
  1.5 Research Methodology
  1.6 Delimitations
  1.7 Structure of the thesis


2 Background
  2.1 WebRTC architecture
    2.1.1 WebRTC Voice and Video Engines
    2.1.2 Audio CODECs
    2.1.3 Video CODEC
    2.1.4 Image enhancements
  2.2 Real-Time Transport for a Session
    2.2.1 User Datagram Protocol (UDP)
    2.2.2 Session Traversal Utilities for NAT (STUN)
    2.2.3 Traversal Using Relays around NAT (TURN)
    2.2.4 Interactive Connectivity Establishment (ICE)
    2.2.5 Session Description Protocol (SDP)
    2.2.6 Datagram Transport Layer Security (DTLS)
    2.2.7 Stream Control Transport Protocol (SCTP)
    2.2.8 Secure Real-Time Transport Protocol (SRTP)
    2.2.9 Real-time Transport Protocol (RTP)
    2.2.10 Summary
  2.3 WebRTC APIs
    2.3.1 RTCPeerConnection
    2.3.2 getUserMedia API
    2.3.3 WebRTC's Statistics API
  2.4 Ericsson's contextual WebRTC framework
  2.5 Related Work
  2.6 Summary

3 Implementation and development
  3.1 Data Processing


    3.1.1 Data Analysis
  3.2 Principal Components Analysis
  3.3 Modeling
    3.3.1 Clustering
    3.3.2 Sampling Methodologies
    3.3.3 Classification using Random Forests Algorithm
  3.4 Selection of Remedial Actions
  3.5 Summary

4 Evaluation of WebRTC test platform
  4.1 Framework of Evaluation
  4.2 Interaction Theory
  4.3 Software design
  4.4 Network Simulation Environment
  4.5 Results and Analysis
  4.6 Summary

5 Conclusions and Future work
  5.1 Conclusions
  5.2 Limitations
  5.3 Future work
  5.4 Reflections

References

A RTCP report figures


List of Figures

2.1 WebRTC overall architecture for the website (Adapted from figure in [6])
2.2 Voice and Video Engine
2.3 WebRTC network protocol stack
2.4 STUN and TURN
2.5 Calling Sequences: Set up a call
2.6 Calling Sequences: Receive a Call
2.7 WebRTC framework for Ericsson WebRTC services
3.1 statistical RTCP sender report for a video stream from 9 March 2018
3.2 CDF of googRtt
3.3 CDF and histogram of googAdaptationChanges
3.4 CDF of googAvgEncodeMs
3.5 CDF of googEncodeUsagePercent
3.6 Histogram of googFirsReceived
3.7 CDF of packetsLost
3.8 CDF of googNacks
3.9 CDF of googPlis
3.10 Balanced Results


3.11 Random Forest Simplified [59]
4.1 Test Platform Framework
4.2 Web interface layout
4.3 Net Simulator Page
4.4 A test showing the video quality QoS grade and recommended remedial action
4.5 The relationship of the video quality QoS grade and packet loss percentage setting
4.6 The relationship of the video quality QoS grade and delay setting
4.7 The relationship of the video quality QoS grade and bandwidth setting
A.1 statistical RTCP report for received audio
A.2 statistical SSRC report for sent audio
A.3 statistical SSRC report for received video
A.4 visualizing statistical RTCP report for received audio
A.5 visualizing statistical RTCP report for sent audio
A.6 visualizing statistical RTCP report for received video


List of Tables

2.1 Constraints specification
3.1 Parameter specifications of the RTCP video-sender report (the parameters prefixed by "goog" signify Google)
3.2 Proportion of the Variance Matrix
3.3 Rotation Matrix
3.4 Data quantity of clusters
3.5 Class division on each parameter
3.6 QoS grade assumption of each cluster
3.7 Remedies to apply in different situations


List of abbreviations

app application

CODEC Coder/Decoder

EM Expectation Maximization

FIR Full Intra Request

GMM Gaussian Mixture Model

IETF Internet Engineering Task Force

NACK Negative ACKnowledgement

P2P Peer-to-peer

PLI Picture Loss Indication

POC Proof-Of-Concept

QoE Quality of Experience

QoS Quality of Service

RTC Real-Time Communication

RTMP Real Time Messaging Protocol

SDP Session Description Protocol

SMOTE Synthetic Minority Over-sampling Technique

SSIM Structural SIMilarity Index

VoIP Voice over IP


W3C World Wide Web Consortium

WebRTC Web Real-time Communication

WWW World Wide Web


Chapter 1

Introduction

This chapter gives a general introduction to the Web Real-Time Communications (WebRTC) area. To be specific, this chapter describes the specific problems that this thesis addresses, the context of these problems, and the goals of the degree project, and outlines the structure of the thesis.

1.1 Background

WebRTC is a standardized technology that provides an easy way to access a computing device's (PCs, mobile platforms, IoT devices, etc.) media equipment, such as a web camera, computer screen, and microphone; it also provides ways to transfer the media streams acquired from these devices along with any other data (over dedicated data channels). To put this in perspective, WebRTC [1] is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. The WebRTC components have been optimized to best serve this purpose. Chapter 2 presents detailed information about WebRTC.

In times of ever-growing bandwidth needs by Internet users and applications, and increasingly tight requirements on the network resources provided by network infrastructure providers, the desire to optimize the quality of service (QoS) of applications is becoming more and more important. Similar to other multimedia applications or services, Ericsson WebRTC services focus on enhancing a user's quality of experience (QoE) for a service to the best extent possible. Additional background about contextual communication is given in Section 2.4.

1.2 Problem

During a call (or a conference) there is information available in the browser, called WebRTC stats, regarding what connections exist to other peers, what media coder/decoders (CODECs) are involved, what bandwidth is consumed, and different network characteristics such as round-trip times and packet loss (for examples of this data, see callstats.io [2], a service offering data collection of WebRTC stats). There are many interesting questions when it comes to what specific data to analyze and how to analyze this data. This degree project seeks to solve problems such as which data features have close relationships with the quality perceived by conference participants, which in turn leads to a question: what remedies can be taken to improve the current QoS?

1.3 Purpose

The purposes of this thesis project include doing research about current real-time video communication systems, deploying algorithms for detecting an ongoing RTC application, and applying automation to increase application-based communication quality. This degree project aims to develop solutions for communication targeting Ericsson's professional/business users. The solutions lie in realizing WebRTC quality control in contextual systems, with a focus on being aware of the current network environment and taking appropriate action to remedy deterioration in QoS.

1.4 Goals

The goals of this degree project have been divided into the following three sub-goals:


• Understand what WebRTC stats-parameters are important to gather and how they affect the user's perceived QoE in the conference.

• Implement a proof-of-concept (POC) that analyzes the data in question and tries to predict the perceived QoE – either as a continuous quality scale or simply as an "OK" or "not OK" output.

• Given the quality assessment, describe what remedies one would take to improve the situation for users that are currently in a call. For example, would it be wise to decrease the video frame rate, completely replace the video with still pictures, or drop the video completely if one would like to, at all costs, try to maintain acceptable audio quality?

Expected deliverables include:

• A report describing what WebRTC data is most important to gather as input to an analytic engine when it comes to understanding the perceived call/conference quality for individual users.

• A working POC of an analytic engine that processes the collected statistics and predicts what call quality the users are likely to experience during the duration of the call.

• A description of what actions one could take to improve the situation for the currently active user. These remedies would take one or more actions to improve the perceived quality and, if feasible, this "action decision" would be integrated into the POC. The data from the analytic engine could then be fed back to the clients in order for them to act.*

* Alternatively, the individual nodes could collect data and make decisions based upon their own data, as was done in [3].

1.5 Research Methodology

Data collection and analysis were the critical research methodologies in the early stages of this project. I chose Elasticsearch [4] as the basic tool. Elasticsearch is a highly scalable, open-source, full-text search and analytics engine that powers applications with complex search features and requirements. Compared with other data collection and analysis platforms, Elasticsearch allows me to store, search, and analyze large volumes of data quickly and in near real time, since the data are distributed across clusters of multiple nodes and indexed. Using a RESTful API, I can filter out the stats-parameters that are important to gather by quickly composing and performing different searches (i.e., queries).
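
For illustration only, a minimal sketch of such a filtering query against Elasticsearch's RESTful search API is shown below. The host, index name, and field names are assumptions made for this sketch, not the project's actual configuration.

# Illustration only: filter RTCP video-sender reports through Elasticsearch's
# RESTful search API. Host, index name, and field names are assumed.
import requests

query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"mediaType": "video"}},    # assumed field name
                {"exists": {"field": "googRtt"}},    # keep only records with an RTT value
            ]
        }
    },
    "sort": [{"@timestamp": {"order": "asc"}}],
    "size": 1000,
}

resp = requests.post("http://localhost:9200/webrtc-stats/_search", json=query)
resp.raise_for_status()

for hit in resp.json()["hits"]["hits"]:
    report = hit["_source"]
    print(report.get("googRtt"), report.get("packetsLost"))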

Data visualization is provided by Kibana [5], a GUI to show the data search results. Kibana enables visual exploration and real-time analysis of the data in Elasticsearch.

1.6 Delimitations

The important aspect to keep in mind is that we do not envision that we control the network; thus all actions taken to improve the situation need to be done from an "over-the-top" perspective. This means that the application does not have control of the underlying network.

After reading and studying previous related work, this degree project will not assess the perceived QoS by comparing the sent and received video/audio; instead, it will apply related algorithms to analyze and classify existing WebRTC statistical history data, in order to help invoke remedial actions so as to maintain a given QoS level.

1.7 Structure of the thesis

Chapter 2 presents basic background information regarding WebRTC. Chapter 3 describes the implementation and development done in this degree project and gives an introduction to the relevant engineering-related and scientific methods, such as analyzing, modeling, developing, and evaluating the related model. Chapter 4 introduces how an experimental evaluation was carried out by applying the classification model to a WebRTC application, as well as how the validity of the proposed remedy actions was verified. Chapter 5 states a conclusion of the entire project based upon combining the outcomes achieved through data processing, modeling, implementation, and the evaluation of the platform.


Chapter 2

Background

This chapter provides basic background information regarding WebRTC. To be more specific, this chapter describes the WebRTC architecture, protocols, APIs, statistical parameters, etc. Additionally, this chapter describes Ericsson's contextual WebRTC framework. Finally, some related work is discussed.

2.1 WebRTC architecture

There are a number of different standards underlying the WebRTC architecture, combining browser and application APIs jointly promoted by the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) working groups. Nevertheless, WebRTC's fundamental purpose is to empower real-time communication between web browsers. In order to be a strong real-time communications (RTC) architecture, it is necessary to work across multiple browsers and platforms, thus offering developers the ability to write rich multimedia applications without requiring that the user install extra plugins.

The overall architecture for the WebRTC website is shown in Figure 2.1.


Figure 2.1: WebRTC overall architecture for the website (Adapted from figure in [6])

As can be seen from Figure 2.1, there are three distinct APIs, displayed (highlighted) with different colors:

• Web API for third-party developers that provides all the APIs needed for web-based application development (further descriptions of these Web APIs can be found in Section 2.3)

• WebRTC Native C++ API for browser developers to implement their service.

• Overridable API that browser makers can hook into in their own implementations.

Additionally, three other components (Voice Engine, Video Engine, and Transport) are shown in Figure 2.1. Section 2.1.1 will discuss the first two of these, while Section 2.2 will discuss real-time transport for a session.


2.1.1 WebRTC Voice and Video Engines

The WebRTC voice and video engines enable the browser to access audio and video streams from the system's hardware, such as the microphone and camera. The fully featured WebRTC voice and video engines are in charge of all the signal processing, as described in Figure 2.2. They provide frameworks for audio and video media streams, from the system hardware to the network. Moreover, they exist directly in the web browser, so that a web application (app) can receive the final optimized media stream, which can then be transferred to its peer via the Web APIs.

Figure 2.2: Voice and Video Engine

2.1.2 Audio CODECs

An audio CODEC, also known as a sound CODEC, is used for encoding and decoding (as well as potentially compressing/decompressing) the live media stream into digital data that can be sent across the communication network. Similarly, previously encoded digital media data can also be sent via WebRTC.


2.1.2.1 iSAC

The internet Speech Audio CODEC (iSAC [7]) is one of the audio CODECs available in WebRTC. It is a wideband and super-wideband audio CODEC suitable for voice over IP (VoIP) [8] and streaming audio. Since 2011, WebRTC's code base has contained a royalty-free license implementation of iSAC. Some of the features of iSAC are:

Sampling frequency: 16 kHz (wide-band) or 32 kHz (super wide-band)

Adaptive and variable bit rate: 10 kbit/s to 32 kbit/s (wide-band) or 10 kbit/s to 52 kbit/s (super wide-band)

2.1.2.2 iLBC

The Internet Low Bit rate CODEC (iLBC) [9] is a narrow-band audio CODEC for VoIP and streaming audio. Since 2011, an implementation of iLBC has been available under a free license as a part of the open source WebRTC project. Some of the features of iLBC are [9, 10]:

Sampling frequency: 8 kHz/16 bit (160 samples for 20 ms frames, 240 samples for 30 ms frames)

Fixed bit rate: 15.2 kbit/s for 20 ms frames and 13.33 kbit/s for 30 ms frames

2.1.2.3 Opus

Opus is a highly versatile, royalty-free audio CODEC. It has the following properties [11]:

Sampling frequency: 8 kHz (narrow-band) to 48 kHz (full-band)

Constant and variable bit rate: 6 kbit/s to 510 kbit/s (frame sizes from 2.5 ms to 60 ms)


2.1.2.4 Jitter and packet loss concealment

An algorithm is used to hide network jitter and packet loss. The aim of this algorithm is to maintain voice and video quality as high as possible, while minimizing end-to-end network latency.

2.1.3 Video CODEC

VP8 [12] is a royalty-free video CODEC. VP8 is considered appropriate for interactive real-time communication as it is designed for low latency. Some of its properties are:

Required bandwidth: 100 to 2,000+ kbit/s

Variable bit rate: depending on the desired quality of the streams

2.1.4 Image enhancements

WebRTC’s image enhancements are designed to erase video noise fromthe image captured by a camera.

2.2 Real-Time Transport for a Session

The transport components of WebRTC establish connections over diverse networks. To support WebRTC applications, the browser needs dozens of protocols or services to negotiate the parameters for each media stream, realize flow and congestion control, traverse Network Address Translation (NAT) nodes and firewalls, offer user data encryption, and so on. This section introduces the WebRTC protocol stack used for the transport of both real-time data and one or more data channels.


Figure 2.3: WebRTC network protocol stack

2.2.1 User Datagram Protocol (UDP)

UDP delivers each datagram as it arrives and does not provide reliable delivery of data. Due to the time-sensitive characteristics of real-time communication, WebRTC prioritizes timeliness over reliability; hence WebRTC uses UDP as its transport protocol for real-time data.

2.2.2 Session Traversal Utilities for NAT (STUN)

To provide NAT traversal, WebRTC depends upon a STUN [13] server at a globally routable IP address. In order to discover whether a peer is behind a NAT and to obtain the IP address and port mapping, STUN packets should be sent before the peer-to-peer (P2P) WebRTC session is initiated. Then the two parties can use the discovered public IP address and port to connect with each other.

2.2.3 Traversal Using Relays around NAT (TURN)

TURN [14] is an extension of the STUN protocol and acts as a fallback when STUN fails. TURN requires a relay server that is sufficiently powerful to shuttle all flows over a bidirectional link simultaneously, as shown in the upper path in Figure 2.4.


Figure 2.4: STUN and TURN

2.2.4 Interactive Connectivity Establishment (ICE)

ICE, as specified in RFC 5245 [15], is a standard method of NAT traversal used in WebRTC. ICE deals with NATs by performing connectivity checks. ICE collects all available candidates for NAT traversal, such as the local IP address, the reflexive STUN address, and the TURN relay address, and then sends them to the remote peer via the Session Description Protocol (SDP). As soon as one client has all the collected ICE information about itself and its peer, it initiates connectivity checks, which test the ability to send media data via each candidate address. It explores alternatives until it succeeds or runs out of alternatives.

2.2.5 Session Description Protocol (SDP)

SDP [16] is a data format used for negotiating the parameters of a peer-to-peer (P2P) connection, including the network information collected by NAT traversal mechanisms, the type of data to be transferred between peers, and the CODECs to be used.

2.2.6 Datagram Transport Layer Security (DTLS)

DTLS [17] is used to secure all data, as encryption is mandatory for WebRTC.


2.2.7 Stream Control Transport Protocol (SCTP)

SCTP [18] is used in WebRTC for delivering data channels. Similar to TCP, SCTP is connection-oriented and provides a flow control mechanism to ensure the network does not become congested.

2.2.8 Secure Real-Time Transport Protocol (SRTP)

SRTP [19] and its associated control protocol (SRTCP) are two application protocols used to multiplex streams, provide congestion and flow control, and provide delivery of real-time media traffic and other additional services on top of UDP. SRTP is a profile of RTP (described in the next section).

2.2.9 Real-time Transport Protocol (RTP)

RTP [20] is implemented on top of UDP and is designed for sending and receiving media traffic. RTP packets include sequence number and timestamp fields so that the receiver can deal with out-of-order packets and jitter.
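
As a small illustration of the sequence number and timestamp fields just mentioned, the sketch below unpacks the fixed 12-byte RTP header as laid out in RFC 3550; it is an illustration only and not part of the thesis implementation.

# Illustration only: unpack the fixed 12-byte RTP header (RFC 3550 layout)
# to expose the fields used for reordering and jitter compensation.
import struct

def parse_rtp_header(packet: bytes):
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,           # should be 2 for RTP
        "payload_type": second & 0x7F,   # identifies the CODEC in use
        "sequence_number": seq,          # lets the receiver detect loss/reordering
        "timestamp": timestamp,          # lets the receiver compensate for jitter
        "ssrc": ssrc,                    # identifies the media source
    }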

The Real-time Transport Control Protocol (RTCP) provides a lightweight control mechanism for RTP. RTCP can send statistical reports and flow control messages. RTCP enables the receiver to provide feedback to the sender so that the sender can perceive the network conditions (as seen by the receiver) and potentially adapt to the current network conditions on the path from the sender to the receiver.

2.2.10 Summary

The protocol stack introduced in this section is complicated. However, understanding how each protocol works is necessary. Moreover, a deep understanding is needed to determine the actual end-to-end performance, as will be described in later chapters.


2.3 WebRTC APIs

W3C’s WebRTC APIs [21] are designed to allow media data to be sentto and received from the peer browser by deploying the correspondingset of real-time protocols. Developers can access the WebRTC layerthen build peer connections with these functions and objects. TheW3C’s WebRTC APIs specification covers:

• Connecting with remote peers using NAT traversal-related technologies, for instance STUN, TURN, and ICE, as described in the preceding section.

• Exchanging locally-produced track information with remote peers.

• Sending arbitrary data to peers.

2.3.1 RTCPeerConnection

The RTCPeerConnection [21, 22] object is used for setting up and creating a P2P connection and then exchanging media stream information. It also handles maintenance tasks for the session state. This object abstracts all the inner mechanisms of RTC data transfer. The following figures show the sequences for setting up a call (Figure 2.5) and receiving a call from a remote peer (Figure 2.6).


Figure 2.5: Calling Sequences: Set up a call


Figure 2.6: Calling Sequences: Receive a Call

As shown above, one important step should be noted: before sending media data, both clients need to exchange SDP offers and answers. Once the session is established, createOffer is called to generate an offer compatible with the current session, incorporating any changes that have been made since the last complete offer-answer exchange. This can be used to add or remove tracks. If no change has been made, the offer will cover the current local description as well as any additional capabilities that could be negotiated. As an offer, the generated SDP will contain descriptions of the local media stream tracks attached to the RTCPeerConnection, the full set of CODEC/RTP/RTCP capabilities supported by this implementation, the parameters of the ICE agent, and the DTLS connection. These offers are part of the signaling process and will be collected in my deployment.


2.3.2 getUserMedia API

The getUserMedia API, also known as the MediaStream API, performs the following key functions:

• Generates a stream object that represents a real-time video or audio media stream

• Deals with selecting input devices among multiple cameras or microphones connected to our device

• Provides secure authentication according to a user's permissions or preferences, asking the user before the browser accesses and fetches a media stream

One of the most important options that will be part of the deployment is the constraints option of the getUserMedia API. You can find the full set of constraints in the expired Internet draft "Resolution Constraints in Web Real Time Communications" [23]. These options include the minimum required resolution, frame rate, video aspect ratio, and other optional parameters that can be passed from the configuration object to the getUserMedia API. One example from W3C is [21]:

{
  mandatory: {
    width: { min: 640 },
    height: { min: 480 }
  },
  optional: [
    { width: 650 },
    { width: { min: 650 } },
    { frameRate: 60 },
    { width: { max: 800 } },
    { facingMode: "user" }
  ]
}

This specification allows mandatory and optional constraints for both minimums and maximums, as shown in Table 2.1.


Table 2.1: Constraints specification

• Height: specifies the video source height; value options are min and max as an integer.

• Width: specifies the video source width; value options are min and max as an integer.

• FrameRate: specifies how many frames to send per second (usually 60 for High Definition (HD), 30 for Standard Definition (SD)); value options are min and max as an integer.

• aspectRatio: height divided by width, usually 4/3 or 16/9; value options are min and max as a decimal.

• facingMode: selects the front/user-facing camera or the rear/environment-facing camera, if available; value options are which camera to choose, currently user, environment, left, or right.

Generally, setting mandatory constraints is suggested in order to limit the bandwidth of the network connection or to save computational power on the devices. These constraints are incredibly useful since they give us the ability to adapt to specific network situations, in order to provide the best available QoS for users.

2.3.3 WebRTC’s Statistics API

Video, audio, and data packets transmitted over an RTCPeerConnection can all be lost or experience varying amounts of delay in a real-world network. The WebRTC application expects to monitor what media data are sent via the underlying network and media pipeline. WebRTC's Statistics API defines a set of objects that give access to the statistical information for an RTCPeerConnection. One can request these values via the getStats() function [24]. As with other WebRTC-related APIs, this API works slightly differently in different browsers. For example, getStats() looks as follows:

In Firefox:

  peerConnection.getStats(null).then(function(stats) { ... });  // returns a promise

In Chrome:

  peerConnection.getStats(function(stats) { ... });  // pass a callback function

Therefore, the monitored data itself also looks slightly different in Chrome and Firefox, but this will be addressed (and the values normalized) in the implementation that is described later.

In this way, WebRTC applications can observe the media stats parameters when performing session negotiation and during data transmission. These statistics play a key role when analyzing the perceived QoS on the client's side. Among them, the RTCP report, identified by a unique synchronization source identifier (SSRC) number, is one of the most important and will be described in detail in Section 3.1.1.

2.4 Ericsson’s contextual WebRTC framework

The Ericsson WebRTC service is a WebRTC-based application (see Figure 2.7) that is used for creating a conference with at least two users. To initiate a conference, one can open a browser and visit the website of the WebRTC service. Once there are two users, peer connections (Section 2.3.1) are initiated by WebRTC, and then we can call getStats and post the result to a statistics collector.


Figure 2.7: WebRTC framework for Ericsson WebRTC services

The statistics server receives reports from clients when they do an HTTP POST to it. It is possible to listen to the stats messages posted on this server with a web socket connection. The protocol on that web socket interface is NATS - Open Source Messaging System (https://nats.io). The server has an HTTP endpoint that receives messages when clients do HTTP POSTs to the statistics server. This HTTP endpoint splits up all the messages in the POST (there are more types of messages than just those that WebRTC is posting). The server puts all of these messages on a message bus under different topics. This bus is a "NATS bus" in the current implementation. Applications can listen to messages on this bus and analyze them. There is a websocket endpoint that connects to and exposes the bus. Using this endpoint, it was possible to capture the messages posted to the statistics server in real time and import the stats from the statistics collector into the Elasticsearch database in order to carry out this thesis project.
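
As an illustration of this capture path, the sketch below subscribes to a websocket endpoint and indexes each stats message into Elasticsearch over its REST API. The endpoint URL, topic name, message layout, and index name are all assumptions made for the sketch, not the actual Ericsson deployment details.

# Illustration only: capture stats messages from the statistics server's
# websocket endpoint and index them into Elasticsearch. Endpoint URL, topic
# name, and message layout are assumed for this sketch.
import asyncio
import json

import requests
import websockets

STATS_WS_URL = "wss://stats.example.com/bus"              # assumed websocket endpoint
ES_INDEX_URL = "http://localhost:9200/webrtc-stats/_doc"  # assumed index

async def ingest():
    async with websockets.connect(STATS_WS_URL) as ws:
        async for raw in ws:
            message = json.loads(raw)
            # Only WebRTC getStats reports are of interest here; other
            # message types on the bus are ignored.
            if message.get("topic") == "webrtc.stats":    # assumed topic name
                requests.post(ES_INDEX_URL, json=message.get("payload", {}))

asyncio.run(ingest())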

2.5 Related Work

The existing methods used to assess video quality can be divided into two types: subjective quality assessment and objective quality assessment.

There are many objective methods available, but most are unidirectional and disregard the delay factor in real-time communication. For instance, ITU-T P.1201 [25] describes parametric models, including packet loss factors, but this standard is a bit outdated. ITU-T P.1203 [26] is more modern, but only covers loss-free transport (i.e. HTTP); hence it is not a good fit for WebRTC. Facebook introduced a 360°/VR video quality metric using the Structural SIMilarity Index (SSIM) [27] technique for objective quality assessment.

Estimating quality perception for the end-user (subjective quality) is a highly complex problem because different individuals can have different personal feelings about the same conversation's quality. As a result, there is no universal standard defining which factors lead to what degree of user-perceived QoS for a real-time communication application. A few assessments of end-user perceived video or audio quality have been done in the context of Internet-based applications. Most of them did not test the delay factor that we must address in a live call. Cole and Rosenbluth [8] looked at this in the context of audio. Chen and Thropp [28] conducted a survey about the effects of frame rates (FRs) on human performance and concluded that a FR of around 15 Hz seems to be a minimum boundary for human satisfaction; however, this varies according to the video content, the viewers, and the applications. Based on this, Ou et al. [29] built models to reflect the trend observed from subjective testing, specifically how the perceived quality of video changes with different FRs. Athula Balachandran et al. [30] developed a predictive model for Internet video QoE by using machine learning and metrics that capture delivery-related effects, such as bitrate delivered, bitrate switching, the rate of buffering, and join time [31, 32, 33, 34].

Some quality assessment has focused on real-time video, for example by understanding how video-based applications are influenced by network conditions. French et al. [35] proposed an architecture to estimate real-time video QoE based on analysis of Real Time Messaging Protocol (RTMP) streams. They carried out experiments to demonstrate that they can predict video QoE based on stream state measurements (frame rate, bandwidth, and bitrate) and previous users' ratings with 70-80% accuracy. Hossfeld et al. [36] presented, with a series of experiments, how network delay (initial delay and interruption) affects human-perceived video quality.

Specific to WebRTC, some related work has been done on performance assessment. In [37], Sajjad Taheri et al. promote a benchmark (publicly available under a GPL license) that can measure WebRTC peer connection performance. They evaluate WebRTC performance over a range of implementations and devices. However, they did not add WebRTC video quality measurements to this benchmark. Patrik Höglund [38] presents how Google realizes WebRTC video quality measurement by using peak signal-to-noise ratio (PSNR) and SSIM. However, this test only measures correctness by comparing input and output videos, without considering other elements that can affect video quality, such as frame rate or resolution.

This thesis will build upon the established state of the art in the field, by combining some "objective" methods to adapt algorithms to predict a subjective quality score.

2.6 Summary

This chapter presented high-level introductions:

• What is WebRTC? Together with explanations of multiple aspects, such as architecture, protocols, and APIs.

• How does the WebRTC system work, specifically when establishing a peer connection?

• How could Ericsson create a video conference service with WebRTC techniques?


Chapter 3

Implementation and development

This chapter describes the implementation and development carried out in this degree project. The chapter begins with an introduction to the engineering-related and scientific methods that were applied, specifically analyzing, modeling, developing, and evaluating a model. The purpose of this chapter is to provide an overview of the research method used in this thesis. Section 3.1 describes the data processing, including data collection, data analysis, and data visualization. Section 3.2 presents the Principal Components Analysis used to rank the collected stats-parameters. Section 3.3 focuses on selecting the model and building methods to grade and predict the perceived video quality based on the data collected in Section 3.1; it also introduces two algorithms that are used for big-data processing to enhance the model. Section 3.4 explains the remedies one could apply to improve the current video quality in a session and the corresponding development techniques that were used. Finally, Section 3.5 concludes this chapter.

3.1 Data Processing

Data collection was a primary task at the start of this degree project. Section 2.4 explained how the Ericsson WebRTC service captures statistical data from a WebRTC application. To learn which specific data should be analyzed and how to analyze this data, more than 46,327,000 RTCP reports (one set of RTCP reports per second), corresponding to statistical data for 772,120 minutes of sessions, were loaded into Elasticsearch for subsequent data processing.

3.1.1 Data Analysis

As described above, one of the key tasks of this degree project was to determine which data features are correlated with the perceived quality as seen by the conference participants. The remainder of this thesis will focus on the statistical parameters correlated with video and how they are correlated with QoS.

As sender reports indicate what was sent and what NACKs (and other values) were received at the sender, this project focuses on the RTCP sender report for the video stream. Figure 3.1 gives an example of an RTCP sender report for a video stream. This report presents the data we collected during the data collection stage of the project.


Figure 3.1: statistical RTCP sender report for a video stream from 9 March 2018

Looking at all of the features contained in this example RTCP video-sender report, we see there are 27 features in total. Nevertheless, some parameters of string type, such as "ssrc", "transportId", "mediaType", and "googContentType", appear to be useless, while other parameters that do not reflect network conditions (e.g. "timestamp", "googFrameRateInput", "googFrameRateSent", "googWidthRateSent", etc.) are not utilized in this project. Of all of these parameters, 8 particular parameters are at the center of this thesis project: "packetsLost", "googAdaptationChanges", "googAvgEncodeMs", "googEncodeUsagePercent", "googFirsReceived", "googNacksReceived", "googPlisReceived", and "googRtt". Table 3.1 [39, 40, 41] describes these 8 statistical parameters of the RTCP video-sender report in detail.


Table 3.1: Parameter specifications of the RTCP video-sender report (the parameters prefixed by "goog" signify Google)

• googRtt: Describes the round-trip time measured via RTCP (unit: ms).

• packetsLost: A cumulative number specifying the number of RTP packets lost for this SSRC. The number of lost packets per second can therefore be calculated by dividing the difference between the current value of packetsLost and its earlier value by the time between the two reports.

• googNacksReceived: The number of Negative Acknowledgements (NACKs) received. NACKs indicate that RTP packets were lost.

• googPlisReceived: The number of times the receiver of the stream sent a Picture Loss Indication (PLI) packet to the sender, indicating that it had lost some encoded video data for one or more frames.

• googFirsReceived: A count of the total number of Full Intra Request (FIR) packets received by the sender. A FIR packet is sent by the receiving end of the stream when it falls behind or has lost packets and is unable to continue decoding the stream. The higher the value of this parameter, the more often a problem of this nature arose, which can be a sign of network congestion or an overburdened receiving device.

• googAdaptationChanges: Indicates whether the resolution has been changed because of CPU issues or insufficient bandwidth; googAdaptationChanges increases whenever one of the two conditions changes.

• googAvgEncodeMs: The average encode time of a video frame at the sender (unit: ms).

• googEncodeUsagePercent: The average encode time per frame divided by the average collection time per frame. This represents the sender's encoding efficiency.
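
Since packetsLost (like the other counters above) is cumulative, a per-second rate has to be derived from consecutive reports of the same SSRC. A minimal illustration of that calculation, with made-up numbers:

# Illustration only: derive a per-second packet-loss rate from two
# consecutive cumulative packetsLost samples of the same SSRC.
def loss_rate_per_second(prev_packets_lost, curr_packets_lost,
                         prev_timestamp_s, curr_timestamp_s):
    elapsed = curr_timestamp_s - prev_timestamp_s
    if elapsed <= 0:
        return 0.0
    return (curr_packets_lost - prev_packets_lost) / elapsed

# e.g. packetsLost grows from 120 to 150 over a 10-second interval:
print(loss_rate_per_second(120, 150, 0, 10))  # -> 3.0 packets lost per second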

Data analysis in Elasticsearch includes a series of processing actions:


• Select: select the parameters by column name ("@timestamp", "ssrc", "googRtt", "packetsLost", "googNacks", "googPlis", "googEncodeUsagePercent", "googAvgEncodeMs", "googAdaptationChanges", "googFirsReceived").

• Filter: filter out invalid data records, i.e., those with a "Null" value for googRtt.

• Order: order all data records by @timestamp for the next data processing action.

• Generate new columns: the values of "packetsLost", "googNacks", "googPlis", "googAdaptationChanges", and "googFirsReceived" are cumulative over time. After ordering all data with increasing "@timestamp" within groups of records sharing the same "ssrc", one can calculate the difference in values for each subinterval.

• Drop: drop duplicated data records that have the same values for every statistical parameter except "ssrc" and "@timestamp". These steps are sketched in code below.
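
A minimal pandas sketch of the five actions above, assuming the reports have already been exported from Elasticsearch into a DataFrame with the column names listed; this is an illustration rather than the project's actual implementation.

# Hypothetical sketch of the processing steps using pandas; column names
# follow the stats-parameters above, and the input DataFrame is assumed
# to have been exported from Elasticsearch with numeric values.
import pandas as pd

CUMULATIVE = ["packetsLost", "googNacks", "googPlis",
              "googAdaptationChanges", "googFirsReceived"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Select: keep only the columns of interest.
    cols = ["@timestamp", "ssrc", "googRtt", "googEncodeUsagePercent",
            "googAvgEncodeMs"] + CUMULATIVE
    df = df[cols]

    # Filter: drop records with a null googRtt.
    df = df.dropna(subset=["googRtt"])

    # Order: sort by timestamp within each SSRC.
    df = df.sort_values(["ssrc", "@timestamp"])

    # Generate new columns: per-interval differences of the cumulative counters.
    for col in CUMULATIVE:
        df[col + "_diff"] = df.groupby("ssrc")[col].diff()

    # Drop: remove duplicates that only differ in ssrc/timestamp.
    value_cols = [c for c in df.columns if c not in ("ssrc", "@timestamp")]
    return df.drop_duplicates(subset=value_cols)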

After this series of processing actions, 422,534 RTCP records remain. In order to give a rank ordering of the parameters that most affect the video quality, this project applies an important data processing technique, specifically Principal Component Analysis (PCA), described in Section 3.2.

3.2 Principal Components Analysis

PCA [42] is a multivariate analysis approach that often starts out with a dataset in which observations are described by several inter-correlated quantitative dependent features. The goal of PCA is to extract the important information from the dataset (also known as dimension reduction) in order to represent the data using a set of new orthogonal variables called principal components, and to present the pattern of similarity of the observations. PCA is equivalent to finding a new point of view on the data in the original multidimensional space, but one whose axes better describe the data. The first principal component has the largest variance and accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.


In PCA, the eigenvalue reflects the amount of variation in the total sample accounted for by each factor, and the ratio of eigenvalues is the ratio of the explanatory importance of the factors with respect to the variables. In other words, if a factor has a low eigenvalue, then it contributes little to the explanation of variance in the variables and may be ignored as redundant in comparison with more important factors.

Specifically, in this project PCA is applied to analyze multi-dimensional data with 8 features of the RTCP reports: "googRtt", "packetsLost", "googNacks", "googPlis", "googEncodeUsagePercent", "googAvgEncodeMs", "googAdaptationChanges", and "googFirsReceived". This analysis was programmed with the high-level APIs [43] provided in Apache Spark [44]. The variance ratio for each principal component (PC) is displayed in Table 3.2.

Table 3.2: Proportion of the Variance Matrix

  PC number   Proportion   Cumulative
  PC1         22.3%        22.3%
  PC2         20.9%        43.2%
  PC3         14.3%        57.5%
  PC4         14.1%        71.6%
  PC5         13.6%        85.2%
  PC6          8.2%        93.4%
  PC7          6.4%        99.8%
  PC8          0.0%        99.8%

As one can see from the above matrix, the first two components are much stronger than the next three, and these are stronger than the remaining ones. Importantly, the first 4 explain 71.6% of the variance, which means that the dimension of the stats-parameter data could be reduced to 4 by extracting only the 4 strongest components.
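
The thesis's PCA was implemented with Apache Spark's high-level APIs [43]; the sketch below shows one way such an analysis could look in PySpark. The input path and the standardization step are assumptions of this illustration rather than details taken from the project.

# Illustration only: PCA over the eight RTCP stats-parameters using PySpark.
# The input path and the standardization step are assumptions of this sketch,
# and the stats fields are assumed to have been cast to numeric types.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, StandardScaler, PCA

spark = SparkSession.builder.appName("rtcp-pca").getOrCreate()
reports = spark.read.json("rtcp_reports.json")   # assumed export of the cleaned reports

features = ["googRtt", "googAdaptationChanges", "googAvgEncodeMs",
            "googEncodeUsagePercent", "googFirsReceived",
            "packetsLost", "googNacks", "googPlis"]

assembler = VectorAssembler(inputCols=features, outputCol="raw_features")
scaler = StandardScaler(inputCol="raw_features", outputCol="features",
                        withMean=True, withStd=True)
pca = PCA(k=4, inputCol="features", outputCol="pca_features")

assembled = assembler.transform(reports)
scaled = scaler.fit(assembled).transform(assembled)
model = pca.fit(scaled)

print(model.explainedVariance)  # proportion of variance per principal component
print(model.pc)                 # loadings of each feature (cf. the rotation matrix)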

PCA reveals a transformation from the original inputs to a new set of outputs. That is to say, the 4 main features are linear combinations of the original 8 variables, weighted by their contribution to explaining the variance in a particular orthogonal dimension. Table 3.3 displays the rotation matrix of these features, i.e., the coefficients of the linear combinations.


Table 3.3: Rotation Matrix

                          PC1      PC2      PC3      PC4
googRtt                 -0.030    0.114   -0.873    0.276
googAdaptationChanges   -0.133   -0.008    0.282    0.948
googAvgEncodeMs         -0.697    0.004    0.005   -0.122
googEncodeUsagePercent  -0.701   -0.003    0.010   -0.076
googFirsReceived         0.0      0.0      0.0      0.0
packetsLost             -0.015    0.686    0.065   -0.004
googNacks                0.042    0.635    0.250   -0.053
googPlis                -0.039    0.336   -0.303    0.036

According to this rotation matrix, the corresponding linear transformation functions are:

PC1 = -0.030 * googRtt - 0.133 * googAdaptationChanges - 0.697 * googAvgEncodeMs - 0.701 * googEncodeUsagePercent + 0.0 * googFirsReceived - 0.015 * packetsLost + 0.042 * googNacks - 0.039 * googPlis

PC2 = 0.114 * googRtt - 0.008 * googAdaptationChanges + 0.004 * googAvgEncodeMs - 0.003 * googEncodeUsagePercent + 0.0 * googFirsReceived + 0.686 * packetsLost + 0.635 * googNacks + 0.336 * googPlis

PC3 = -0.873 * googRtt + 0.282 * googAdaptationChanges + 0.005 * googAvgEncodeMs + 0.010 * googEncodeUsagePercent + 0.0 * googFirsReceived + 0.065 * packetsLost + 0.250 * googNacks - 0.303 * googPlis

PC4 = 0.276 * googRtt + 0.948 * googAdaptationChanges - 0.122 * googAvgEncodeMs - 0.076 * googEncodeUsagePercent + 0.0 * googFirsReceived - 0.004 * packetsLost - 0.053 * googNacks + 0.036 * googPlis

In summary: compared with the other 6 parameters, googAvgEncodeMs and googEncodeUsagePercent have a much stronger influence on PC1 (their weights, -0.697 and -0.701, are far larger in magnitude than the rest). For PC2, packetsLost, googNacks, and googPlis are much stronger. For PC3, googRtt's influence is greater than that of the next three: googPlis, googAdaptationChanges, and googNacks. As for the last component, PC4, googAdaptationChanges is much stronger than the others.
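To make the linear transformation concrete, the following small numpy sketch projects one stats record onto the four components using the coefficients of Table 3.3 (the example input values are hypothetical, and any scaling applied before projection is left out):

```python
import numpy as np

# Rotation matrix from Table 3.3: rows = PC1..PC4, columns = the 8 stats
# parameters (googRtt, googAdaptationChanges, googAvgEncodeMs,
# googEncodeUsagePercent, googFirsReceived, packetsLost, googNacks, googPlis).
ROTATION = np.array([
    [-0.030, -0.133, -0.697, -0.701, 0.0, -0.015,  0.042, -0.039],  # PC1
    [ 0.114, -0.008,  0.004, -0.003, 0.0,  0.686,  0.635,  0.336],  # PC2
    [-0.873,  0.282,  0.005,  0.010, 0.0,  0.065,  0.250, -0.303],  # PC3
    [ 0.276,  0.948, -0.122, -0.076, 0.0, -0.004, -0.053,  0.036],  # PC4
])

def project(record: np.ndarray) -> np.ndarray:
    """Map one 8-dimensional stats record onto PC1..PC4."""
    return ROTATION @ record

# Hypothetical, already-scaled record in the column order listed above.
example = np.array([0.2, 0.0, 1.1, 0.9, 0.0, 0.1, 0.05, 0.0])
print(project(example))  # the four PC scores of this record
```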


3.3 Modeling

Modelling [45] is a scientific activity, the aim of which is to make particular parameters or features of the world easier to understand, define, quantify, visualize, or simulate by referencing them to existing and usually commonly accepted knowledge. It requires selecting and identifying relevant aspects of a situation in the real world and then using different types of models to define a model that satisfies particular aims, such as using conceptual models to gain better understanding, mathematical models to quantify a phenomenon, and graphical models to visualize something. Modelling is an essential and inseparable part of many scientific disciplines, each of which has its own ideas about specific types of modelling.

3.3.1 Clustering

In statistics, clustering is used to group data into categories based on some measure of inherent similarity or distance. Given a set of data items described by the four principal components (the outcome of Section 3.2), clustering algorithms can be applied to group them into different classes. In the multidimensional space, points within each cluster are similar to each other, while points from different clusters are dissimilar. Usually, points are in a high-dimensional space and similarity is defined using a distance measure.

Mean Opinion Score (MOS) [46] gives a numerical indication of the perceived quality of the media after encoding and decoding with CODECs and after propagation over the path from sender to receiver. CODECs are generally assessed using a MOS on a 5-level scale, hence this project also outputs 5 different QoS grades. This in turn leads to the desire to cluster the data into 5 classes.

3.3.1.1 Introduction to the Gaussian Mixture Model (GMM)

A Gaussian Mixture Model (GMM) [47] is a probabilistic model assuming all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. GMM is one of the most statistically mature methods for clustering. The model tries to describe the underlying generative process of the data set (assuming that the generative process is a set of Gaussian distributions and that only the parameters are unknown). In GMM, each cluster can be seen as one distribution, such as a Gaussian distribution. This approach is based upon the following:

• Each data object Xi is assumed to be a sample from an independent and identically distributed mixture of k distributions Ci.

• Each cluster is a multivariate Gaussian distribution.

• GMM uses the Expectation Maximization (EM) [48] algorithm to estimate the statistics of each cluster, which includes calculation of a mean and a covariance matrix. The goal of EM is to find the maximum-likelihood estimate of a data distribution when the data is partially missing or hidden. The EM algorithm iteratively refines the GMM parameters to increase the likelihood of the estimated model.

• Specifically for this project, after multiple iterations, each data object Xi gets a set of 5 probabilities corresponding to the 5 clusters. The cluster with the maximum probability is chosen as the class that Xi belongs to.

3.3.1.2 Clustered Results

The project built a GMM model by using a clustering algorithm in Apache Spark (as Spark provides a fast and general-purpose cluster computing system).

Based on the high-level APIs in Scala [49], a GMM model was trained with the 4 main features (the four selected features PC1, PC2, PC3, and PC4) on the 422,534 RTCP-records, outputting prediction columns (each of which indicates the probability of each cluster). After training, all RTCP-records are clustered into the 5 clusters labelled "label 0" to "label 4". The number of RTCP-records in each cluster is shown in Table 3.4.


Table 3.4: Data quantity of clusters

           Label 0   Label 1   Label 2   Label 3   Label 4   Sum
Quantity   367,939   46,833    439       4,052     3,271     422,534
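A minimal PySpark sketch of this clustering step is given below (the project trained the model through Spark's Scala APIs; `projected` is assumed to be the DataFrame with the 4-dimensional "pcaFeatures" column from the PCA step, and the seed is arbitrary):

```python
from pyspark.ml.clustering import GaussianMixture

# Fit a 5-component GMM on the four principal components.
gmm = GaussianMixture(k=5, featuresCol="pcaFeatures",
                      predictionCol="label", probabilityCol="probability",
                      seed=42)
gmm_model = gmm.fit(projected)

# Each record gets a hard label (the argmax of its 5 cluster probabilities).
clustered = gmm_model.transform(projected)
clustered.groupBy("label").count().show()   # per-cluster counts, cf. Table 3.4
```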

Figures 3.2 to 3.9 display the Cumulative Distribution Function (CDF) of the 8 statistical parameters in the 5 labelled clusters (for some clusters the value of a specific parameter was a constant "0", hence those are drawn as histograms). In each figure, there are 5 distribution functions marked with different colors, representing the data with different labels.

Figure 3.2: CDF of googRtt


Figure 3.3: CDF and histogram of googAdaptationChanges

Figure 3.4: CDF of googAvgEncodeMs


Figure 3.5: CDF of googEncodeUsagePercent

Figure 3.6: Histogram of googFirsReceived


Figure 3.7: CDF of packetsLost

Figure 3.8: CDF of googNacks


Figure 3.9: CDF of googPlis

As one can see, these 5 clusters have clearly different characteristics. Table 3.5 shows the class division for each parameter and is followed by an explanation.

Table 3.5: Class division on each parameter

Parameter                Class Division
googRtt                  C3 | C2, C1 | C4 | C0
googAdaptationChanges    C0, C1, C4 | C3 | C2
googAvgEncodeMs          C0, C4 | C1, C2 | C3
googEncodeUsagePercent   C0, C4 | C1, C2 | C3
googFirsReceived         C0, C1, C2, C3, C4
packetsLost              C0, C3 | C1 | C4 | C2
googNacks                C0, C3 | C1 | C4 | C2
googPlis                 C0 | C3, C1 | C4 | C2

• The cluster number equals the value of "label", i.e., the GMM clustering outcome. This means that label 0 corresponds to C0, label 1 to C1, label 2 to C2, label 3 to C3, and label 4 to C4. In Table 3.5, these cluster numbers are written as C0, C1, C2, C3, and C4, representing the 5 different data clusters.

• Class Division: Observing the above figures of the 8 statistical parameters, one can see obvious diversity between the different clusters. For example, in Figure 3.3 around 80% of the data in C0 and C4 falls in the same value range of googAdaptationChanges (up to about 15). Similarly, around 80% of the data in C1 and C2 falls in the same value range of googAdaptationChanges (up to about 30). Considering the last data cluster, C3, whose value range of googAdaptationChanges extends to around 300, the original 5 clusters can be divided and reordered into 3 classes: C0, C4 | C1, C2 | C3 (ordering the value of googAdaptationChanges from low to high).

Assigning a QoS grade to each cluster can be done by considering a combination of Figures 3.2 to 3.9, Table 3.1, and Table 3.5. As mentioned before, this project ordered the perceived QoS into 5 levels. Table 3.6 describes how the QoS grade was assigned to every cluster and is followed by an explanation.

Table 3.6: QoS grade assumption of each cluster

QoS grade        4    3    2    1    0
Cluster number   C0   C3   C1   C4   C2

• C0 was tagged with the highest QoS grade, "4", representing the best video quality, because it has the lowest value for all parameters except googRtt. C0 has a wide distribution of googRtt, where larger values indicate a long distance or more network nodes between WebRTC peers.

• C3 was tagged with the second highest QoS grade, "3". That is because it has the lowest value of 5 parameters (googRtt, googFirsReceived, packetsLost, googNacks, and googPlis), which indicates there is no picture or packet loss and the lowest RTT in C3. However, compared with the best cluster C0, part of C3 has one or two googAdaptationChanges and a longer average encoding time per frame.

Page 54: WebRTC Quality Control in Contextual Communication Systems1236016/FULLTEXT01.pdf · WebRTC quality control in contextual systems, with a focus on being aware of the current network

CHAPTER 3. IMPLEMENTATION AND DEVELOPMENT 39

• C1 was tagged with a medium QoS grade, "2". That is because it has higher values of googRtt, packetsLost, and googNacks when compared with C3. Additionally, C1 is at the same level as C3 when considering the values of googFirsReceived and googPlis. Looking into googAvgEncodeMs and googEncodeUsagePercent, the values for C1 are lower than for C3, which means C1 has a lower average encoding time than C3. However, the amount by which the encoding time of C1 is lower than C3 is approximately equal to the difference in RTT between C3 and C1 (around a 250 ms bias for 80% of the data). In summary, C1 should be tagged with a lower QoS grade than C3.

• C4 was tagged with a bad QoS grade, "1". Compared with C1, C4 has higher values of googRtt, packetsLost, googNacks, and googPlis, as well as the same values of googFirsReceived and googAdaptationChanges. Although C4 has lower values of googAvgEncodeMs and googEncodeUsagePercent than C1, this time difference is far less than the difference in googRtt between them: the googRtt of C4 is around 1000 ms higher than C1 for 80% of the data, while the average encoding time of C4 is only around 15 ms lower than C1 for 80% of the data.

• C2 was tagged with the lowest QoS grade, "0", because it has the highest values of 4 parameters: googAdaptationChanges, packetsLost, googNacks, and googPlis. Compared with the other clusters, C2 shows a significant difference for these 4 parameters (one can see this from the CDF figures of these 4 parameters).

The clustered GMM results also reveal a data imbalance issue, as the 5 clusters are very different in terms of the amount of data. In Table 3.4, one can observe that the number of records that fall into C0 is 367,939, which greatly exceeds the number of records that fall into the other clusters. The next section describes how to address this data imbalance problem.


3.3.2 Sampling Methodologies

In order to solve the imbalanced distribution problem, one can apply sampling [50] techniques, over-sampling or under-sampling, to each cluster. These two data analysis methodologies have received significant attention as a means to counter the effect of imbalanced data sets. In recent years, they have commonly been used to adjust the class distribution of a data set (i.e., the ratio between the different classes/categories represented). Technically, over-sampling and under-sampling are opposite but approximately equivalent methods: they both introduce a bias in order to select more samples from one class than from another.

3.3.2.1 Over-sampling with SMOTETomek

There is a wealth of methods available to over-sample a data set used in a typical classification problem (using a classification algorithm to classify a set of images, given a labelled training set of images). In this project, over-sampling was applied to generate more data samples with labels "label 2", "label 3", and "label 4" based on the GMM clustering results of Section 3.3.1. The most common technique is known as SMOTE: Synthetic Minority Over-sampling Technique [51].

SMOTE is an advanced method of over-sampling developed by Chawla [52]. It supplements the minority class by creating artificial examples in this class rather than by replicating existing examples. The algorithm works as follows [53]:

• Assume A is the minority class and B is the majority class. Then, for each observation x that belongs to class A, the k-nearest neighbors of "x" are identified.

• A few neighbors are randomly selected (the number of neighbors depends on the desired rate of over-sampling).

• Artificial observations are then generated and spread near the line joining "x" to its nearest neighbors.

Several methods have been developed to improve the original SMOTE algorithm, such as SMOTE + Tomek. Although over-sampling minority class examples can balance class distributions, some other problems that exist in data sets with skewed class distributions remain unresolved. Usually, class clusters are not well defined, hence some majority class objects might invade the minority class space, which leads to interpolated "minority class" samples being introduced too deeply into the majority class space. This situation can result in over-fitting by the classifier.

In order to generate better-defined class clusters, Tomek links [54] can be applied to the over-sampled training set as a data cleaning method. Thus, instead of removing only the majority class examples that form Tomek links, examples from both classes are removed.

To implement the SMOTE + Tomek algorithm, this project implemented a program that over-samples the minority clusters C2, C3, and C4 by calling methods provided by the imbalanced-learn library, which builds on Python's scikit-learn package [55].
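A minimal sketch of this over-sampling step with imbalanced-learn is shown below; X and y are assumed to be numpy arrays holding the four PC scores and the GMM cluster labels, and the per-cluster target counts are illustrative assumptions, not the values used in the project:

```python
import numpy as np
from imblearn.combine import SMOTETomek

# X: (n_samples, 4) array of PC1..PC4 scores; y: integer cluster labels 0..4.
# Over-sample the minority clusters (labels 2, 3, and 4) with SMOTE and then
# clean the result with Tomek links; target counts below are hypothetical.
smote_tomek = SMOTETomek(sampling_strategy={2: 40000, 3: 40000, 4: 40000},
                         random_state=42)
X_resampled, y_resampled = smote_tomek.fit_resample(X, y)

print(np.bincount(y_resampled))  # per-cluster counts after over-sampling
```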

3.3.2.2 Under-sampling with Random-Sampler

Given an original data set S, under-sampling algorithms create a new set S' where |S'| < |S|. In other words, under-sampling techniques reduce the number of samples in the targeted classes. Random-Sampler [56] is a fast and easy way to balance the data by randomly selecting a subset of data for the targeted classes. With the controlled algorithm, the number of samples to be selected can be specified, which makes Random-Sampler the most naive way of performing such a selection.

To implement the Random-Sampler algorithm, this project implemented a program that under-samples the majority clusters C0 and C1 by calling methods provided by the imbalanced-learn library of Python's scikit-learn ecosystem.
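A corresponding under-sampling sketch, again with hypothetical target counts, might look like this:

```python
from imblearn.under_sampling import RandomUnderSampler

# Randomly down-sample the two largest clusters (labels 0 and 1); the target
# counts are illustrative assumptions.
rus = RandomUnderSampler(sampling_strategy={0: 50000, 1: 40000},
                         random_state=42)
X_balanced, y_balanced = rus.fit_resample(X_resampled, y_resampled)
```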

3.3.2.3 Balanced Results

Figure 3.10 presents a comparison of the clustered stats data before and after sampling.


Figure 3.10: Balanced Results

The result is that the number of points in the clusters with QoS grades "4" and "2" is reduced. Conversely, the number of points in the clusters with QoS grades "3", "1", and "0" has increased. In other words, the imbalanced distribution among the 5 clusters was resolved by using the SMOTETomek and Random-Sampler techniques together.

3.3.3 Classification using Random Forests Algorithm

In the terminology of machine learning and statistics, classification [57] consists of identifying to which of a set of categories (clusters) a new observation (a WebRTC RTCP-record) belongs, on the basis of a training set of data containing observations whose category membership is known (records already tagged with a cluster identifier). Examples of such tags are labeling a given email as belonging to the "spam" or "non-spam" class, or assigning a diagnosis to a patient based on the observed characteristics of that patient (gender, blood pressure, etc.).

This section introduces the Random Forests framework, how Random Forests work, and how they were used in this degree project.

Random forests are an ensemble method that makes predictions by averaging over the predictions of several independent base models implemented as decision trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. It uses a tree-like graph to show the possible consequences. Feeding a training data set with targets and features into a decision tree produces a set of rules; subsequently, these rules can be used to make predictions.

Since random forests were first introduced by Breiman [58], the random forest algorithm has been extremely successful as a general-purpose classification algorithm. There are two phases in the algorithm: (1) random forest creation and (2) making a prediction with the random forest classifier built in the first phase. The whole process is easy to understand using Figure 3.11.

Figure 3.11: Random Forest Simplified[59]

The steps of the tree generation phase are[60]:

1. Randomly select "k" features from the total of "m" features, where k ≪ m. (In this project, m equals 4, representing PC1 to PC4.)

2. Among the "k" features, calculate the root node using the best split point.

3. Split the node into daughter nodes.

4. Repeat the first three steps until the desired number of nodes has been reached (yielding a tree with a root node and the targets as leaf nodes).


5. Build a forest by repeating the first 4 steps "n" times to create "n" trees.

The steps of the prediction phase are[60]:

• Take the test features and use the rules of each randomly created decision tree to predict the outcome and store the predicted outcome. (In this project, the number of possible outcomes is the number of clusters.)

• Calculate the votes for each predicted outcome.

• Consider the highest voted predicted target as the final prediction from the Random Forest algorithm.

To implement the Random Forest algorithm, this degree project first split the stats data set into a training set (70%) and a testing set (30%). The training set was used to train a Random Forest model by calling the RDD-based APIs [61] in Spark from a program written in Scala. To evaluate the prediction accuracy of this Random Forest model, the testing set was fed into the model; this showed approximately 90% accuracy.
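A minimal sketch of this training and evaluation flow is shown below using Spark's DataFrame-based ML API (the project itself called the RDD-based Scala API); `balanced_df`, the number of trees, and the seed are assumptions:

```python
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# balanced_df: Spark DataFrame with the re-balanced data, where "pcaFeatures"
# holds the 4-dimensional PC vector and "label" the QoS cluster (0-4).
train, test = balanced_df.randomSplit([0.7, 0.3], seed=42)

rf = RandomForestClassifier(featuresCol="pcaFeatures", labelCol="label",
                            numTrees=100, seed=42)
rf_model = rf.fit(train)

# Accuracy on the held-out 30% testing set.
predictions = rf_model.transform(test)
evaluator = MulticlassClassificationEvaluator(labelCol="label",
                                              predictionCol="prediction",
                                              metricName="accuracy")
print("accuracy:", evaluator.evaluate(predictions))
```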

3.4 Selection of Remedial Actions

By using clustering and a classifier based on WebRTC stats data, we can make either the clients or a central controller aware of the current network environment. This in turn leads to the next question: what remedies can be applied to improve the current QoS? Answering it enables us to realize WebRTC quality control in contextual systems. Technically, remedial actions (such as reducing the frame rate, changing the resolution, etc.) are applied to improve the video quality. As already seen in Section 3.3.1.2, the data was classified into 5 different QoS levels. These QoS levels obviously have different characteristics, which in turn lead to disparate remedial actions that could be applied in each situation (as summarized in Table 3.7).


Table 3.7: Remedies to apply in different situations

QoS grade   Remedy
4           Nothing to do.
3           Recommend: Reduce the resolution to keep the video quality stable.
2           Highly Recommend: Drop the frame rate of the video or reduce the resolution to improve the clarity and correctness of the video.
1           The current network situation is horrible; strongly recommend you drop the video completely to keep audio fluency.
0           If the situation persists a while, we recommend you to terminate the conference.
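One simple way to encode this policy in the application back-end is a lookup table from QoS grade to remedy text; the sketch below only restates Table 3.7, and the function name is illustrative:

```python
REMEDIES = {
    4: "Nothing to do.",
    3: "Recommend: reduce the resolution to keep the video quality stable.",
    2: "Highly recommend: drop the frame rate or reduce the resolution "
       "to improve the clarity and correctness of the video.",
    1: "Strongly recommend: drop the video completely to keep audio fluency.",
    0: "If the situation persists, terminate the conference.",
}

def remedy_for(qos_grade: int) -> str:
    """Return the remedial recommendation for a predicted QoS grade (0-4)."""
    return REMEDIES[qos_grade]
```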

Based on the explanation of Table 3.6, which describes the reasoning behind the QoS grade assignment, the remedies to apply in each situation are specified in detail as follows:

• The characteristics of sessions with QoS grade "3" includes thelowest value of RTT, no picture or packet lost, but one or twomore googAdaptationChanges occured and a longer than averageencoding time per frame. In this situation, the video resolutionshould be changed because of CPU issues or because there isinsufficient bandwidth. Meanwhile, the higher value ofgoogAvgEncodeMs as well as a high googEncodeUsagePercent impliesthe available CPU or bandwidth are insufficient to supply theongoing resolution. Therefore, remedial actions should be takento reduce the resolution directly in order to keep the video qualitystable.

• The characteristics of session with QoS grade "2" include: a highervalue of packetsLost and googNacks; a lower value of googPlis; amoderate level of googAvgEncodeMs and googEncodeUsagePercent.Hence, we highly recommend the conference peers drop the videoframe rate or reduce resolution to improve the clarity and correctnessof the video.


• The QoS grade "1" represents bad video quality. The sessionsin this cluster have the lowest value of googAvgEncodeMs andgoogEncodeUsagePercent, which implies a strong possibility of lowresolution. Although, these session had a much higher value ofgoogRtt, packetsLost, googNacks, and googPlis, which is associatedwith a horrible perceived video quality —— choppy animation,frame skipping, etc. That is why we recommend the conferencepeers drop the video completely in order to keep audio fluency.

• Sessions with QoS grade "0" indicates extremely terrible networkconditions. However, this situation seldom or never happened inthe over all set of records. In fact, data labelled as QoS grade "0"only occurred 439 times. The recommendation of this cluster isto terminate the conference, if this situation persists.

3.5 Summary

Chapter 3 described the progress of this degree project from data processing to modeling methodologies. The PCA model and Random Forest model trained in this chapter are used in the next chapter. This chapter also proposed several remedial actions to optimize the perceived video quality for WebRTC-based applications.


Chapter 4

Evaluation of WebRTC test platform

This chapter describes an experimental evaluation of the classification model in a WebRTC application, together with verification of the validity of the remedial actions proposed in Section 3.4. Section 4.1 describes the WebRTC test platform. Section 4.2 focuses on the methods and algorithms used for interaction between the WebRTC application and the classifier model. Section 4.3 shows the web interface of the software that was used for this project. Section 4.4 explains the network simulation environment. Section 4.5 presents and analyses the outcomes of an evaluation experiment. Finally, Section 4.6 states a conclusion for this chapter.

4.1 Framework of Evaluation

The WebRTC test platform shown in Figure 4.1 is mainly composed of two parts, provided by two Docker [62] containers:

• webRTC-E2E-Test

webRTC-E2E-Test is a Docker container that runs two instances of a WebRTC application communicating via a Coturn server served from turn-netsim. The webRTC-E2E-Test container plays a crucial part in this project by sending real-time statistical parameters to the back-end classifier as well as receiving the outcome, a QoS score, from the classifier. Section 4.3 provides more information.


• turn-netsim

turn-netsim is a docker container which contains two services:

– Coturn server [63] as a TURN server (the concept of a TURN server was introduced in Section 2.2.3), providing a media relay.

– A Web application listening on port 3000 to tune a traffic control tool which simulates a slow and lossy network.

Figure 4.1: Test Platform Framework

4.2 Interaction Theory

This section introduces how the classifier interacts with the WebRTC application; more specifically, how a WebSocket [64] connection is built for forwarding stats messages to the Random Forest classifier, which is responsible for sending a QoS score back to the front-end in real time.

The WebSocket Protocol is a generally supported open standard for developing real-time applications. It enables bi-directional communication between a client running code in a controlled environment and a remote host that has opted in to communications from that code. A WebSocket connection between a client and a server enables both parties to send data at any time (hence it is full duplex).

The steps of the interaction process are:

• When creating a WebSocket connection, the first step is to establish a TCP connection via which the client and server agree on using the WebSocket Protocol.

• After a TCP connection is established, the client initiates a WebSocket connection via a process known as a WebSocket handshake. This begins when the client sends an HTTP request to the server. An Upgrade header contained in this request message notifies the server that the client is trying to establish a WebSocket connection.

• If the server supports the WebSocket protocol, it agrees to the upgrade and communicates this by sending an Upgrade header in a response message.

• Now that the handshake is complete, the initial HTTP connection is replaced by this WebSocket connection, which uses the same underlying TCP connection. At this point, both sides can start to send data.

As soon as a connection is established between the WebRTC application and the classifier, stats parameter data is collected into a stats matrix, where data is saved for a period of 10 seconds. Every 10 s, the stats matrix is transmitted over the WebSocket. After the classifier makes a prediction based on each set of stats parameters, the classifier's result is mapped into a QoS grade, as shown in Table 3.6.

After mapping, a voting algorithm is called to obtain a final QoS grade for each ten-second interval. Since the data collection rate is one RTCP-report per second, the WebRTC application collects 10 sets of statistical parameters every ten seconds. The voting algorithm in charge of finding the QoS grade selects the grade that appears most often among the outcomes computed from the set of 10 samples. Generally, it takes the QoS control system less than 1 second from sending the stats records to the classifier until an answer is returned via the WebSocket.
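The thesis does not specify the back-end WebSocket stack or message format, so the following Python sketch is only illustrative: it uses the third-party websockets package and a placeholder classifier to show the receive-classify-vote-reply loop described above.

```python
import asyncio
import json
from collections import Counter

import websockets  # third-party package assumed for the back-end endpoint


def classify_record(record: dict) -> int:
    """Placeholder for the trained Random Forest classifier.

    In the real system this call would map one stats record (projected onto
    PC1-PC4) to a QoS grade between 0 and 4."""
    return 4


def vote(grades: list) -> int:
    """Majority vote: the QoS grade that appears most often wins."""
    return Counter(grades).most_common(1)[0][0]


async def handle_connection(websocket):
    # Each incoming message is assumed to carry the stats matrix gathered
    # over 10 seconds (one RTCP report per second), serialized as JSON.
    async for message in websocket:
        stats_matrix = json.loads(message)
        grades = [classify_record(record) for record in stats_matrix]
        # Reply with the voted QoS grade so the front-end can display it.
        await websocket.send(json.dumps({"qosGrade": vote(grades)}))


async def main():
    # The handshake (HTTP Upgrade) described above is handled by the library.
    async with websockets.serve(handle_connection, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
```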

4.3 Software design

Figure 4.2 shows the web interface of the WebRTC application running on the platform.


Figure 4.2: Web interface layout

The page is divided into the following parts:


• 4 frames: from top to bottom and from left to right, the original video stream of peer A, the original video stream of peer B, the video stream from peer A after transmission to B, and the video stream from peer B after transmission to A.

• A division to display the real-time score for video quality

• A division to show the proposed remedy corresponding to the assigned QoS grade.

• A division to present the current video constraints on resolution and frame rate. This enables the users to choose a lower resolution or frame rate than the current setting when they are recommended to "reduce resolution" or "drop framerate".

• Last is a division to realize the video control options, as instructed by the "remedy description" division.

4.4 Network Simulation Environment

Setting up the environment to run a net simulator consists of:

• Build the turn-netsim container,

• Before running this container, map TCP ports 443 and 3000 to localhost,

• Access port 3001 from a browser on localhost, i.e., http://localhost:3001/, and

• Modify the network parameters for each network interface on the net simulator page, as shown in Figure 4.3.


Figure 4.3: Net Simulator Page

4.5 Results and Analysis

In each real-world test, the WebRTC application works as described in Sections 4.2 and 4.3. Ten seconds after the clients start a video conference, they receive a score for the current video quality as well as a suggestion for the corresponding remedy. An example of this is shown in Figure 4.4. This figure illustrates how the application detects the current QoS of a contextual communication session, along with adjustments that can be made to remedy poor quality.


Figure 4.4: A test showing the video quality QoS grade and recommended remedial action

If the user applies the recommended remedy, they should perceive higher quality video, and thus in the next ten seconds they should see a higher QoS grade than before. The QoS grade is refreshed every 10 s. This interface implements video quality control for ongoing WebRTC sessions.

Generally, it takes the users 1 to 2 seconds from getting a recommendation to acting on it, including reading the remedial recommendation, moving their mouse, and selecting the remedial action. To some extent, this degrades the performance that the users experience and also means that the first one or two measurements in the current 10-second interval are not representative of the QoS parameters at the end of the interval. However, if the adaptation were automatic (as could be done in the future), the user would consistently experience the best quality that they could get with a 10-second averaging time.

As an evaluation, this project carried out an experiment to show the relationship between the parameters of the network simulator (bandwidth, delay, packet loss) and the QoS score. This experiment was done by evaluating the QoS grade for a continuous WebRTC conference under different conditions (i.e., with different parameters for the network simulator), following the single-variable principle. Figures 4.5, 4.6, and 4.7 display the outcomes of this experiment.

In detail, for Figure 4.5 the bandwidth and delay settings were held constant at 4M and 60 ms, respectively. In the same way, the bandwidth and packet loss percentage were held constant at 4M and 0% for Figure 4.6, and the delay and packet loss percentage were held constant at 60 ms and 0% for Figure 4.7. For every setting, the experiment records ten scores (during 100 seconds) and reports the average as the QoS grade for that setting. These figures fit a linear relation between the network simulator settings and the QoS grade using a trendline [65] and an R² [66] value (a number from 0 to 1 that reveals how closely the estimated values of the trendline correspond to the actual data, with "1" representing a perfect fit between the data and the line drawn through them, and "0" representing no statistical correlation between the data and a line).
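The trendline and R² computation used for these figures can be reproduced with a few lines of numpy; the measurement arrays below are placeholders rather than the project's recorded values:

```python
import numpy as np

# Placeholder measurements: network-simulator setting vs. average QoS grade.
x = np.array([0.0, 1.0, 2.0, 5.0, 10.0])   # e.g., packet loss percentage
y = np.array([4.0, 3.8, 3.5, 3.1, 2.6])    # averaged QoS grade per setting

# Least-squares linear trendline: y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination R^2 for the fitted line.
residuals = y - (slope * x + intercept)
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
print(slope, intercept, r_squared)
```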

Figure 4.5: The relationship of the video quality QoS grade and the packet loss percentage setting

Figure 4.5 shows that packet loss generally has a negative effect on the QoS grade but with high variance, which means the QoS grade is not very sensitive to packet loss. The points to the right of the line represent settings with higher packet loss rates that in some cases still got very good ratings. This is because the QoS grade is not generated based on a single parameter, and packetsLost is not a strong factor according to the PCA results shown in Section 3.2.


Figure 4.6: The relationship of the video quality QoS grade and the delay setting

Figure 4.6 shows that the network delay has a negative effect on the QoS grade. By observing the slope of the trendline equation, one can see that the QoS grade is much more sensitive to delay than to packet loss. This is because googRtt is a stronger factor than packetsLost among the PCA features.


Figure 4.7: The relationship of the video quality QoS grade and the bandwidth setting

Figure 4.7 shows that the bandwidth generally has a positive effect on the QoS grade. This implies that considering bandwidth as an additional statistical parameter could be done in the future for this project.

4.6 Summary

This chapter introduced the testing process for the WebRTC application. It first showed the framework of the WebRTC test platform, followed by a specification of the WebSocket connection used for interaction between the WebRTC application and the classifier model. The web interfaces of the WebRTC application and the network simulator were shown, and the chapter also provided information about the software design of the testing. Finally, Section 4.5 presented an evaluation experiment showing the relationship between the parameters of the network simulator and the QoS grade.


Chapter 5

Conclusions and Future work

This chapter states some conclusions based on the entire project. Moreover, it discusses some of the limitations and suggests some potential future work. The chapter ends with a few relevant reflections on this work.

5.1 Conclusions

The project consisted of data processing and modeling to understand the perceived quality of WebRTC conferences. After applying data processing and principal component analysis, Section 3.2 describes which WebRTC stats-parameters are important to gather and analyze in order to predict the perceived conference quality, which is the first expected deliverable. To produce the second expected deliverable, this project fed the outcomes of the PCA into a GMM clustering model. Based on the clustered outcomes, a QoS grade was assigned to each cluster. Subsequently, a Random Forest classifier was trained to predict a QoS grade for newly collected WebRTC stats data during a WebRTC conference. Section 3.4 presents what actions should be taken to improve the perceived video quality, and Section 4.3 stepwise describes how to integrate these action decisions into a working POC, which produced the last expected deliverable. In the end, after an experiment was run on an Ericsson contextual test platform, where the recommended remedies from the back-end model were shown to the user, who could act on them to improve the quality of the current session, the project verified that two of the stats-parameters (network delay and packet loss percentage) used for assessing QoS have a negative effect on the perceived video quality, but with different degrees of influence. Additionally, the available bandwidth turned out to be an important factor. However, as WebRTC's Statistics APIs do not currently provide the available bandwidth, the project did not consider the available bandwidth as a stats-parameter to predict the video quality. Hence, it should be added as an additional stats-parameter in the future.

5.2 Limitations

This section lists some limitations of this thesis project. These limitations were:

• Because all of the software was deployed on a localhost, the performance of this computer was a limitation. During the data processing phase, it took more than 28 hours to load all of the original historical data (4 GB) into Elasticsearch. Training the model took a few hours.

• For Section 3.3, the lack of data diversity is also a limitation. There may be more than 5 clusters, but we do not have enough data to see them. Moreover, there was very little data with bad QoS grades. If there had been sufficient data in the bad-quality clusters, there would have been no need to apply an over-sampling algorithm to generate new "bad" data.

• Some elements of the software used in the thesis project are not open source, hence details of them are not presented in this thesis.

5.3 Future work

There remains a lot of future work that could be done. As Section 4.5 states, the user would consistently experience the best quality when the adaptation is applied automatically. What is more, the project trained the model with the parameters that WebRTC's Statistics APIs currently provide. Expanding to additional stats parameters, such as the available bandwidth, could be a future task. In addition, gathering additional stats data would help to explore the probability of corner cases. Last but not least, testing could be conducted on commercial WebRTC products facing the public; hence, a future evaluation could be made in the real world rather than in a simulation (emulation) environment.

5.4 Reflections

As WebRTC becomes available on an increasing number of devices (PCs, mobile platforms, IoT devices, etc.) and is used in increasingly diverse settings, a video quality control system could be a useful function for WebRTC applications. To some extent, it may provide an alternative option in the field of QoS control when developing communication solutions targeting professional and business users. Further development of a real-time video quality control system that automatically takes remedial actions could potentially bring many benefits to customers (both in terms of providing higher quality and avoiding wasting resources when it is not feasible to have a high-quality session).

Regarding the privacy aspects of WebRTC peers, this project does not involve any user information, such as name, location, IP address, or communicating device. It is important to note that the SSRC is a random 32-bit integer, as per Section 3 of RFC 3550.


References

[1] Getting Started | WebRTC. URL: https://webrtc.org/start/(visited on 04/11/2018).

[2] Varun Singh. Analytics for WebRTC — callstats.io. URL: https://www.callstats.io/ (visited on 04/11/2018).

[3] Xiaokun Yi. Adaptive Wireless Multimedia Services. Master’s thesis,KTH Royal Institute of Technology, Stockholm, Sweden, May2006. URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-92208.

[4] Elasticsearch BV. Elasticsearch. en-us. Products. URL: https://www.elastic.co/products/elasticsearch (visited on04/11/2018).

[5] Elasticsearch BV. Kibana: Explore, Visualize, Discover Data | Elastic.URL: https://www.elastic.co/products/kibana (visitedon 04/11/2018).

[6] #webrtc. WebRTC architecture. URL: http://io13webrtc.appspot.com/#31 (visited on 04/11/2018).

[7] WebRTC. Frequent Questions. Apr. 2016. URL: https://webrtc.org/faq/#what-is-the-isac-audio-code (visited on04/11/2018).

[8] R. G. Cole and J. H. Rosenbluth. “Voice over IP performancemonitoring”. en. In: ACM SIGCOMM Computer CommunicationReview 31.2 (Apr. 2001), p. 9. ISSN: 01464833. DOI: 10.1145/505666.505669. URL: http://portal.acm.org/citation.cfm?doid=505666.505669.

[9] S. Andersen et al. “Internet Low Bit Rate Codec (iLBC)”. In: InternetRequest for Comments RFC 3951 (Dec. 2004). ISSN: 2070-1721. URL:http://www.rfc-editor.org/rfc/rfc3951.txt.


[10] A. Duric and S. Andersen. “Real-time Transport Protocol (RTP)Payload Format for internet Low Bit Rate Codec (iLBC) Speech”.In: Internet Request for Comments RFC 3952 (Dec. 2004). ISSN: 2070-1721.URL: http://www.rfc-editor.org/rfc/rfc3952.txt.

[11] JM Valin, K. Vos, and T. Terriberry. “Definition of the Opus AudioCodec”. In: Internet Request for Comments RFC 6716 (ProposedStandard) (Sept. 2012). ISSN: 2070-1721. URL: http://www.rfc-editor.org/rfc/rfc6716.txt.

[12] J. Bankoski et al. “VP8 Data Format and Decoding Guide”. In:Internet Request for Comments RFC 6386 (Nov. 2011). ISSN: 2070-1721.URL: http://www.rfc-editor.org/rfc/rfc6386.txt.

[13] J. Rosenberg et al. “Session Traversal Utilities for NAT (STUN)”.In: Internet Request for Comments RFC 5389 (Oct. 2008). ISSN: 2070-1721.URL: http://www.rfc-editor.org/rfc/rfc5389.txt.

[14] R. Mahy, P. Matthews, and J. Rosenberg. “Traversal Using Relaysaround NAT (TURN): Relay Extensions to Session Traversal Utilitiesfor NAT (STUN)”. In: Internet Request for Comments RFC 5766(Proposed Standard) (Apr. 2010). ISSN: 2070-1721. URL: http://www.rfc-editor.org/rfc/rfc5766.txt.

[15] J. Rosenberg. “Interactive Connectivity Establishment (ICE): AProtocol for Network Address Translator (NAT) Traversal forOffer/Answer Protocols”. In: Internet Request for Comments RFC5245 (Apr. 2010). ISSN: 2070-1721. URL: http://www.rfc-editor.org/rfc/rfc5245.txt.

[16] M. Handley, V. Jacobson, and C. Perkins. “SDP: Session DescriptionProtocol”. In: Internet Request for Comments RFC 4566 (July 2006).ISSN: 2070-1721. URL: http://www.rfc-editor.org/rfc/rfc4566.txt.

[17] E. Rescorla and N. Modadugu. “Datagram Transport Layer SecurityVersion 1.2”. In: Internet Request for Comments RFC 6347 (Jan.2012). ISSN: 2070-1721. URL: http://www.rfc-editor.org/rfc/rfc6347.txt.

[18] R. Stewart. “Stream Control Transmission Protocol”. In: InternetRequest for Comments RFC 4960 (Sept. 2007). ISSN: 2070-1721. URL:http://www.rfc-editor.org/rfc/rfc4960.txt.


[19] M. Baugher et al. “The Secure Real-time Transport Protocol (SRTP)”.In: Internet Request for Comments RFC 3711 (Mar. 2004). ISSN: 2070-1721.URL: http://www.rfc-editor.org/rfc/rfc3711.txt.

[20] H. Schulzrinne et al. “RTP: A Transport Protocol for Real-TimeApplications”. In: Internet Request for Comments RFC 3550 (July2003). ISSN: 2070-1721. URL: http://www.rfc-editor.org/rfc/rfc3550.txt.

[21] Adam Bergkvist et al. WebRTC 1.0: Real-time Communication BetweenBrowsers. Tech. rep. Word Wide Web Consortium, Apr. 2018. URL:https://w3c.github.io/webrtc-pc/ (visited on 04/11/2018).

[22] Real time communication with WebRTC: 5. Stream video with RTCPeerConnection.URL: https : / / codelabs . developers . google . com /codelabs/webrtc-web/#4 (visited on 04/11/2018).

[23] H. Alvestrand. Resolution Constraints in Web Real Time Communications. Internet Draft. Expired: February 27, 2014. IETF Network Working Group, Aug. 2013, p. 8. URL: https://tools.ietf.org/html/draft-alvestrand-constraints-resolution-00.

[24] Guidelines for getStats() results caching/throttling. URL: https://www.w3.org/TR/webrtc-stats/#guidelines-for-getstats-results-caching-throttling.

[25] ITU-T. Parametric non-intrusive assessment of audiovisual media streamingquality. ITU-T Recommendations ITU-T P.1201. ITU-R, Oct. 2012.URL: http://handle.itu.int/11.1002/1000/11727.

[26] ITU-T. Parametric bitstream-based quality assessment of progressivedownload and adaptive audiovisual streaming services over reliabletransport. Recommendation. ITU-T Recommendations ITU-T P.1203.ITU-R. URL: http://handle.itu.int/11.1002/1000/13399.

[27] Richard Dosselmann and Xue Dong Yang. “A comprehensiveassessment of the structural similarity index”. en. In: Signal, Imageand Video Processing 5.1 (Mar. 2011), pp. 81–91. ISSN: 1863-1703,1863-1711. DOI: 10.1007/s11760-009-0144-1. URL: http://link.springer.com/10.1007/s11760-009-0144-1

(visited on 04/11/2018).


[28] J. Y. C. Chen and J. E. Thropp. “Review of Low Frame Rate Effectson Human Performance”. In: IEEE Transactions on Systems, Man,and Cybernetics - Part A: Systems and Humans 37.6 (Nov. 2007),pp. 1063–1076. ISSN: 1083-4427. DOI: 10.1109/TSMCA.2007.904779.

[29] Yen-Fu Ou et al. “Modeling the impact of frame rate on perceptualquality of video”. In: 2008 15th IEEE International Conference onImage Processing. Oct. 2008, pp. 689–692. DOI: 10.1109/ICIP.2008.4711848.

[30] Athula Balachandran et al. “Developing a Predictive Model ofQuality of Experience for Internet Video”. In: Proceedings of theACM SIGCOMM 2013 Conference on SIGCOMM. SIGCOMM ’13.New York, NY, USA: ACM, 2013, pp. 339–350. ISBN: 978-1-4503-2056-6.DOI: 10.1145/2486001.2486025. URL: http://doi.acm.org.focus.lib.kth.se/10.1145/2486001.2486025.

[31] Florin Dobrian et al. “Understanding the Impact of Video Qualityon User Engagement”. In: Commun. ACM 56.3 (Mar. 2013), pp. 91–99.ISSN: 0001-0782. DOI: 10.1145/2428556.2428577. URL: http://doi.acm.org.focus.lib.kth.se/10.1145/2428556.

2428577.

[32] Xi Liu et al. “A Case for a Coordinated Internet Video ControlPlane”. In: Proceedings of the ACM SIGCOMM 2012 Conference onApplications, Technologies, Architectures, and Protocols for ComputerCommunication. SIGCOMM ’12. New York, NY, USA: ACM, 2012,pp. 359–370. ISBN: 978-1-4503-1419-0. DOI: 10.1145/2342356.2342431. URL: http://doi.acm.org.focus.lib.kth.se/10.1145/2342356.2342431.

[33] Athula Balachandran et al. “A Quest for an Internet Video Qualityof experience Metric”. In: Proceedings of the 11th ACM Workshopon Hot Topics in Networks. HotNets-XI. New York, NY, USA: ACM,2012, pp. 97–102. ISBN: 978-1-4503-1776-4. DOI: 10.1145/2390231.2390248. URL: http://doi.acm.org.focus.lib.kth.se/10.1145/2390231.2390248.

[34] Mark Watson. HTTP Adaptive Streaming in Practice. Keynote. SanJose, California, USA, Feb. 2011. URL: http://web.cs.wpi.edu/~claypool/mmsys-2011/Keynote02.pdf.


[35] H. French et al. “Real time video QoE analysis of RTMP streams”.In: 30th IEEE International Performance Computing and CommunicationsConference. Nov. 2011, pp. 1–2. DOI: 10.1109/PCCC.2011.6108105.

[36] T. Hossfeld et al. “Initial delay vs. interruptions: Between thedevil and the deep blue sea”. In: 2012 Fourth International Workshopon Quality of Multimedia Experience. July 2012, pp. 1–6. DOI: 10.1109/QoMEX.2012.6263849.

[37] S. Taheri et al. “WebRTCbench: a benchmark for performanceassessment of webRTC implementations”. In: 2015 13th IEEE Symposiumon Embedded Systems For Real-time Multimedia (ESTIMedia). Oct.2015, pp. 1–7. DOI: 10.1109/ESTIMedia.2015.7351769.

[38] Patrik Höglund. Automated Video Quality Measurements. Apr. 2013. URL: https://developers.google.com/google-test-automation-conference/2013/presentations#Day1LightningTalk4.

[39] webrtc-stats: Add RTCOutboundRTPStreamStats.totalEncodeTime. URL: https://github.com/w3c/webrtc-stats/pull/184/files/c4b67f35ffb87ec43c4a02ae9256e63d4529550a#diff-98a5b017f8b861a70f52aee2a9a2bb55.

[40] RTCStatsReport. URL: https://developer.mozilla.org/en-US/docs/Web/API/RTCStatsReport.

[41] webrtc-stats for how much time it takes to encode video. URL: https://lists.w3.org/Archives/Public/public-webrtc-logs/2017Feb/0001.html.

[42] Xiang Zhou, Dan Xu, and Ting-Ting Jiang. “Simplifying multidimensionalfermentation dataset analysis and visualization: One step closerto capturing high-quality mutant strains”. In: Scientific Reports.39875 (2017) (Jan. 2017). DOI: https://doi.org/10.1038/srep39875.

[43] Dimensionality Reduction - RDD-based API. APIs Doc. URL: https://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html#principal-component-analysis-pca.

[44] Apache Spark. APIs Doc. URL: https://spark.apache.org/docs/2.2.0/index.html.


[45] Kara Rogers. Scientific modeling. URL: https://www.britannica.com/science/scientific-modeling (visited on 06/18/2018).

[46] Nadeem Unuth. Mean Opinion Score (MOS): a Measure of VoiceQuality. Apr. 2018. URL: https : / / www . lifewire . com /measure-voice-quality-3426718.

[47] Douglas Reynolds. “Gaussian Mixture Models”. URL: https://www.researchgate.net/publication/228408146_Gaussian_Mixture_Models.

[48] Rob Tibshirani and Trevor Hastie. Gaussian mixture models. Lecturenotes for the course Statistics 315a: Modern Applied Statistics: Elementsof Statistical Learning, Stanford University, Department of BiomedicalData Science, Winter 2018 , 12. Nov. 2008. URL: http://statweb.stanford.edu/~tibs/stat315a/LECTURES/em.pdf.

[49] Scala. URL: https://www.scala-lang.org/.

[50] Nitesh V. Chawla. “Data mining for imbalanced data sets”. In:Chapter 40. URL: https://www3.nd.edu/~dial/publications/chawla2005data.pdf.

[51] Nitesh V. Chawla et al. “SMOTE: Synthetic Minority Over-sampling Technique”. In: Journal of Artificial Intelligence Research, Volume 16, pp. 321-357 (June 2002). URL: https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/chawla2002.html.

[52] Ivan Tomek. “An Experiment with the Edited Nearest-Neighbor Rule”. In: IEEE Transactions on Systems, Man, and Cybernetics SMC-6.6 (June 1976), pp. 448–452. ISSN: 0018-9472. DOI: 10.1109/TSMC.1976.4309523.

[53] T. Elhassan et al. “Classification of Imbalance Data using Tomek Link (T-Link) Combined with Random Under-sampling (RUS) as a Data Reduction Method”. In: iMedPub Journals, Volume 1, Number 2:11. ISSN 2472-1956 (June 2016, pp. 1-12). URL: http://datamining.imedpub.com/classification-of-imbalance-data-using-tomek-linktlink-combined-with-random-undersampling-rus-as-a-data-reduction-method.pdf.


[54] Ronaldo C. Prati, Gustavo E. A. P. A. Batista, and Maria Carolina Monard. “Data mining with imbalanced class distributions: concepts and methods”. In: Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI), 2009, pp. 359-376. URL: http://conteudo.icmc.usp.br/pessoas/gbatista/files/iicai2009.pdf.

[55] imbalanced-learn package overview. APIs Doc. URL: http://contrib.scikit-learn.org/imbalanced-learn/stable/install.html.

[56] under sampling. APIs Doc. URL: http://contrib.scikit-learn.org/imbalanced-learn/stable/under_sampling.html#prototype-selection.

[57] Minsoo Kim and Pomona College. Statistical Classification. Apr.2010. URL: http://pages.pomona.edu/~jsh04747/Student%20Theses/MinsooKim10.pdf.

[58] L. Breiman. Machine Learning. 2001.

[59] William Koehrsen. Random Forest Simple Explanation. Medium.Dec. 2017. URL: https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d.

[60] Leo Breiman. “Random Forests”. In: Mach. Learn. 45.1 (Oct. 2001),pp. 5–32. ISSN: 0885-6125. DOI: 10.1023/A:1010933404324.URL: https://doi.org/10.1023/A:1010933404324.

[61] APIs Doc. URL: https://spark.apache.org/docs/2.2.0/mllib-guide.html.

[62] What is Docker. Docker Overview. URL: https://www.docker.com/what-docker.

[63] Coturn server. URL: https://github.com/coturn/coturn.

[64] Alexey Melnikov and Ian Fette. The WebSocket Protocol. RFC 6455.Dec. 2011. DOI: 10.17487/RFC6455. URL: https://rfc-editor.org/rfc/rfc6455.txt.

[65] Add a trend or moving average line to a chart. URL: https://support.office.com/en-us/article/add-a-trend-or-moving-average-line-to-a-chart-fa59f86c-5852-4b68-a6d4-901a745842ad.

[66] significance of the R2 value. URL: http://www.phaser.com/modules/students/salmon/R2.pdf.


Appendix A

RTCP report figures

Figure A.1: statistical RTCP report for received audio


Figure A.2: statistical SSRC report for sent audio


Figure A.3: statistical SSRC report for received video


Figure A.4: visualizing statistical RTCP report for received audio


Figure A.5: visualizing statistical RTCP report for sent audio


Figure A.6: visualizing statistical RTCP report for received video


TRITA-EECS-EX-2018:465

www.kth.se