Field test of SkyMedia HD/3D content augmentation system for immersive media experiences

Aldo Campi 1, Julien Maillard 2, Marc Leny 3, Rosalba Suffritti 1, Massimo Neri 1

1 (aldo.campi, rosalba.suffritti, massimo.neri)@mavigex.com, Mavigex s.r.l., Italy
2 [email protected], Vitec Multimedia, France
3 [email protected], Thales Communications & Security, MMP-Lab, France

This work was supported in part by the European FP7 Project SKYMEDIA "UAV-based capturing of HD/3D content with WSN augmentation, real-time processing and immaterial rendering for immersive media experiences" (FP7-ICT-2009-4 248405).


Abstract— This paper presents the field test results of the novel SkyMedia 3D/HD system in a marathon race setup. The augmentation system is tailored for immersive media experiences such as public live events in which people can interact together in order to improve the user experience. The paper reports the first public demonstration of the HD/3D content augmentation system, conducted in the Turin Marathon race setup. A large set of SkyMedia Multimedia Service Platform (MSP) building blocks has been tested and validated in a real environment.

Index Terms— Stereoscopic, 3D, Multimedia Service Platform, Augmentation System

I. INTRODUCTION

SkyMedia is an EU-funded research project aiming at demonstrating a novel end-to-end multimedia architecture that can provide unique immersive media experiences to audiences during live events. The targeted scenario is a live event (e.g. a sport match, a musical concert, a marathon race, etc.) or any occasion where there is a large audience whose attention is captured by an event so complex that it is almost impossible to appreciate everything that is happening at the same time. The goal of SkyMedia is to provide an enhanced experience to people involved in live events, be they spectators, organising staff, etc. The overall architecture has already been published in [1]. During the project, a marathon race organizer joined the consortium after the first experimentation phase, convinced by the high quality of the results. As a consequence, since then the whole project has been focused on the marathon events organized by Turin Marathon [2]. The project will therefore bring to marathon users as much normally unreachable media as possible, for instance: videos from the event itself, in both 2D and 3D, and from a UAV (Unmanned Aerial Vehicle) flying over the marathon path; videos from previous events (e.g. the summary of the previous year's edition); metadata including the live performance of the runners or their biographies; data linked to the event surroundings (places to visit, historic details …); user-generated content through social websites (comments on the event, photos …); etc.

To this purpose, a large variety of building blocks is needed in order to fulfil all the requirements of such complex live event scenarios. A high-performance Content Delivery Multimedia Architecture is required, able to transparently transport HD/3D video at the location of the event as well as to provide Internet access to the targeted audience by means of different wired and wireless technologies such as clustered wireless hot-spots, gigabit Ethernet cable, etc. The constitutive blocks of the SkyMedia system include:
- a UAV filming the event from the sky,
- several stereoscopic cameras capturing 3D feeds from the ground,
- sensors on the runners to get their position, speed, heart rate, number of steps, etc.,
- video processing to perform colour and geometrical calibration as well as depth map evaluation for the stereo cameras,
- video compression and streaming targeting various platforms (computer screens, HDTV, mobile phones, …) for both live feeds and video on demand (VoD),
- metadata aggregation over the videos,
- dedicated interfaces for mobile phones, touchscreen TVs or immaterial screens to enhance the user experience and interaction.

The first public experiments for SkyMedia were conducted during the latest Turin Marathon (November 13th, 2011). This paper describes the part of the SkyMedia Multimedia Service Platform (MSP) that was designed and used for the tests, as well as the 3D processing that was performed. The testbed and results are then detailed to give an overall overview of the current state of the project regarding the HD/3D subsystem.

II. MULTIMEDIA SERVICE PLATFORM

The Multimedia Service Platform (MSP) is responsible for providing real-time immersive services to various end-user platforms having different constraints such as computing power, storage, screen resolution, user interface, sensors and actuators. These services are transmitted through heterogeneous transmission channels (3G+, Wi-Fi, Ethernet) with an appropriate Quality of Service and a satisfactory Quality of Experience at the user side. The MSP has been specifically designed to satisfy the service requirements of live event scenarios, e.g. a marathon race. In particular, it collects pre-formatted multimedia data (images, video, etc.), performs HD and 3D video processing and adaptation according to the end-user platform (i.e. HD/S3D TV screens, phones, etc.), and delivers the resulting video streams in an interactive way.

Figure 1 Multimedia Service Platform and data path for SkyMedia services

Figure 2 Example of a 3D aggregation with relevant image (left and right views)

A. Multimedia Service Platform architecture

The MSP along with the overall data path is shown in Figure 1. In the first part of the pipeline, the video is pre-processed (calibration) and depth maps are extracted and computed. These maps are mandatory on the client side to enable relevant and efficient metadata aggregation in 3D space. To this purpose, the stereoscopic cameras used are four pairs of GigE Vision Ethernet cameras. Besides costing one tenth the price of a broadcast stereo camera, they make it possible to retrieve uncompressed video over roughly 100 m of cable and can be synchronised with an external trigger unit. The MSP is then in charge of the HD/3D processing described below, and finally of the video compression before sending the streams over the network.
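As a purely illustrative skeleton of this data path (the function names and stage granularity are assumptions, not the project's actual software design), the processing order of Figure 1 can be summarised as follows:

```python
import numpy as np

# Illustrative skeleton of the MSP data path of Figure 1; function
# names and granularity are assumptions, not the SkyMedia code.

def rectify_and_colour_match(left, right):
    # Stage 1: geometrical/colour calibration (Sec. II.B.1).
    return left, right  # placeholder

def estimate_disparity(left, right):
    # Stage 2: semiglobal-matching disparity estimation (Sec. II.B.2).
    return np.zeros(left.shape[:2], dtype=np.float32)  # placeholder

def aggregate_metadata(left, right, disparity, metadata):
    # Stage 3: depth-aware insertion of synthetic content (Sec. II.B.3).
    return left, right  # placeholder

def process_frame(left, right, metadata):
    """One stereo frame through the pipeline, in the order Figure 1 implies."""
    left, right = rectify_and_colour_match(left, right)
    disparity = estimate_disparity(left, right)
    left, right = aggregate_metadata(left, right, disparity, metadata)
    return left, right  # handed to the encoder (Sec. II.B.4) for delivery
```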

B. HD/3D core processing

Several ordered steps, such as stereo/multi-view pre-processing and disparity map computation, have to be performed to produce high quality HD/3D raw streams ready to be encoded. HD and 3D processing is one of the most important and innovative parts of the SkyMedia project. In the following, the set of functionalities embedded in the HD/3D part of the MSP is reported.

1) Stereo/multi-view pre-processing

Pre-processing steps have to be performed in order to prepare a stereo pair of images to be efficiently processed by the 3D algorithms. They consist of geometrical calibration (distortion, alignment) and correction of colour differences between the two images.
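A minimal sketch of these two steps, assuming the rig's intrinsic and extrinsic parameters have already been obtained offline (e.g. with a chessboard calibration); the variable names and the crude mean/std colour transfer are illustrative assumptions, not the project's implementation:

```python
import cv2
import numpy as np

def rectify_pair(left, right, calib):
    """Undistort and row-align a stereo pair using precomputed calibration.

    `calib` holds K1, D1, K2, D2, R, T from an offline chessboard calibration
    (assumed available; not part of the paper).
    """
    h, w = left.shape[:2]
    R1, R2, P1, P2, _, _, _ = cv2.stereoRectify(
        calib["K1"], calib["D1"], calib["K2"], calib["D2"], (w, h),
        calib["R"], calib["T"])
    m1 = cv2.initUndistortRectifyMap(calib["K1"], calib["D1"], R1, P1, (w, h), cv2.CV_32FC1)
    m2 = cv2.initUndistortRectifyMap(calib["K2"], calib["D2"], R2, P2, (w, h), cv2.CV_32FC1)
    return cv2.remap(left, *m1, cv2.INTER_LINEAR), cv2.remap(right, *m2, cv2.INTER_LINEAR)

def match_colour(left, right):
    """Crude colour correction: shift the right view's per-channel
    mean/std towards the left view's statistics."""
    l, r = left.astype(np.float32), right.astype(np.float32)
    corrected = (r - r.mean(axis=(0, 1))) / (r.std(axis=(0, 1)) + 1e-6)
    corrected = corrected * l.std(axis=(0, 1)) + l.mean(axis=(0, 1))
    return left, np.clip(corrected, 0, 255).astype(np.uint8)
```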
2) Disparity map computation

Once the captured images are correctly rectified and colour-calibrated, the disparity map estimation, which is the most intensive part of the 3D/HD processing, is performed by the MSP. The depth map estimation makes it possible to know the relative distance of the objects inside the field of view. Thanks to this information, the depth map allows synthetic or virtual content to be properly inserted at the right depth in a real view. Various algorithms exist to evaluate depth maps from a stereoscopic video; for the SkyMedia implementation, an algorithm based on stereo processing by semiglobal matching [3][4] has been chosen because of its good results and computation speed. This algorithm is fast enough to give results exploitable for real-time rendering of several streams at the same time.
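The paper does not publish its implementation; as an approximation, OpenCV's StereoSGBM (a semi-global block-matching variant of [4]) gives an idea of the computation. The parameter values below are illustrative starting points, not SkyMedia's settings:

```python
import cv2

# Semiglobal (block) matching in the spirit of Hirschmueller [4].
block = 5
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,          # must be a multiple of 16
    blockSize=block,
    P1=8 * 3 * block * block,    # penalty for small disparity changes
    P2=32 * 3 * block * block,   # penalty for larger disparity jumps
    uniquenessRatio=10,
)

left = cv2.imread("left_rectified.png")    # placeholder file names
right = cv2.imread("right_rectified.png")
# compute() returns fixed-point disparities scaled by 16.
disparity = sgbm.compute(left, right).astype("float32") / 16.0
```

Real-time operation on several streams, as the paper targets, would additionally require the multithreading and subsampling optimizations mentioned in Section III.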

3) HD/3D synthetic content aggregation and 3D scene enrichment

During an event, one of the functionalities of the MSP is to aggregate into the video heterogeneous metadata such as text, images, etc. provided by different sources (for instance WSAN sensors, other cameras, etc.) at different positions inside the video stream, considering the three dimensions of space thanks to the disparity map information. It is possible to determine the disparity that an aggregated object must have in order to be seen at a specific depth in the 3D image, and thus to correctly insert it in the left and right images. In Figure 2, an example of such an aggregation is reported, where the runner's data have been added to the scene behind the hand in the close-up.
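For a rectified parallel rig, the disparity (in pixels) needed to place an object at depth Z is d = f · B / Z, with f the focal length in pixels and B the baseline. A minimal sketch of such an insertion, where the focal length, baseline and label placement are assumed values for illustration:

```python
import cv2

def insert_label_at_depth(left, right, text, x, y, depth_m,
                          focal_px, baseline_m):
    """Draw `text` into both views with the horizontal shift (disparity)
    that makes it appear at `depth_m` metres in the fused 3D image.

    focal_px and baseline_m are assumptions, not the SkyMedia rig's values.
    """
    d = focal_px * baseline_m / depth_m  # d = f * B / Z
    font = cv2.FONT_HERSHEY_SIMPLEX
    # Shifting the label by +/- d/2 between the views yields disparity d;
    # a larger disparity brings the label closer to the viewer.
    cv2.putText(left,  text, (int(x + d / 2), y), font, 1.0, (255, 255, 255), 2)
    cv2.putText(right, text, (int(x - d / 2), y), font, 1.0, (255, 255, 255), 2)
    return left, right
```

Checking the computed disparity map before drawing (so that no real object closer than depth_m occludes the label) is what allows the runner's data in Figure 2 to sit behind the hand in the close-up.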


4) Video compression for HD/3D

Due to the high bandwidth requirements of the HD/3D streams, the delivery system is very challenging. Furthermore, stringent time constraints, such as real-time operation, impose a very careful selection of compression algorithms. The H.264/MVC codec, released in early 2010, was chosen as the standard codec for video encoding [5]; it is able to efficiently encode both HD/S3D (Stereoscopic 3D) and multiview streams by means of dedicated, high-performance hardware.

III. TRIAL SCENARIO DESCRIPTION

The first experiments, conducted at the Turin Marathon 2011, were planned to perform several independent tests of the subsystems that will have to be fully integrated for the final demonstration at the 2012 Marathon. Therefore, fewer cameras were used, the interactivity was limited and the VoD service was not fully available. This section describes the real testbed used.

A. Testbed description

The planned tests for the Turin Marathon 2011 (see Figure 3) can be divided into four main subtests: video and metadata collection, metadata processing and fusion, content delivery, and enriched interactive S3D rendering.

Figure 3 Planned tests description

1) Data collection

Two different types of data are collected in this field test: HD/3D video streams from stereoscopic cameras along the marathon's stage and path, and runners' metadata such as heart rate, GPS position, speed and number of steps during the race.

Two rigs of stereoscopic cameras are in charge of the stereo acquisition. They are deployed on the starting line and the finish line of the Turin Marathon, both located on Piazza Castello in Turin (see Figure 4); the cameras are synchronized thanks to a dedicated external trigger. Two position sensors are attached on top of the rigs in order to retrieve their respective positions and orientations.

Figure 4 Rig deployment on Piazza Castello, Turin

Some offline S3D videos are captured from the car that opens the race. This content will be stored as metadata for the next Turin Marathon; this task is fulfilled by two stereo camera systems (i.e. GoPro [6]) attached to the front and the rear of the assistance vehicle. Offline video streams were also planned to be shot from the UAV and retrieved after landing. All the live and offline videos are stored on a server that plays the role of a multimedia database. Thus, offline video can be used as metadata to enhance live video, and live video can be stored to be reused in the same way at the next Marathon. The runner's sensing equipment consists of a smartphone with biometric sensors attached. During the race, all the information recorded by the sensors is transmitted in real time to the MSP to be processed and aggregated into the video stream of the HD/3D cameras (a sketch of such a telemetry message is given at the end of this subsection).

2) Data processing

Data processing during the tests can be divided into three main phases. The first one concerns the S3D live acquisition: the video streams must be processed in a specific way in order to be efficiently augmented and enhanced. Thus, geometrical and colour correction is performed and a disparity map is estimated from each stereo pair. In fact, for the end-user's comfort, the 3D rendering quality has to be optimal: this means that the depth maps used for the aggregation have to be precisely estimated. This is very CPU-time-consuming, and the algorithms have been highly optimized (multithreading, subsampling) to find the best compromise between real-time processing and quality. The second phase deals with metadata retrieval and fusion. The runners' sensor data are collected into the sensor database exploiting 3G connectivity. Some runners agreed to facilitate these tests during the Turin Marathon 2011. The last step concerns the metadata aggregation inside the S3D videos in order to create augmented reality. This live augmentation is possible thanks to the fusion of stereo images with a real-time, efficient aggregation of synthetic content. This process is based on the interpretation of the disparity map extracted in the first phase.
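Returning to the runner telemetry described under Data collection, a minimal sketch of what one such message could look like; the field names and the MSP ingest endpoint are hypothetical, not the project's actual protocol:

```python
import json
import time
import urllib.request

# Hypothetical telemetry payload; field names and the endpoint URL are
# illustrative assumptions, not the SkyMedia protocol.
sample = {
    "runner_id": 42,
    "timestamp": time.time(),
    "gps": {"lat": 45.0703, "lon": 7.6869},  # Piazza Castello, Turin
    "heart_rate_bpm": 158,
    "speed_mps": 4.2,
    "step_count": 21500,
}

req = urllib.request.Request(
    "http://msp.example.org/sensors",        # placeholder MSP ingest URL
    data=json.dumps(sample).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)  # sent over the smartphone's 3G link
```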

3) Delivery

By the end of the project, a full Multiview Video Coding (MVC) implementation with inter-view prediction is targeted. However, the testbed for these first experiments was not as ambitious, and the stereoscopic streams were encoded independently. An 8 Mbit/s bitstream was used for each stereoscopic camera (two synchronised 4 Mbit/s SD streams). The current multimedia service platform encodes in real time two stereoscopic streams (i.e. here four SD streams) using H.264 AVC. Network delivery is achieved using RTP streams; later, an RTSP downlink will make it possible to request specific metadata (runner's bio, performance, etc.).

4) Rendering

Rendering is undertaken by a dedicated client that manages the multi-view decoding and the correct 3D display. Specific screens and associated shutter glasses are set up on the SkyMedia booth.
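The trial used the platform's own real-time encoder; purely as an illustration of the independent-stream approach (two H.264 AVC streams at 4 Mbit/s each over RTP), an equivalent software pipeline could be launched with FFmpeg from Python. The host address, ports and input names are placeholders:

```python
import subprocess

# Two independently encoded H.264 streams over RTP, one per view,
# mirroring the 2 x 4 Mbit/s configuration used in the trial.
for view, port in (("left.mp4", 5004), ("right.mp4", 5006)):
    subprocess.Popen([
        "ffmpeg",
        "-re", "-i", view,                # read at native frame rate (live-like)
        "-c:v", "libx264",
        "-b:v", "4M", "-maxrate", "4M", "-bufsize", "8M",
        "-tune", "zerolatency",           # low-latency settings for live use
        "-an",                            # video only
        "-f", "rtp", f"rtp://192.0.2.1:{port}",
    ])
```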

Figure 5 Setup of S3D cameras at the marathon finish line

IV. EXPERIMENTAL ACTIVITIES AND FIELD TEST RESULTS

This section presents the tests and results obtained with the aforementioned testbed during the Turin Marathon 2011 event.

1) Data collection

The entire live S3D video acquisition structure was validated by means of perfectly synchronized cameras installed close to the finish line (Figure 5). However, it was noticed that the outdoor constraints, especially the global illumination, can be very disruptive for industrial cameras. Therefore, better management of the camera placement and better consideration of the weather are required for the next installation. The sensor deployment and data transmission from the runners were also validated, although some GPS signal problems were encountered. Concerning the UAV, it was not able to fly because there was not enough time to apply for flight authorisation from the Italian Civil Aviation Authority (ENAC). However, the video acquisition link was tested on the ground before the race. The deployment at the next marathon race will be devoted to testing a larger set of cameras and sensors during the race.

2) Data processing

No major problems were met at this level. It can be noticed that almost no geometrical correction of the S3D cameras was needed and the resulting 3D effect was as expected; however, 3D parameters such as camera spacing, convergence, etc. will be better tuned by taking the live scene into account: the 3D budget [7] will be more accurately defined considering the geometrical properties of the scene, the stereo acquisition rigs and the size of the rendering screens.

3) Delivery

The two stereoscopic streams were broadcast through the network using both unicast and multicast streams to meet the bandwidth needs of the large set of upcoming new terminals. During the race, the aggregation between video and metadata was performed on the server side to ease the processing phase; the last step, video compression, was performed before network transmission. The videos were streamed to the rendering computer by means of RTP encapsulation.

4) Rendering

The streams were retrieved, decoded and fed to the dedicated software module that prepares the video for the 3D display interfaces. The 3D display solution tested was based on NVidia 3D Vision. Rendering on the shutter-based screen worked well, but this technology is highly dependent on the illumination conditions and some lights must be avoided, for example neon lights, which cause blinking effects for the audience. In addition, the glasses are quite heavy. A good solution to settle these problems could be the use of autostereoscopic displays.

V. CONCLUSION

This paper has presented some results of the first public event demonstration conducted at the 2011 Turin Marathon race. We mainly tested constitutive parts of the SkyMedia MSP architecture, paying attention to testing each single building block. Specific numeric results were not included in this paper because the goal of this first field test was to exercise the system in a complex scenario and not to tune its performance; to this purpose, only a description of the results has been included. In future work we will provide numerical and graphical details on the individual performance of the tested MSP blocks.

REFERENCES

[1] M. Neri, A. Campi, R. Suffritti, F. Grimaccia, P. Sinogas, O. Guye, C. Papin, T. Michalareas, L. Gazdag, I. Rakkolainen, "SkyMedia – UAV-based capturing of HD/3D content with WSN augmentation for immersive media experiences", ICME, July 2011.
[2] Turin Marathon, www.turinmarathon.it
[3] D. Scharstein, R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms", International Journal of Computer Vision, 2001.
[4] H. Hirschmüller, "Stereo Processing by Semiglobal Matching and Mutual Information", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, February 2008.
[5] A. Vetro, T. Wiegand, G. J. Sullivan, "Overview of the Stereo and Multiview Video Coding Extensions of the H.264/MPEG-4 AVC Standard", Proceedings of the IEEE, vol. 99, no. 4, pp. 626-642, April 2011.
[6] GoPro stereo camera, www.gopro.com
[7] B. Mendiburu, 3D TV and 3D Cinema, Focal Press, 2012, p. 170.