Live VR Streaming and Its Challenges - SKKU
TRANSCRIPT
1 © 2020 SKKU. All rights reserved.
Live VR Streaming and Its Challenges
Next-Generation Media Processing and Transmission Technology
ICT Convergence Korea 2020
July 2020
Eun-Seok Ryu (류은석, [email protected])
Jong-Beom Jeong, Inae Kim, Soonbin Lee, Sungbin Kim
Multimedia Computing Systems Lab(MCSL)
http://mcsl.skku.edu
Department of Computer Education
Sungkyunkwan University (SKKU)
Immersive Media - 360 Virtual Reality
• 360° video named one of the “10 Breakthrough Technologies” (MWC)
• Inexpensive cameras that capture spherical images are changing the way people share stories (Mobile World Congress).
• Next-generation real-world media that overcomes space and time constraints and gives viewers a natural, 360-degree, fully immersive experience.
[Figure] VR media: stereo video, 3-dimensional video, hologram
* Source from Prof. Jewon Kang (Ewha Womans University)
Virtual Reality in Market
• The augmented and virtual reality (AR/VR) market was forecast at 29.5 billion U.S. dollars in 2020 and is expected to expand drastically in the coming years.
• Virtual reality technology is being applied
not only in games but also in various
industries, including medical services,
school education, military training,
vocational training and broadcasting of
large-scale concerts.
[Figures] Medical service, school education, military training, vocational training
360 Video Processing (1/2)
• Pipeline: image stitching and equirectangular mapping → video encoding → video decoding → video rendering on a sphere
• Stitched video is coded as regular 2D video using H.264/AVC or H.265/HEVC
[Figure] Input from seven cameras; the corresponding stitched image; the stitched image texture-mapped onto a sphere
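The equirectangular-mapping step above has a simple mathematical core: each ERP pixel corresponds to a longitude/latitude pair, which in turn maps to a direction on the unit sphere. A minimal sketch (function name and pixel-center convention are illustrative, not from the slides):

```python
import math

def erp_pixel_to_sphere(u, v, width, height):
    """Map an equirectangular (ERP) pixel (u, v) to a unit-sphere
    direction: longitude spans 360 degrees across the width,
    latitude spans 180 degrees down the height."""
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi   # [-pi, pi)
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi  # [pi/2, -pi/2]
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z

# The picture centre looks straight ahead along +x:
direction = erp_pixel_to_sphere(1919.5, 959.5, 3840, 1920)
```

Rendering on a sphere runs this mapping in reverse: for each viewport direction, the renderer samples the ERP texture at the corresponding (u, v).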
Technologies of Virtual Reality
• 6 Parts of VR Technologies
• System / Acquisition / Pre&Post-processing / Coding / Streaming / Assessment
[Figure] System Architecture for Audiovisual Media with 6 Degrees of Freedom
Source: Mary-Luc Champel, Rob Koenen, Gauthier Lafruit, Madhukar Budagavi, “Proposed Draft 1.0 of TR: Technical Report on Architectures for Immersive Media”, document N17685, 122nd MPEG meeting of ISO/IEC JTC1/SC29/WG11, 2018.
System
Acquisition
Pre, post-processing
Coding
Streaming
Assessment
Challenges (High BW, Low Latency, Sickness)
• High Bandwidth Requirement of VR
• Requires 40 pix/deg, 12K resolution for High quality VR
• To avoid the sickness, 90 fps and 20 ms MTP are required
• Immersive video contains texture (color) and depth (geometry) (×2)
• Also, immersive video has high quality (nearly 4K) multiple views (×N)
> needs High BW (e.g. 5G, mmWave)
Requirements for high-quality VR:
Requirement | Details
Pixels/degree | 40 pix/deg
Resolution | 11520×6480 (12K)
Framerate | 90 fps
Motion-to-photon latency | 20 ms
Source: Mary-Luc Champel, Thomas Stockhammer, Thierry Fautier, Emmanuel Thomas, and Rob Koenen. “Quality Requirements for VR”, document MPEG116/m39532, 116th MPEG
meeting of ISO/IEC JTC1/SC29/WG11.
Characteristics of immersive video:
Sequence | Resolution | No. of views | Frame count
ClassroomVideo | 4096×2048 | 15 | 120
TechnicolorMuseum | 2048×2048 | 24 | 300
TechnicolorHijack | 4096×4096 | 10 | 300
TechnicolorPainter | 2048×1088 | 16 | 300
IntelKermit | 1920×1080 | 13 | 300
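The requirements table above translates directly into bandwidth arithmetic. A back-of-the-envelope sketch (the 300:1 compression ratio is an illustrative assumption, not a figure from the slides):

```python
def raw_bitrate_gbps(width, height, fps, bits_per_pixel=12):
    """Uncompressed bitrate in Gbit/s; 4:2:0 chroma subsampling at
    8 bits per sample gives 12 bits per pixel."""
    return width * height * fps * bits_per_pixel / 1e9

# The 12K / 90 fps requirement from the table:
raw = raw_bitrate_gbps(11520, 6480, 90)   # ≈ 80.6 Gbit/s uncompressed
# Even assuming an aggressive 300:1 HEVC compression ratio, the
# stream stays in the hundreds of Mbit/s:
compressed_mbps = raw * 1e3 / 300         # ≈ 269 Mbit/s
```

This is the arithmetic behind the slide's call for high-bandwidth links (5G, mmWave) and for viewport-dependent delivery.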
Immersive Media Standard Roadmap
[Roadmap figure, 2018–2024]
• Media Coding: Versatile Video Coding, Essential Video Coding, Low Complexity Enhancement Video Coding, Video Point Cloud Compression, Geometry Point Cloud Compression, Point Cloud Compression v.2, 3DoF+ Video, Video with 6 DoF, Dense Representation of Light Field Video, 6 DoF Audio, Neural Network Compression for Multimedia, Descriptors for Video Analysis (CDVA)
• Systems and Tools: OMAF v.2, Immersive Media Scene Description Interface, PCC Systems Support, Media Orchestration, Network-Based Media Processing, Internet of Media Things, CMAF v.2, Multi-Image Application Format, Web Resource Tracks, Partial File Format, Color Support in Open Font Format
• Beyond Media: Genome Compression, Genome Compression Extensions
• Now: VR 360 → immersive media standards
Immersive Media Standard Projects
New MPEG project: ISO/IEC 23090
Coded Representation of Immersive Media
9 parts:
1. Architectures for Immersive Media (Technical Report)
2. Omnidirectional Media AF
3. Versatile Video Coding
4. 6 Degrees of Freedom Audio
5. Video Point Cloud Coding (V-PCC)
6. Metrics for Immersive Services and Applications
7. Metadata for Immersive Services and Applications
8. Network-Based Media Processing
9. Geometry Point Cloud Coding (G-PCC)
Immersive Media Standard
• Step-by-step objectives of ISO/IEC MPEG Immersive Video
• MPEG-I is responsible for standardizing immersive media in MPEG and specifies the goals of the three steps below.
• Goal of revitalizing commercial VR services by 2020
• Goal of 6DoF media support by 2022, after completing the 3DoF standard by 2017
[Figure] Degrees of freedom in each step:
• Step 1 (3DoF): yaw, roll, pitch
• Step 2 (3DoF+): yaw, roll, pitch plus limited movement (left/right, up/down, forward/backward)
• Step 3 (6DoF): full rotation (yaw, roll, pitch) and translation (left/right, up/down, forward/backward)
• Step 1: complete the 3DoF standard by 2017
• Head rotation from a fixed position
• Full 360 video streaming by default; tiled streaming if possible
• Step 2: enable commercial VR services by 2020
• Allow head rotation and movement within a restricted area
• User-to-user conversations and projection optimization
• Step 3: support 6DoF by 2022
• 6DoF video reflects the user’s walking motion
• Supports interaction with virtual environments
Three phases of virtual reality defined by MPEG-I
360-Degree Video with 4K Tiles
• Equirectangular Projection (ERP) format and 4K Tile
*Picture source: Dr. Yong-Hwan Kim’s presentation material (KETI)
Viewport-Adaptive vs. Viewport-Independent
• Viewport-independent
• Transmit the whole picture with pre-processing
• Projection and packing
• Downsampling / adjusting QP
• Viewport-dependent (adaptive)
• Transmit the viewport only
• Bitrate saving, but added delay
• Bitrate savings over sending without pre-processing
• Compression focused on the region of interest
• Considers several cases for extracting areas that show poor encoding efficiency
• Greater visual quality at lower bandwidth consumption*
Field of View (FOV)
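The bitrate-saving claim can be made concrete with ERP pixel counts. A rough sketch (equator-centred viewport, pole distortion and safety-margin tiles ignored; all numbers illustrative):

```python
def erp_area_fraction(fov_yaw_deg, fov_pitch_deg):
    """Fraction of ERP pixels covered by a viewport centred on the
    equator (ignores pole stretching and safety-margin tiles)."""
    return (fov_yaw_deg / 360.0) * (fov_pitch_deg / 180.0)

# A 90x90-degree field of view touches only 1/8 of the ERP pixels,
# so viewport-only transmission can cut the pixel rate by up to 8x:
fraction = erp_area_fraction(90, 90)   # 0.125
```

In practice the saving is smaller, because extra margin tiles must be sent to hide the delay when the viewer turns their head.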
MPEG-I 3DoF System Architecture
• MPEG-I defined a general framework for the 3DoF system (N17685)
• Projection conversion and down-sampling are applied (e.g., ERP)
• Head/eye tracking information is required for rendering on an HMD
[Figure] Block diagram for the 3DoF system: acquisition → stitching and projection conversion → (optional) down-sampling → encoding → file/segment encapsulation → delivery → file/segment decapsulation → decoding → (optional) up-sampling → rendering → display. Orientation/viewport metadata from head/eye tracking is fed back to the delivery, decoding, and rendering stages.
Projection formats: ERP, EAP, CMP, ISP, OHP, SSP
Virtual Reality Streaming Technologies
• 360-degree video streaming
• RTSP/RTP
• MPEG-DASH
• Viewport-adaptive streaming
• Motion Constrained Tile Sets (MCTS): ensure prediction independence between tiles across frames (t0, t1, t2, …)
[Figure] Motion Constrained Tile Sets
[Figure] Viewport-adaptive decoding and rendering: the picture is split into a grid of tiles/slices, and only the tiles covering the current viewport are decoded at high resolution (e.g., 1080p) while the remaining area is served at lower resolutions (720p, 360p).
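Tile selection for MCTS-based streaming reduces to finding which grid tiles the viewport rectangle overlaps. A minimal sketch for a uniform grid over ERP with yaw wrap-around (the 3×3 default mirrors the DrivingInCity demo configuration in this deck; the function and its sampling strategy are my own):

```python
def viewport_tiles(yaw, pitch, fov_yaw, fov_pitch, cols=3, rows=3):
    """Return (row, col) indices of uniform ERP tiles overlapped by a
    viewport centred at (yaw, pitch) degrees; yaw wraps around."""
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    tiles, steps = set(), 8  # sample densely enough to hit every tile
    for i in range(steps + 1):
        y = (yaw - fov_yaw / 2 + fov_yaw * i / steps + 180.0) % 360.0
        col = min(int(y / tile_w), cols - 1)
        for j in range(steps + 1):
            p = pitch - fov_pitch / 2 + fov_pitch * j / steps
            p = max(-90.0, min(90.0, p))
            row = min(int((90.0 - p) / tile_h), rows - 1)
            tiles.add((row, col))
    return sorted(tiles)

# Looking straight ahead with a 90x90-degree FOV needs only the
# middle column of a 3x3 grid:
needed = viewport_tiles(0, 0, 90, 90)   # [(0, 1), (1, 1), (2, 1)]
```

Because each MCTS tile is independently decodable, the client can fetch exactly this subset and still get a conforming bitstream.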
(Viewport-Adaptive) Tiled Streaming (Demo)
• Extract encoded bitstream
• Original bitstream (DrivingInCity / 3840×1920 / uniform 3×3, 9 tiles)
• Extracted bitstream (DrivingInCity / 1280×640 (3840×1920) / 1 tile (texture))
• Four-tile rendering
• Extracted bitstream rendering (DrivingInCity / 4 tiles)
3DoF+ System Architecture
• MPEG-I defined a general framework for the 3DoF+ S/W platform
• Applies a codec-independent pre- and post-processing structure
• Removes the correlation between input videos
• Requires only a small number of decoders
• Basic view / additional view / pruning / packing
[Figure] Block diagram of the 3DoF+ S/W platform. Server-side pre-processing: a 360-camera array (texture + depth) feeds a view optimizer that selects basic views (BV) and additional views (AV); AVs pass through a pruner and patch packer; BV and AV atlases are compressed with HEVC encoders and sent with metadata over the network. Client-side post-processing: HEVC decoders reconstruct the atlases, a metadata parser and occupancy map generator feed the renderer, which produces the viewport.
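The patch-packer stage places pruned patches into shared atlases. A toy illustration of the idea using a simple shelf (row-by-row) packer; TMIV's actual packing algorithm is more elaborate, and all names here are illustrative:

```python
def shelf_pack(patches, atlas_w):
    """Place (w, h) patches into an atlas of fixed width using a shelf
    strategy; returns per-patch (x, y) offsets and the height used."""
    # Tallest-first order keeps wasted space on each shelf small.
    order = sorted(range(len(patches)), key=lambda i: -patches[i][1])
    pos = [None] * len(patches)
    x = y = shelf_h = 0
    for i in order:
        w, h = patches[i]
        if x + w > atlas_w:   # current shelf is full: open a new one
            y += shelf_h
            x = shelf_h = 0
        pos[i] = (x, y)
        x += w
        shelf_h = max(shelf_h, h)
    return pos, y + shelf_h

offsets, height = shelf_pack([(100, 50), (100, 80), (50, 80)], atlas_w=200)
```

The resulting offsets travel in the metadata stream so the client can unpack each patch back into its source view before rendering.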
Results of Stride Packing for 3DoF+ by MCSL
[Figure] ClassroomVideo sequence (7680×2560)
6DoF Standard: Point Cloud Coding
• Point cloud coding (PCC) compresses 3D points
• PCC consists of two parts: video-based PCC (V-PCC) and geometry-based PCC (G-PCC)
• V-PCC compresses point clouds using a 2D video encoder
• Patches are generated from 3D point clouds and packed into 2D space
Source: Vladyslav Zakharchenko, “Algorithm description of mpeg-pcc-tmc2”, document MPEG2018/n17526, 122nd MPEG meeting of ISO/IEC JTC1/SC29/WG11, 2018.
[Figure] Example of patch generation and packing of point cloud content: patch generation and packing → video encoding
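The projection idea behind V-PCC patch generation can be shown in miniature: points are orthographically projected onto a plane, keeping one depth value per pixel. Real V-PCC segments points into patches by normal direction and keeps near/far depth layers; this toy sketch keeps only the near layer, and all names are illustrative:

```python
def project_to_depth_map(points, axis=2):
    """Orthographically project integer 3D points along `axis`,
    keeping the closest depth per (u, v) pixel."""
    depth = {}
    for p in points:
        d = p[axis]
        # Drop the projection axis to get the 2D pixel coordinate.
        uv = tuple(c for i, c in enumerate(p) if i != axis)
        if uv not in depth or d < depth[uv]:
            depth[uv] = d
    return depth

# Two points share pixel (0, 0); the nearer one (depth 2) wins:
near_layer = project_to_depth_map([(0, 0, 5), (0, 0, 2), (1, 0, 3)])
```

The resulting depth maps (plus the corresponding texture maps) are what the 2D video encoder actually compresses.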
6DoF Standard: Point Cloud Coding (Cont’d)
• G-PCC uses a new codec designed for 3D point clouds
• Currently at the committee draft (CD) stage
• In lossless compression, G-PCC shows a 20% gain compared to V-PCC
Overview of the G-PCC encoder (left) and decoder (right)
Source: Khaled Mammou, Philip A. Chou, David Flynn, Maja Krivokuća, Ohji Nakagami and Toshiyasu Sugio, “G-PCC codec description v1”, document n18015, 124th MPEG
meeting of ISO/IEC JTC1/SC29/WG11, 2018.
6DoF Standard: Plenoptic / Multiview coding
• Dense light field (DLF) was included in MPEG-I
• MPEG-I Visual defined common test conditions (CTC) for DLF
• Exploration experiments are in progress
• Conversion from lenslet to multiview is possible
• Alignment with the MPEG-I Visual reference SW will be done (performance evaluation will be conducted)
[Figure] Example of lenslet and multiview dense light field video
Characteristics of lenslet and multiview dense light field video:
• Colored lenslet video data: resolution 4088×3068, color 24-bit BMP, frame rate 30 fps, 300 frames
• Multiview video data: resolution of each view 1147×830, views 5×5, 300 frames
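Lenslet-to-multiview conversion amounts to regrouping the pixels under each microlens into sub-aperture views. A minimal numpy sketch assuming an ideal, axis-aligned grid of views per microlens (real lenslet data such as the 4088×3068 sequence above needs calibration, devignetting, and resampling first):

```python
import numpy as np

def lenslet_to_views(lenslet, views_x, views_y):
    """Split a lenslet image into a views_y x views_x grid of
    sub-aperture views by strided sampling: view (u, v) collects the
    pixel at offset (u, v) under every microlens."""
    return [[lenslet[v::views_y, u::views_x]
             for u in range(views_x)]
            for v in range(views_y)]

# A 10x10 toy "lenslet" image with 5x5 views yields 2x2-pixel views:
views = lenslet_to_views(np.arange(100).reshape(10, 10), 5, 5)
```

Each extracted view is a small perspective image, which is what makes the multiview coding tools of MPEG-I applicable to lenslet captures.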
6DoF Standard: MPEG-I Visual
• MPEG-I issued a call for proposals (CfP) on 3DoF+
• Philips, Intel & Technicolor, Nokia, PUT & ETRI, and ZJU submitted responses
→ test model for immersive video (TMIV)
• TMIV is being enhanced to compress 6DoF videos
• TMIV will be aligned with V-PCC
[Figure] Block diagram of the TMIV architecture; viewport generated by TMIV
Immersive Video Test Sequences
MPEG-I immersive video test sequences:
Class (M/O) | Sequence name | Resolution | Type | No. of views | Depth range | Frame count | Frame rate | Bit depth
CG1-A (M) | ClassroomVideo | 4096×2048 | ERP | 15 | [0.8 m, inf] | 120 | 30 fps | Texture: 10, Geometry: 16
CG1-B (M) | TechnicolorMuseum | 2048×2048 | ERP | 24 | [0.5 m, 25 m] | 300 | 30 fps | Texture: 10, Geometry: 16
CG1-C (M) | InterdigitalHijack | 4096×4096 | ERP | 10 | [0.5 m, 25 m] | 300 | 30 fps | Texture: 10, Geometry: 16
CG1-N (O) | NokiaChess | 2048×2048 | ERP | 10 | [0.1 m, 500 m] | 300 | 30 fps | Texture: 10, Geometry: 16
CG2-J (M) | OrangeKitchen | 1920×1080 | Perspective | 25 (5×5) | [2.2, 7.2] | 97 | 30 fps | Texture: 10, Geometry: 10
NC1-D (M) | TechnicolorPainter | 2048×1088 | Perspective | 16 (4×4) | - | 300 | 30 fps | Texture: 10, Geometry: 16
NC1-E (M) | IntelFrog | 1920×1080 | Perspective | 13 (13×1) | [0.3, 1.62] | 300 | 30 fps | Texture: 10, Geometry: 16
NC2-L (M) | PoznanFencing | 1920×1080 | Perspective | 10 | [3.5, 7.0] | 250 | 25 fps | Texture: 10, Geometry: 16
Tiled Streaming with 6DoF 360 Videos by MCSL
• Developed viewport tile selector (VTS) on 6DoF
• Compatible with TMIV and HEVC (or any other codecs, e.g. VVC)
• User viewport tiles (HQ) + entire videos (LQ) simulcast streaming
> low delay and bandwidth adaptive streaming
• In ClassroomVideo test sequence, 19.40% gain was observed
Viewport tile selector based tiled streaming
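The simulcast idea (HQ viewport tiles plus an LQ copy of the whole video) can be budgeted with a one-line model. The numbers below are illustrative, not the measurements behind the 19.40% figure:

```python
def simulcast_bitrate(full_hq_mbps, lq_scale, viewport_fraction):
    """Bitrate of HQ-viewport-tiles + whole-video-LQ simulcast,
    in the same units as full_hq_mbps."""
    return full_hq_mbps * viewport_fraction + full_hq_mbps * lq_scale

# 100 Mbit/s full-HQ stream, viewport covering 1/3 of the tiles,
# LQ layer at 10% of the HQ rate:
rate = simulcast_bitrate(100.0, lq_scale=0.10, viewport_fraction=1 / 3)
```

The LQ layer guarantees that a sudden head turn still shows something immediately while the HQ tiles catch up, which is the low-delay property claimed above.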
Conclusion
• 360 video streaming for VR is emerging
• Requires high bandwidth, and low latency to reduce motion sickness
• → tile-based viewport-dependent solution!
• MCTS (Motion-Constrained Tile Sets)
• EIS (Extraction Information Sets)
• Updates the VPS, SPS, and PPS for the selected viewport tiles
• The implemented solution saves bandwidth significantly
• Contributed to JCT-VC HM with Fraunhofer HHI
• Collaborating with UCSB on intercontinental VR streaming at a VR theater (AlloSphere)
• A 2D texture- and depth-based 3DoF+/6DoF solution is implemented for demo
Thank you! Questions → [email protected]