New stuff
• People were interested in more detailed spatial information about media captures
• Added Area of Capture and Point of Capture attributes
  • Also addresses the multi-view use case
  • Offers a more flexible way of associating audio with video
• Removed the "linear array" audio type; it is replaced by using Area of Capture
Other topics to consider
• The framework has these in an appendix, still to be discussed:
  • VAD (voice activity detection)
  • Media source selection (e.g. from a roster)
  • Composition and switching algorithms, for audio and video
Composition/Switching Algorithms
• The framework has simple boolean attributes indicating that a Media Capture is switched or composed. Is this enough?
• If not, what else do we need?
  • Another use case to make it clear?
  • More detailed indications of exactly how a capture is switched or composed?
  • Anything else?
• Interested people should propose specific additions to the framework
Attributes
EXTENSIBILITY
Audio attributes
• Channel Format: Stereo, Mono
Video attributes
• Spatial scale: Image width
Media Capture attributes
• Purpose (role): Main, Presentation
• Mixed: true/false
• Auto switched: true/false
• Area of Capture: ranges
• Point of Capture: point
• Area Scale: millimeters
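One way to picture the Media Capture attributes listed above is as a record type. The Python sketch below is a hypothetical in-memory model, not the framework's encoding: the class and field names are invented for illustration, the point of capture given for VC0 is assumed, and treating VC5's composed PiP as `mixed=True` is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Purpose(Enum):
    """Purpose (role) values listed on the slide."""
    MAIN = "main"
    PRESENTATION = "presentation"


@dataclass(frozen=True)
class AreaOfCapture:
    """Ranges of the captured area in scene coordinates (units set by area_scale)."""
    x_begin: float
    x_end: float
    y_begin: float = 0.0
    y_end: float = 0.0

    def __post_init__(self) -> None:
        # Reject inverted ranges so later overlap/adjacency checks stay simple.
        if self.x_end < self.x_begin:
            raise ValueError("x_end must not be less than x_begin")


@dataclass(frozen=True)
class MediaCapture:
    """Hypothetical model of the Media Capture attributes (names are illustrative)."""
    name: str
    purpose: Purpose = Purpose.MAIN
    mixed: bool = False
    auto_switched: bool = False
    area: Optional[AreaOfCapture] = None
    point: Optional[Tuple[float, float]] = None  # point of capture (x, y)
    area_scale: str = "millimeters"


# VC0 covers x = 0..100; the point of capture at x = 50 is an assumption.
vc0 = MediaCapture("VC0", area=AreaOfCapture(0, 100), point=(50, 0))
# VC5 is voice-switched; modelling the composed PiP as mixed=True is an assumption.
vc5 = MediaCapture("VC5", mixed=True, auto_switched=True, area=AreaOfCapture(0, 300))
```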
Capture Scene
[Figure: cameras facing the people in one capture scene, shown in three alternative arrangements]
• Three cameras: VC0, VC1, VC2
• Two cameras, moved & zoomed out: VC3, VC4
• Switched (based on voice) with composed PiP: VC5
Capture Scene
[Figure: the capture scene from x = 0 to x = 300, showing each video capture's area of capture and the points of capture at x = 50, x = 150 and x = 250]
Area of capture:
• VC0: xBegin=0, xEnd=100
• VC1: xBegin=100, xEnd=200
• VC2: xBegin=200, xEnd=300
• VC3: xBegin=0, xEnd=150
• VC4: xBegin=150, xEnd=300
• VC5: xBegin=0, xEnd=300
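Since every alternative representation describes the same scene, its captures together should span the same x range. A minimal Python sketch, assuming the ranges read off the figure above (the mapping of ranges to VC names is a reading of the extracted figure, and `AREAS` and `combined_extent` are hypothetical names). It checks only the bounding range, not gaps between captures.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (xBegin, xEnd)

# Area of capture x ranges, as read from the figure (assumed mapping).
AREAS: Dict[str, Range] = {
    "VC0": (0, 100), "VC1": (100, 200), "VC2": (200, 300),
    "VC3": (0, 150), "VC4": (150, 300),
    "VC5": (0, 300),
}


def combined_extent(names: List[str]) -> Range:
    """Smallest x range containing every listed capture's area of capture."""
    ranges = [AREAS[n] for n in names]
    return (min(b for b, _ in ranges), max(e for _, e in ranges))


representations = {
    "three cameras": ["VC0", "VC1", "VC2"],
    "two cameras, zoomed out": ["VC3", "VC4"],
    "switched, composed PiP": ["VC5"],
}

for label, names in representations.items():
    print(f"{label}: x = {combined_extent(names)}")
```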
Capture Set
Each alternative representation of a Capture Scene is a row in a Capture Set:
• Three cameras: (VC0, VC1, VC2)
• Two cameras, moved and zoomed out: (VC3, VC4)
• Switched (based on voice), composed PiP: (VC5)
• Audio capture: (AC0)
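A receiver uses one row of the Capture Set. The Python sketch below assumes a hypothetical selection policy (take the video row with the most captures that still fits the receiver's displays) and treats any name starting with "VC" as video; neither rule comes from the framework, and `choose_video_row` is an invented name.

```python
from typing import List, Tuple

# Capture Set rows from the slide; each row is one alternative representation.
CAPTURE_SET: List[Tuple[str, ...]] = [
    ("VC0", "VC1", "VC2"),
    ("VC3", "VC4"),
    ("VC5",),
    ("AC0",),
]


def choose_video_row(rows: List[Tuple[str, ...]], displays: int) -> Tuple[str, ...]:
    """Hypothetical policy: the video row with the most captures that fits the displays."""
    # Simplification: a row is "video" if all its capture names start with "VC".
    video_rows = [r for r in rows if all(c.startswith("VC") for c in r)]
    fitting = [r for r in video_rows if len(r) <= displays]
    if not fitting:
        raise ValueError("no video row fits the available displays")
    return max(fitting, key=len)


print(choose_video_row(CAPTURE_SET, 2))
```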
Video Capture Adjacency
[Figure: two drawings of cameras capturing people, with VC0 and VC1 and their left and right sides marked; the scene runs from x = 0 to x = 200, with further marks at x = 50, x = 100 and x = 150]
Capture Set: (VC0, VC1), plus other capture set rows
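Adjacency can be derived from the Area of Capture attributes: sort the captures in a row by xBegin and check whether each capture's xEnd meets the next one's xBegin. A minimal Python sketch; `adjacent_pairs` and its `tol` parameter are hypothetical, and the example assumes VC0 covers x = 0 to 100 and VC1 x = 100 to 200.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (xBegin, xEnd)


def adjacent_pairs(areas: Dict[str, Range], tol: float = 0.0) -> List[Tuple[str, str]]:
    """Return (left, right) capture pairs whose x ranges meet edge to edge.

    "Left" and "right" follow increasing x in the scene coordinate system;
    how that maps to the viewer's left and right is a separate convention.
    """
    ordered = sorted(areas, key=lambda name: areas[name][0])
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        # Adjacent if the gap (or overlap) between the edges is within tol.
        if abs(areas[left][1] - areas[right][0]) <= tol:
            pairs.append((left, right))
    return pairs


# Capture set row (VC0, VC1), with assumed ranges.
print(adjacent_pairs({"VC1": (100, 200), "VC0": (0, 100)}))  # [('VC0', 'VC1')]
```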
Example with Field of View 1
[Figure: two cameras; x is measured along a straight line at y distance 3000 from the cameras]
• Capture 1: area of capture xBegin=0, xEnd=1346, yBegin=3000, yEnd=3000; point of capture = (673, 0)
• Capture 2: area of capture xBegin=1446, xEnd=2792, yBegin=3000, yEnd=3000; point of capture = (2119, 0)
Angle a = 2 * arctan((1346/2) / 3000) ≈ 25.3°
The field of view angle can be calculated from the area of capture and point of capture attributes.
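The calculation above generalizes to a point of capture that is not centred on the area: take the angle from the point of capture to each end of the area and subtract. A Python sketch, assuming x is measured along a straight line at distance y and the camera faces straight along the y axis; `fov_straight` is a hypothetical name.

```python
import math
from typing import Tuple


def fov_straight(point: Tuple[float, float], x_begin: float, x_end: float, y: float) -> float:
    """Horizontal field of view in degrees for an area of capture measured
    along a straight line at distance y from the point of capture."""
    px, py = point
    depth = y - py
    # Angle of each edge of the area, measured from the y axis.
    left = math.atan2(x_begin - px, depth)
    right = math.atan2(x_end - px, depth)
    return math.degrees(right - left)


print(round(fov_straight((673, 0), 0, 1346, 3000), 1))      # 25.3
print(round(fov_straight((2119, 0), 1446, 2792, 3000), 1))  # 25.3
```

For a centred point of capture this reduces to 2 * arctan((width / 2) / y), the formula on the slide.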
Example with Field of View 2
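[Figure: the same two areas of capture, but x is measured along an arc at y distance 3000 from the camera; angle a is marked]
• Point of capture = (1396, 0)
• Capture 1: xBegin=0, xEnd=1346, yBegin=3000, yEnd=3000
• Capture 2: xBegin=1446, xEnd=2792, yBegin=3000, yEnd=3000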
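This example gives no formula. If x is read as arc length along a circle of radius y = 3000 centred on the point of capture (an assumption about the figure), the angle is simply arc length divided by radius. A Python sketch with the hypothetical name `fov_arc`:

```python
import math


def fov_arc(x_begin: float, x_end: float, radius: float) -> float:
    """Field of view in degrees when x is arc length along a circle of the
    given radius centred on the point of capture (angle = arc / radius)."""
    return math.degrees((x_end - x_begin) / radius)


print(round(fov_arc(0, 1346, 3000), 1))     # 25.7
print(round(fov_arc(1446, 2792, 3000), 1))  # 25.7
print(round(fov_arc(1346, 1446, 3000), 1))  # 1.9 (the gap between the captures)
```

Under this reading the same 1346 width subtends about 25.7°, slightly more than the 25.3° of the straight-line case, because the ends of a straight line lie farther than 3000 from the camera.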
Matching Audio with Video
• Same capture scene
• Video adjacency matches the audio sound stage
• The rendering side uses the Area of Capture attributes to match the audio with the video
Matching Audio with Video
[Figure: spatial extent of video compared with spatial extent of audio, from left to right]
• Video: VC0 x = 0 to 100, VC1 x = 100 to 200, VC2 x = 200 to 300
• One stereo AC: x = 0 to 300
• Three mono ACs: x = 0 to 100, x = 100 to 200, x = 200 to 300
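The rendering side can match audio to video by comparing areas of capture: an audio capture goes with every video capture whose x range it overlaps. A Python sketch; the AC names are hypothetical, since the slide does not name these captures, and edges that merely touch are not counted as overlap.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (xBegin, xEnd)

VIDEO: Dict[str, Range] = {"VC0": (0, 100), "VC1": (100, 200), "VC2": (200, 300)}

# The two audio options from the slide; the AC names are hypothetical.
ONE_STEREO: Dict[str, Range] = {"AC_stereo": (0, 300)}
THREE_MONO: Dict[str, Range] = {
    "AC_left": (0, 100), "AC_center": (100, 200), "AC_right": (200, 300),
}


def overlaps(a: Range, b: Range) -> bool:
    """True if the ranges share a stretch of positive length."""
    return max(a[0], b[0]) < min(a[1], b[1])


def match_audio(video: Dict[str, Range], audio: Dict[str, Range]) -> Dict[str, List[str]]:
    """For each video capture, list the audio captures whose area overlaps it."""
    return {vc: [ac for ac, ar in audio.items() if overlaps(vr, ar)]
            for vc, vr in video.items()}


print(match_audio(VIDEO, ONE_STEREO))
print(match_audio(VIDEO, THREE_MONO))
```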
Supporting the use cases
3.1 Point to point symmetric
• Different number of audio channels on each side
• Different number of video and audio channels
• Match the sound stage with video display
• Handle gaps/overlap between captures
• Audio levels match
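To handle gaps or overlap between captures, a receiver first has to find them. A Python sketch that compares consecutive areas of capture (the function name `seams` is hypothetical); the first example uses the two areas from the field of view examples, which leave a 100-unit gap.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (xBegin, xEnd)


def seams(ranges: List[Range]) -> List[Tuple[str, float]]:
    """Classify each seam between consecutive captures (sorted by xBegin).

    Returns ("gap", width), ("overlap", width) or ("flush", 0.0) per seam.
    """
    ordered = sorted(ranges)
    result = []
    for (_, left_end), (right_begin, _) in zip(ordered, ordered[1:]):
        delta = right_begin - left_end
        if delta > 0:
            result.append(("gap", delta))
        elif delta < 0:
            result.append(("overlap", -delta))
        else:
            result.append(("flush", 0.0))
    return result


print(seams([(0, 1346), (1446, 2792)]))           # [('gap', 100)]
print(seams([(0, 100), (100, 200), (200, 300)]))  # [('flush', 0.0), ('flush', 0.0)]
```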
Supporting the use cases
3.2 Point to point asymmetric
• Send subset of available streams
• Allow some user choice
• Sender does composition into one stream
• Receiver does composition of multiple streams onto one display
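For receiver-side composition onto one display, one possible approach (an assumption, not something the framework specifies) is to map each capture's area of capture to pixel columns in proportion to its x range, so the layout keeps the scene's left-to-right geometry. A Python sketch with the hypothetical name `layout_columns` and an assumed 1920-pixel-wide display:

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (xBegin, xEnd)


def layout_columns(areas: Dict[str, Range], display_width: int) -> Dict[str, Tuple[int, int]]:
    """Map each capture's area of capture onto [start, end) pixel columns of
    one display, preserving relative horizontal position and width."""
    scene_begin = min(b for b, _ in areas.values())
    scene_end = max(e for _, e in areas.values())
    scale = display_width / (scene_end - scene_begin)
    return {
        name: (round((b - scene_begin) * scale), round((e - scene_begin) * scale))
        for name, (b, e) in areas.items()
    }


print(layout_columns({"VC3": (0, 150), "VC4": (150, 300)}, 1920))
# {'VC3': (0, 960), 'VC4': (960, 1920)}
```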
Supporting the use cases
3.3 Multipoint
• Site switching
• Segment switching
  • Still need work on VAD
• Switch based on manual control
• Composing reduced image sizes (continuous presence)
Supporting the use cases
3.4 Presentation
• Video/audio streams for presentation
• Multiple presentation streams
• BFCP-like control of multiple streams (not in CLUE scope?)
• Consistent placement of multiple streams at each site
Supporting the use cases
3.5 Heterogeneous systems
• Transcoding middlebox
• Single or multiple streams
• Different bit rates
• Different layout policies
Not settled yet
Supporting the use cases
3.6 Multipoint education
• Multiple streams with different roles (different scenes)
• Placing video on the correct screen
• Still need work on VAD
• Requesting a stream from a particular site
Supporting the use cases
3.7 Multipoint multiview
• Different views of the same scene
• Assigning camera views to remote displays for best eye contact