new stuff people were interested in more detailed spatial information about media captures added...

Upload: morris-kelly

Post on 19-Jan-2016

New stuff

• People were interested in more detailed spatial information about media captures
• Added area of capture and point of capture attributes
• Also addresses multi-view use case
• Offers a more flexible way of associating audio with video
• Remove the “linear array” audio type, replaced by using area of capture

Other topics to consider

Framework has these in appendix to be discussed
• VAD (voice activity detection)
• Media source selection (e.g. from a roster)
• Composition and switching algorithms: audio and video

Composition/Switching Algorithms

• Framework has simple boolean attributes for indicating a Media Capture is switched or composed. Is this enough?
• If not, what else do we need?
  • Another use case to make it clear?
  • More detailed indications about exactly how a capture is switched or composed?
  • Anything else?
• Interested people should propose specific additions to the framework

Attributes

EXTENSIBILITY

Audio attributes
• Channel Format: Stereo, Mono

Video attributes
• Spatial scale: Image width

Media Capture attributes
• Purpose (role): Main, Presentation
• Mixed – true/false
• Auto switched – true/false
• Area of Capture – ranges
• Point of Capture – point
• Area Scale – millimeters
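To make the attribute list concrete, here is a minimal Python sketch of how the attributes might be held in memory. The class and field names are hypothetical and chosen only for illustration; the slide names the attributes but does not define a data model or encoding. Treating Area Scale as a yes/no flag for millimeter units is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MediaCapture:
    """Hypothetical container for the Media Capture attributes on the slide."""
    name: str                                                # e.g. "VC0" or "AC0"
    purpose: str = "main"                                    # "main" or "presentation"
    mixed: bool = False                                      # Mixed: true/false
    auto_switched: bool = False                              # Auto switched: true/false
    area_of_capture: Optional[Tuple[float, float]] = None    # (xBegin, xEnd) range
    point_of_capture: Optional[Tuple[float, float]] = None   # (x, y) point
    scale_in_millimeters: bool = False                       # assumed reading of Area Scale


@dataclass
class VideoCapture(MediaCapture):
    image_width: Optional[float] = None                      # Spatial scale: image width


@dataclass
class AudioCapture(MediaCapture):
    channel_format: str = "mono"                             # "mono" or "stereo"


# A voice-switched capture spanning the range 0..300 (illustrative values).
vc5 = VideoCapture("VC5", auto_switched=True, area_of_capture=(0, 300))
ac0 = AudioCapture("AC0", channel_format="stereo", area_of_capture=(0, 300))
```

Keeping the shared attributes on one base class mirrors the slide's split between Media Capture attributes and the audio-only or video-only ones.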

Capture Scene

[Figure: a capture scene of people in front of cameras, shown as three alternative sets of video captures]
• Three cameras: VC0, VC1, VC2
• Two cameras, moved & zoomed out: VC3, VC4
• Switched (based on voice) with composed PiP: VC5

Capture Scene

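[Figure: the capture scene along an x axis with ticks at x = 0, 100, 150, 200 and 300, showing the area of capture and point of capture of the video captures]

Area of capture
• VC0: xBegin=0, xEnd=100
• VC1: xBegin=100, xEnd=200
• VC2: xBegin=200, xEnd=300
• VC3: xBegin=0, xEnd=150
• VC4: xBegin=150, xEnd=300
• VC5: xBegin=0, xEnd=300

Point of capture
• x = 50, x = 150, x = 250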

Capture Set

Each alternative representation of a Capture Scene is a row in a Capture Set

Capture Set Rows
• Three cameras: (VC0, VC1, VC2)
• Two cameras, moved and zoomed out: (VC3, VC4)
• Switched (based on voice), composed PiP: (VC5)
• (AC0)
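The row structure can be sketched in a few lines of Python. The rows are taken from the slide; the `pick_row` helper and its "largest row that fits" policy are illustrative assumptions about how a receiver might choose among alternatives, not something the framework prescribes.

```python
from typing import List, Optional, Sequence, Tuple

# Rows of the capture set from the slide. Each video row is one alternative
# representation of the same capture scene.
VIDEO_ROWS: List[Tuple[str, ...]] = [
    ("VC0", "VC1", "VC2"),  # three cameras
    ("VC3", "VC4"),         # two cameras, moved and zoomed out
    ("VC5",),               # switched (based on voice), composed PiP
]
AUDIO_ROWS: List[Tuple[str, ...]] = [("AC0",)]


def pick_row(rows: Sequence[Tuple[str, ...]], max_streams: int) -> Optional[Tuple[str, ...]]:
    """Return the row with the most captures that still fits max_streams.

    Hypothetical selection policy, used here only to show how rows act as
    mutually exclusive alternatives.
    """
    fitting = [row for row in rows if len(row) <= max_streams]
    return max(fitting, key=len) if fitting else None


# A receiver with two displays would end up with the two-camera row.
chosen = pick_row(VIDEO_ROWS, max_streams=2)
```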

Video Capture Adjacency

[Figure: two diagrams of cameras VC0 and VC1 facing people, each labelled left and right along an x axis with ticks at x = 0, 100 and 200, with further markers at x = 50, 100 and 150]

Capture Set
• (VC0, VC1)
• Other capture set rows
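One way a receiver might infer adjacency is to order the captures by where their area of capture starts on the x axis. The sketch below assumes each capture carries an (xBegin, xEnd) range; the ranges 0 to 100 and 100 to 200 are illustrative values based on the axis ticks in the figure, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[float, float]  # (xBegin, xEnd)


def left_to_right(areas: Dict[str, Range]) -> List[str]:
    """Order capture names from left to right by the start of their area of capture."""
    return [name for name, _ in sorted(areas.items(), key=lambda kv: kv[1][0])]


def neighbours(areas: Dict[str, Range], name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the (left, right) neighbours of a capture, or None at either edge."""
    order = left_to_right(areas)
    i = order.index(name)
    left = order[i - 1] if i > 0 else None
    right = order[i + 1] if i + 1 < len(order) else None
    return left, right


# Illustrative ranges only: VC0 on the left half, VC1 on the right half.
row = {"VC1": (100, 200), "VC0": (0, 100)}
order = left_to_right(row)
```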

Example with Field of View 1

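[Figure: two captures, each with its own point of capture; x runs along a straight line, y is the distance from the camera, and a is the field of view angle]
• Area of capture xBegin=0, xEnd=1346; point of capture = (673,0)
• Area of capture xBegin=1446, xEnd=2792; point of capture = (2119,0)
• yBegin=3000, yEnd=3000

Field of view angle can be calculated from the area of capture and point of capture attributes.

Angle a = 2 * arctan((1346/2) / 3000) = 25.3°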
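The angle calculation on this slide can be reproduced with a short Python sketch. The function name and signature are hypothetical; the sketch assumes the area of capture is a straight segment at a constant y distance from the point of capture, as in this example, and it also handles a point of capture that is not centred on the area.

```python
import math
from typing import Tuple


def field_of_view_deg(x_begin: float, x_end: float, y_area: float,
                      point_of_capture: Tuple[float, float]) -> float:
    """Horizontal field of view in degrees for a camera at point_of_capture
    looking at the segment from (x_begin, y_area) to (x_end, y_area)."""
    px, py = point_of_capture
    depth = y_area - py                         # distance from camera to the area
    left = math.atan2(px - x_begin, depth)      # half-angle towards the left edge
    right = math.atan2(x_end - px, depth)       # half-angle towards the right edge
    return math.degrees(left + right)


# Values from the slide: area 0..1346 at y = 3000, camera at (673, 0).
a = field_of_view_deg(0, 1346, 3000, (673, 0))
print(round(a, 1))  # 25.3
```

With the camera centred on its area, both half-angles equal arctan(673 / 3000), which gives the slide's formula 2 * arctan((1346/2) / 3000).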

Example with Field of View 2

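[Figure: two areas of capture seen from a single point of capture; x runs along an arc, y is the distance from the camera, and a is the field of view angle]
• Point of capture = (1396,0)
• Area of capture xBegin=0, xEnd=1346, yBegin=3000, yEnd=3000
• Area of capture xBegin=1446, xEnd=2792, yBegin=3000, yEnd=3000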

Matching Audio with Video

• Same capture scene
• Video adjacency matches audio sound stage
• Rendering side uses Area of Capture attributes to match the audio with the video

Matching Audio with Video

Spatial extent of video (left to right)
• VC0: x = 0 to 100
• VC1: x = 100 to 200
• VC2: x = 200 to 300

Spatial extent of audio
• One stereo AC: Stereo, x = 0 to 300
• Three mono ACs: Mono, x = 0 to 100; Mono, x = 100 to 200; Mono, x = 200 to 300
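A rendering side could match audio to video by checking which areas of capture overlap on the x axis. The sketch below uses the ranges from this slide; the function names and the audio capture names AC0 to AC2 are hypothetical, and treating ranges that only touch at an edge as non-overlapping is an assumption.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (xBegin, xEnd)


def overlap(a: Range, b: Range) -> float:
    """Length of the overlap between two x ranges, 0 if they are disjoint or only touch."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def match_audio_to_video(audio: Dict[str, Range], video: Dict[str, Range]) -> Dict[str, List[str]]:
    """For each audio capture, list the video captures whose area of capture overlaps it."""
    return {
        ac: [vc for vc, v_area in video.items() if overlap(a_area, v_area) > 0]
        for ac, a_area in audio.items()
    }


video = {"VC0": (0, 100), "VC1": (100, 200), "VC2": (200, 300)}
one_stereo = {"AC0": (0, 300)}
three_mono = {"AC0": (0, 100), "AC1": (100, 200), "AC2": (200, 300)}

stereo_match = match_audio_to_video(one_stereo, video)
mono_match = match_audio_to_video(three_mono, video)
```

With one stereo capture the audio spans all three video captures, while each mono capture pairs with exactly one of them.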

Supporting the use cases

3.1 Point to point symmetric
• Different number of audio channels on each side
• Different number of video and audio channels
• Match the sound stage with video display
• Handle gaps/overlap between captures
• Audio levels match
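Gaps and overlaps between captures can be detected from the area of capture ranges alone. The following is a minimal sketch under the assumption that each capture has an (xBegin, xEnd) range and that only neighbouring captures, sorted left to right, need comparing. The function name and the capture names "A" and "B" are hypothetical; the first pair of ranges reuses the values from the field of view examples.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]          # (xBegin, xEnd)
Finding = Tuple[str, str, float]     # (left capture, right capture, width)


def gaps_and_overlaps(areas: Dict[str, Range]) -> Tuple[List[Finding], List[Finding]]:
    """Compare neighbouring areas of capture, sorted left to right.

    Returns (gaps, overlaps), each a list of (left_name, right_name, width).
    """
    ordered = sorted(areas.items(), key=lambda kv: kv[1][0])
    gaps: List[Finding] = []
    overlaps: List[Finding] = []
    for (l_name, (_, l_end)), (r_name, (r_begin, _)) in zip(ordered, ordered[1:]):
        if r_begin > l_end:
            gaps.append((l_name, r_name, r_begin - l_end))
        elif r_begin < l_end:
            overlaps.append((l_name, r_name, l_end - r_begin))
    return gaps, overlaps


# Ranges 0..1346 and 1446..2792 leave a gap of 100 between the two captures.
gaps, overlaps = gaps_and_overlaps({"A": (0, 1346), "B": (1446, 2792)})
```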

Supporting the use cases

3.2 Point to point asymmetric
• Send subset of available streams
• Allow some user choice
• Sender does composition into one stream
• Receiver does composition of multiple streams onto one display

Supporting the use cases

3.3 Multipoint
• Site switching
• Segment switching
• Still need work on VAD
• Switch based on manual control
• Composing reduced image sizes (continuous presence)

Supporting the use cases

3.4 Presentation
• Video/audio streams for presentation
• Multiple presentation streams
• BFCP-like control of multiple streams (not in CLUE scope?)
• Consistent placement of multiple streams at each site

Supporting the use cases

3.5 Heterogeneous systems
• Transcoding middlebox
• Single or multiple streams
• Different bit rates
• Different layout policies
• Not settled yet

Supporting the use cases

3.6 Multipoint education
• Multiple streams with different roles (different scenes)
• Placing video on correct screen
• Still need work on VAD
• Requesting a stream from a particular site

Supporting the use cases

3.7 Multipoint multiview
• Different views of same scene
• Assigning camera views to remote displays for best eye contact