1
Communications, Collaboration, and Community
Anoop Gupta
Microsoft Research
Collaborators:
Michael Cohen, Ross Cutler, Zicheng Liu, Yong Rui, Kentaro Toyama, Zhengyou Zhang, and others
2
Deployment-Driven Multidisciplinary Research:
Challenges and Opportunities
Anoop Gupta
Microsoft Research
Collaborators:
Michael Cohen, Ross Cutler, Zicheng Liu, Yong Rui, Kentaro Toyama, Zhengyou Zhang, and others
3
Collaboration and Multimedia Group
• 16 people
– 9 Researchers, 5 R-SDEs, 1 Designer, 1 Usability specialist
– Diverse: Systems, Cognitive Psychology, Sociology, Vision, Graphics
• Focus:
– Peripheral awareness and people-centric interfaces
– Tele-presentation and tele-meeting technologies
– Make audio-video information a first-class citizen
– Enhanced online communities
=> Technologies, Applications, and Social Factors
4
• Peripheral awareness and people-centric interfaces
– How do we stay aware of relevant information without annoying notifications?
– How do we stay aware of people, communicate with them, and bring them to the front of the user interface?
– How can we leverage technology to give a better picture of the state of people and their environment?
5
• Tele-presentations and tele-meetings
– Leverage the combination of
• cheap sensors (cameras, microphones, …),
• cheap computing power, bandwidth, and storage, and
• advances in vision, graphics, and signal-processing (SP) technologies
– Convincing remote presence and interactivity
– Whiteboard, note-taking, and local interaction tools
– High-quality recording and archiving
– Rich indices and browsing support
6
• Make audio-video information a first-class citizen
– Low-cost and high-quality capture
– Automatic index creation and highlights
– Rich support for annotation and collaboration
– Browsing tools and interfaces
7
• Enhanced online communities
– Tracking Interaction / Social History
– Incentive Structures
• Encourage high-quality content creation
• Encourage interaction
• Discourage inappropriate behavior
– Filtering and Synopsis
– Community Portals
8
Outline
• Our group
• Research approach
• Project samplings
– Office activity modeling
– Distributed meetings
– Tele-presentations
– Face modeling
• Concluding Remarks / Challenges
9
Research Approach
• Deployment-driven research
– End-users vs. other researchers as main customer
– Robustness vs. Functionality
– Multiple sensor technologies with graceful degradation
– Value existing infrastructure
– Simplicity of set-up and operation
– Design with end-user in the loop
– Field evaluations
• Multi-disciplinary tool-set
(Diagram: Build Prototype; Evaluation / Publication; Refine Prototype; Product Impact)
10
1. Office Activity Modeling (joint with ASI group at MSR)
• Uses of Office Awareness
– Intelligent messaging
• Send messages on the appropriate channel: instant message, office phone, e-mail, mobile, etc.
– Intelligent instant messaging
• Stopped typing = not there
– Peripheral awareness for “buddies”
• Is now a good time to drop by Jack’s office?
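The “intelligent messaging” use can be illustrated with a tiny rule-based router. This is a minimal sketch, not the group's system: the `PresenceState` fields, the five-minute idle threshold standing in for “stopped typing = not there”, and the routing rules themselves are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical idle threshold: no keyboard/mouse input for this long => "not there".
IDLE_TIMEOUT_S = 300.0


@dataclass
class PresenceState:
    """Illustrative inferred office state (field names are assumptions)."""
    seconds_since_input: float  # time since the last keyboard/mouse event
    on_phone: bool              # e.g., reported by a TAPI-enabled phone
    in_meeting: bool            # e.g., taken from the calendar


def choose_channel(state: PresenceState, urgent: bool) -> str:
    """Pick a delivery channel from the inferred state using simple example rules."""
    at_pc = state.seconds_since_input < IDLE_TIMEOUT_S
    busy = state.on_phone or state.in_meeting
    if at_pc and not busy:
        return "instant message"  # user is at the PC and free
    # Busy or away from the PC: interrupt on mobile only when it matters.
    return "mobile" if urgent else "e-mail"


print(choose_channel(PresenceState(12.0, False, False), urgent=False))  # instant message
print(choose_channel(PresenceState(900.0, False, False), urgent=True))  # mobile
```

Real deployments would replace the hand-written rules with the probabilistic inferences discussed in the following slides.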
11
So how does the deployment-driven approach impact our decisions?
12
Environment and Outputs
• Environment
– Office with door (w/ window); Cubicle; Open plan; …
• Number of people, at increasing granularity
– (0 / 1+) | (0 / 1 / 1+) | (0 / 1 / 2 / 3 / …)
• Gross activity
– At desk; On PC; On phone; In meeting; …
• Fine activity
– Who is present
– Reading; Answering mail; …
• Activity Trends
– Usually comes in at 7am, leaves at 5pm
– Never comes in on weekends
• …
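“Activity Trends” outputs such as a usual arrival time could be derived from presence logs. A minimal sketch, assuming the log provides one “first presence detected” timestamp per day (a hypothetical format) and using the median as a simple, outlier-tolerant summary:

```python
from datetime import datetime
from statistics import median


def arrival_trends(first_seen: list[datetime]) -> dict:
    """Summarize arrival habits from one 'first presence detected' timestamp per day.

    The input format is an assumption; any reliable presence sensor could supply it.
    """
    weekdays = [t for t in first_seen if t.weekday() < 5]   # Monday=0 .. Friday=4
    weekends = [t for t in first_seen if t.weekday() >= 5]  # Saturday=5, Sunday=6
    hours = [t.hour + t.minute / 60 for t in weekdays]
    return {
        # The median tolerates the occasional unusually early or late day.
        "usual_arrival_hour": median(hours) if hours else None,
        "comes_in_on_weekends": bool(weekends),
    }


# Toy timestamps (Mon-Wed), not real data.
log = [datetime(2024, 1, 1, 7, 0), datetime(2024, 1, 2, 7, 30), datetime(2024, 1, 3, 6, 30)]
print(arrival_trends(log))  # {'usual_arrival_hour': 7.0, 'comes_in_on_weekends': False}
```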
13
Sensors
• Keyboard / Mouse
• Calendar (appointment schedule)
• Desktop microphone
• TAPI-enabled phone (VoIP)
• Desktop camera
• Other:
– Motion detector, high-quality microphone / headset, bird’s-eye camera, laser/IR gates, thermal cameras, etc.
14
Making the Inferences (in approximate order of increasing expected research interest)
• Use reliable sensors as much as possible
• Use reliable sensors to label data for other sensors
• For vision, stick to reliably extractable, robust cues (e.g., presence of motion, optic flow)
• “Quasi-supervised” learning, using data labeled as above
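The idea of letting reliable sensors label data for a less reliable one can be sketched as follows. Assumptions: the vision cue is a single per-window motion value, the labeling rules are hypothetical, and a nearest-centroid classifier stands in for whatever learner was actually used. Windows where the reliable sensors disagree are left unlabeled rather than guessed.

```python
from statistics import mean


def quasi_labels(keyboard_rate, speech):
    """Label each time window using only the reliable sensors.

    1 = occupied, 0 = empty, None = ambiguous (left unlabeled).
    """
    labels = []
    for k, s in zip(keyboard_rate, speech):
        if k > 0:
            labels.append(1)     # keyboard/mouse input: someone is certainly there
        elif not s:
            labels.append(0)     # no input and no speech: probably empty (noisy label)
        else:
            labels.append(None)  # talking but not typing: ambiguous, skip
    return labels


def train_centroids(features, labels):
    """Class means of a 1-D vision cue, using only the labeled windows."""
    classes = {lab for lab in labels if lab is not None}
    return {c: mean(f for f, lab in zip(features, labels) if lab == c) for c in classes}


def predict(centroids, feature):
    """Nearest-centroid decision, usable even when the keyboard is idle."""
    return min(centroids, key=lambda c: abs(feature - centroids[c]))


motion = [0.30, 0.25, 0.02, 0.01, 0.20, 0.03]  # toy motion values per window
keys = [5, 3, 0, 0, 0, 0]
talk = [False, False, False, False, True, False]
model = train_centroids(motion, quasi_labels(keys, talk))
print(predict(model, 0.22))  # 1: looks occupied even without typing
```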
15
Results
• Eve/Priorities project at MSR (ASI)
– Integrates capture of features (keyboard/mouse use, application use, vision, audio events, …)
– Language for combining low-level features
– Bayesian fusion
– Vision component can determine whether the person is facing the front, but is still not as robust as desired
• Current work on quasi-supervised learning of low-level features…
Hope to deploy base versions in summer
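The slide says only “Bayesian fusion”. As one hedged illustration, a naive-Bayes combination of binary sensor cues is sketched below. The states, features, and every probability are made-up placeholders, not values from Eve/Priorities. A sensor that reports nothing is simply left out of the product, which is one way to obtain the graceful degradation mentioned under Research Approach.

```python
import math

# P(feature is True | state): illustrative placeholder numbers, not measurements.
LIKELIHOODS = {
    "at_desk": {"keyboard": 0.7, "speech": 0.2, "motion": 0.8},
    "on_phone": {"keyboard": 0.2, "speech": 0.9, "motion": 0.6},
    "away": {"keyboard": 0.01, "speech": 0.05, "motion": 0.1},
}
PRIORS = {"at_desk": 0.5, "on_phone": 0.1, "away": 0.4}


def posterior(observed: dict[str, bool]) -> dict[str, float]:
    """Naive-Bayes posterior over states; features absent from `observed` are ignored."""
    log_scores = {}
    for state, prior in PRIORS.items():
        score = math.log(prior)
        for feature, value in observed.items():
            p = LIKELIHOODS[state][feature]
            score += math.log(p if value else 1.0 - p)
        log_scores[state] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {s: math.exp(v - top) for s, v in log_scores.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}


post = posterior({"keyboard": False, "speech": True, "motion": True})
print(max(post, key=post.get))  # on_phone
```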
16
Results (preliminary)
Concatenation of 3 sections of low-level vision data only, sampled from an 8-hour log
Unsupervised clustering segments the sections cleanly.
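The clustering method is not named on the slide. The sketch below uses plain one-dimensional k-means with k = 3 as a stand-in, followed by a run-length pass that turns cluster labels into contiguous segments. The single-valued feature and the toy numbers are assumptions; real low-level vision features would be multi-dimensional.

```python
def kmeans_1d(xs, k, iters=100):
    """Plain 1-D k-means with deterministic, quantile-spread initial centers."""
    s = sorted(xs)
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]

    def nearest(x):
        return min(range(k), key=lambda c: abs(x - centers[c]))

    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x)].append(x)
        # Keep the old center if a cluster happens to become empty.
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return [nearest(x) for x in xs], centers


def segments(labels):
    """Run-length encode labels into (start, end_inclusive, label) segments."""
    out, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append((start, i - 1, labels[start]))
            start = i
    return out


# Three concatenated toy "sections" of a single vision feature.
feature = [0.10, 0.12, 0.11, 0.09, 0.10, 0.50, 0.52, 0.48, 0.51, 0.49,
           0.90, 0.88, 0.91, 0.92, 0.89]
labels, _ = kmeans_1d(feature, 3)
print(segments(labels))  # [(0, 4, 0), (5, 9, 1), (10, 14, 2)]
```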
17
Results (preliminary)
Correlates with high keyboard/mouse activity, no speech
Ground truth: 1 person at monitor
18
Benefits and Challenges
• Benefits
– Prioritizing problems and context
– How far we need to push the solution
– Earlier benefits for end-users; enables social science research
• Drawbacks
– Need substantial engineering (plus algorithmic) skills
– Need multidisciplinary team
19
2. Distributed Small Group Meetings
• Scenario:
– Imagine 8-10 people
– Joining from a conference room, from desktops, or from mobile devices
– Rich back-and-forth interaction
– Archival and browsing support
20
Contextualized Research Challenges
• Novel camera, microphone, display systems
• Speaker tracking; multi-person tracking
• Gaze and pose correction
• Activity tracking and gesture recognition
• Graphical avatars and virtual environments
• Real and virtual camera management
• Automated indexing and browsing support
• Integration of handheld devices
• User interface / User experience
First Prototype
(Figures: Meeting environment; Omni-directional camera; An example omni image; 360-degree panorama view)
22
Second Prototype
• Cost $300 vs. $10K
• Much better quality ~3000 x 500 pixels
• All processing done on the PC
23
Remote Interfaces
• All-up
• Computer controlled
• User controlled
• User + Computer + Overview
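A computer-controlled view can be produced by cropping the ~3000 x 500 pixel, 360-degree panorama around the direction of the current speaker. A minimal sketch, assuming the panorama spans exactly 360 degrees (so columns wrap around) and an arbitrary 640-pixel view width; the function names are hypothetical.

```python
def view_columns(azimuth_deg, pano_width=3000, view_width=640):
    """Column indices of a virtual camera view centered on `azimuth_deg`.

    Assumes the panorama spans exactly 360 degrees, so columns wrap around.
    """
    center = round((azimuth_deg % 360.0) / 360.0 * pano_width) % pano_width
    start = center - view_width // 2
    return [(start + i) % pano_width for i in range(view_width)]


def crop_view(panorama_rows, azimuth_deg, view_width=640):
    """Cut the view out of a row-major panorama given as a list of pixel rows."""
    width = len(panorama_rows[0])
    cols = view_columns(azimuth_deg, width, view_width)
    return [[row[c] for c in cols] for row in panorama_rows]


print(view_columns(0.0, pano_width=3000, view_width=4))  # [2998, 2999, 0, 1]
```

Handling the wrap-around explicitly matters because a speaker sitting at the panorama's seam would otherwise be split across the two image edges.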
24
Short/Medium-Term Plan
• Cameras, Calibration, Stitching
– Camera design to minimize parallax
– Automatic camera calibration
– Real-time on today’s processors
• Speaker detection and multiple-person detection
– Microphone-array sound source localization
– Computer-vision tracking of multiple people
– Fusing A/V for better speaker detection
• Simple remote participation interface
• Automatic camera management
• Video compression, storage, and transmission
• Automatic index creation and meeting browsing
Expect to deploy in a few conference rooms during summer
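Microphone-array sound source localization is commonly built on time differences of arrival (TDOA). The sketch below is a textbook two-microphone version, not the group's algorithm: it picks the lag that maximizes plain cross-correlation (practical systems often use weighted variants such as GCC-PHAT) and converts it to a bearing under a far-field assumption. The sample rate, microphone spacing, and synthetic signal are arbitrary choices.

```python
import math
import random  # used only to synthesize the example signal below

SPEED_OF_SOUND = 343.0  # m/s, approximately, in room-temperature air


def estimate_lag(a, b, max_lag):
    """Lag (in samples) by which signal b trails signal a, via plain cross-correlation."""
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_deg(a, b, fs, mic_spacing):
    """Far-field bearing from broadside in degrees; positive when the source is nearer mic a."""
    max_lag = int(mic_spacing / SPEED_OF_SOUND * fs) + 1  # possible lags plus 1 sample margin
    lag = estimate_lag(a, b, max_lag)
    s = SPEED_OF_SOUND * lag / fs / mic_spacing
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))  # clamp rounding overshoot


rng = random.Random(0)
a = [rng.uniform(-1.0, 1.0) for _ in range(400)]  # broadband test signal at mic a
b = [0.0] * 5 + a[:-5]                            # mic b hears it 5 samples later
print(estimate_lag(a, b, 10))                     # 5
print(bearing_deg(a, b, fs=16000, mic_spacing=0.2))
```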
25
3. Tele-Presentations
• Enable people to
– Easily broadcast/capture lectures (speaker and audience) in an aesthetically pleasing way
– Participate from remote locations
• Solution components
– Tracking cameras, microphone arrays, …
– Video production rules from professionals
– Mapping of rules to cameras and a software video director
– Remote presence and interactivity system (TELEP)
• First prototype in use in the small lecture room at MSR
27
Key Modules
• Speaker tracking and audience tracking
– Computer-vision-based tracking
– Microphone-array-based tracking
28
Key Modules (cont.)
• Virtual video director (finite-state machine, FSM)
– Maintain minimum shot duration
– Dynamic maximum shot duration
• Function of shot quality
• Triggers TIME_EXPIRE event
– Monitoring status change
• Triggers STATUS event
– Encode editing knowledge into transition probabilities
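The director's behavior can be sketched as a small event-driven state machine with the elements listed above: a minimum shot duration, a quality-dependent maximum that fires TIME_EXPIRE, STATUS events, and editing knowledge stored as transition probabilities. Everything concrete here is an assumption for illustration: the three shot names, the 3-second minimum, the linear quality-to-duration rule, and the probabilities are placeholders, not the parameters of the MSR system.

```python
import random

MIN_SHOT_S = 3.0  # assumed minimum shot duration, in seconds

# Editing knowledge as transition probabilities used when a shot expires (placeholders).
TRANSITIONS = {
    "lecturer": {"overview": 0.6, "audience": 0.4},
    "overview": {"lecturer": 0.9, "audience": 0.1},
    "audience": {"lecturer": 0.7, "overview": 0.3},
}


def max_shot_s(quality: float) -> float:
    """Dynamic maximum duration: better shots (quality in [0, 1]) are held longer."""
    return 5.0 + 15.0 * quality


class VirtualDirector:
    def __init__(self, rng=None):
        self.shot = "overview"
        self.shot_start = 0.0
        self.rng = rng or random.Random()

    def _cut(self, shot: str, now: float) -> None:
        self.shot, self.shot_start = shot, now

    def on_status(self, now: float, audience_speaking: bool) -> str:
        """STATUS event: an audience member started or stopped speaking."""
        if now - self.shot_start < MIN_SHOT_S:
            return self.shot  # honor the minimum shot duration (event is dropped here)
        if audience_speaking and self.shot != "audience":
            self._cut("audience", now)
        elif not audience_speaking and self.shot == "audience":
            self._cut("lecturer", now)
        return self.shot

    def tick(self, now: float, quality: float) -> str:
        """Periodic update; fires TIME_EXPIRE once the shot outlives its maximum."""
        if now - self.shot_start >= max_shot_s(quality):
            return self.on_time_expire(now)
        return self.shot

    def on_time_expire(self, now: float) -> str:
        """TIME_EXPIRE event: sample the next shot from the transition table."""
        options = TRANSITIONS[self.shot]
        nxt = self.rng.choices(list(options), weights=list(options.values()))[0]
        self._cut(nxt, now)
        return nxt
```

Passing a seeded `random.Random` to the constructor makes runs reproducible, which helps when comparing the automated director against a human operator.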
29
Initial Deployment Results
• Compared a human operator and our system running concurrently
– Field study
– Lab study
• Results:
– Human operator did better, but the difference is not statistically significant
– People could not tell which operator was human and which was the computer
30
Technical Challenges
• Design and configuration of camera/microphone systems
• More robust lecturer tracking
– Smooth tracking in close-up shots
– Multiple lecturers
– Lecturers moving into the audience area
• More robust audience tracking
– Background noise and room reverberation
• More sophisticated rules and knowledge
– Human operators are much better at handling exceptions
– A flexible/learning automated camera management system
31
4. Face Modeling
• Technical goals:
– Build a realistic-looking face model from video images
– The face model can be animated right away
– Painless data acquisition and efficient model building
– Commodity equipment (computer + camera)
– No special requirements on acquisition conditions (background, lighting, …)
• Uses:
– Enhanced chat / gaming environments
– Conferencing over low-bandwidth links
32
System Overview
33
Examples
34
Example Application: Virtual Poker
• Designed as a social interface
• Each player controls an avatar
• Some behaviors automatically generated
35
Virtual Poker
• Players automatically turn to follow action/voice
(Figure speech bubble: “I guess it’s my turn”)
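“Players automatically turn to follow action/voice” can be approximated by turning each avatar toward whoever is currently active, a bounded number of degrees per frame so the motion looks natural. A minimal 2-D sketch; the seat layout, player names, per-frame step limit, and the loudest-voice focus rule are all hypothetical, not details from the talk.

```python
import math


def yaw_towards(src, dst):
    """Heading in degrees (counter-clockwise from +x) from point src to point dst."""
    return math.degrees(math.atan2(dst[1] - src[1], dst[0] - src[0]))


def step_turn(current, target, max_step):
    """Rotate from `current` toward `target` along the shorter arc, by at most max_step.

    Angles are in degrees; the result is wrapped into [-180, 180).
    """
    diff = (target - current + 180.0) % 360.0 - 180.0
    diff = max(-max_step, min(max_step, diff))
    return (current + diff + 180.0) % 360.0 - 180.0


def active_player(voice_levels):
    """Hypothetical focus rule: the player with the loudest voice is 'active'."""
    return max(voice_levels, key=voice_levels.get)


seats = {"ann": (0.0, 1.0), "bob": (1.0, 0.0), "cy": (-1.0, 0.0)}
focus = active_player({"ann": 0.1, "bob": 0.8, "cy": 0.2})  # "bob"
target = yaw_towards(seats["cy"], seats[focus])              # cy must face +x: 0.0
print(focus, step_turn(90.0, target, 30.0))                  # bob 60.0
```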
36
Research Challenges
• Teeth, tongue, eyes and hair
• Personalized facial expressions
• Real-time animation driven from video
• Yet more robust and easy to use
38
Concluding Remarks
• Focus on deployment-driven research
– Tremendous leverage in:
• Prioritizing problems we explore
• Context we assume while solving
• How far we push the solution
• Earlier benefits for end-users
• Enabling social science research
• Keeping management support
(Chart: % Complete vs. Effort Spent)
39
– Challenges:
• Need more resources (or pursue fewer things)
• Need substantial engineering (plus algorithmic) skills
• Premier conferences do not appreciate engineering aspects
• Not all important research fits within the above constraints
– Some solution options:
• Community shared infrastructure (environments) into which things can be plugged (e.g., SUIF for compilers)
• Attitudes of premier conferences / senior researchers
• Funding agency attitudes
40
• Focus on multidisciplinary research
– Tremendous leverage in providing:
• More robust solutions (or solutions at all)
• More cost-effective solutions
• Deployment of research ideas to end-users, and the knowledge gained from the resulting feedback
– Challenges:
• Vision, Video, Graphics, Hardware, Speech, SP, …
• Need diversity within the group plus close ties externally
• Need supportive management and funding structure
• Academic departments, lab research groups, conferences, and tenure are organized around traditional disciplinary boundaries
• Discourages pushing one discipline as hard as possible when another provides an easier answer
41
– Some solution components:
• Strong leaders (e.g., Hennessy, who brought architecture, compilers, programming-language, and OS researchers together)
• Attitudes of premier conferences / senior researchers
• Funding agency attitudes
42
Questions / Discussion
• Graphics: What is the killer application in the workplace?
• Vision: How can we identify the state of the art for a non-expert?
• Are you satisfied with the degree of connection with the end-user/reality in your sub-field?
• What do you think of the role of multi-disciplinary research? Who should do it?
• Do we have balance?
43
• Graphics: What is the killer application in the workplace?
– We have tried:
• 3D Shell
• 3D Avatars in tele-meetings
• 3D in visualizations, …
• …
– Killer application still eludes us
44
• Vision: Identifying the state of the art
– E.g., speech recognition is characterized by:
• Speaker dependent or independent
• Size of vocabulary
• Language model / Grammar / Domain
• Microphone quality
– What’s the equivalent for vision?
• How can we characterize / partition / … the space so that the non-expert knows when/where vision technology can be relied upon?
45
Questions / Discussion