mobileasl: making cell phones accessible to the deaf community anna cavender richard ladner, eve...

39
MobileASL: Making Cell Phones Accessible to the Deaf Community Anna Cavender Richard Ladner, Eve Riskin University of Washington

Post on 22-Dec-2015

216 views

Category:

Documents


1 download

TRANSCRIPT

MobileASL: Making Cell Phones Accessible to the

Deaf Community

Anna CavenderRichard Ladner, Eve Riskin

University of Washington

2

Two Themes

• MobileASL

• Cyber-Community for Advancing Deaf and Hard of Hearing in STEM (Science Technology Engineering and Math)

3

Challenges:• Limited network bandwidth• Limited processing power on cell phones

Our goal:• ASL communication using video cell phones over

current U.S. cell phone network

4

Cell Phone Network Constraints• Low bit rate goal

– GPRS (General Packet Radio Service)• Ranges from 30kbps to 80kbps (download)• Perhaps half that for upload• Unpredictable variation and packet loss

• 3G = 3rd Generation– Special service– Not yet widespread– Will still have congestion

• Service providers more likely to offer services if throughput can be minimized.

5

MobileASL Network Goals

• Sign language presents a unique challenge:– Not just appearance of video, intelligibility too!– If it works for sign language, other video applications

benefit too.

• MobileASL is about fair access to the current network– As soon as possible, no special accommodations– Not geographically limited– Lower bitrate + power savings = more accessible

6

Architecture

Camera

Encoder

Transmitter

Sender

Player

Decoder

Receiver

Receiver

Cell Phone Network

Cell phone

Encoder

7

Codec Used: x264

• Open source implementation of H.264 standard– Doubles compression ratio over MPEG2 – Replacing MPEG2 as industry standard

• x264 offers faster encoding

• Off-the-shelf H.264 decoder can be used– (speculation about H.264 on the iPhone)

8

Outline

• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study• Compression Challenges• Current Work• Conclusions

9

Discussions with Consumers

Open ended questions:– Physical Setup

• Camera, distance, …

– Features• Compatibility, text, …

– Scenarios• Lighting, driving, relay services, …

10

Consumer Response

• “I don’t foresee any limitations. I would use the phone anywhere: the grocery store, the bus, the car, a restaurant, … anywhere!”

• There is a need within the Deaf Community for mobile ASL conversations

• Existing video phone technology (with minor modifications) would be usable

11

Video Encoding for ASL

• Constraints of cell phone network create video compression challenges

• How do we compress ASL video to maximize intelligibility?

12

Outline

• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study • Compression Challenges• Current Work• Conclusions

13

Eyetracking Studies

• Participants watched ASL videos while eye movements were tracked

• Important regions of the video could be encoded differently

* Muir et al. (2005) and Agrafiotis et al. (2003)

14

Eyetracking Results

• 95% of eye movements within 2 degrees visual angle of the signer’s face (demo)

• Implications: Face region of video is most visually important– Detailed grammar in face requires foveal vision– Hands and arms can be viewed in peripheral vision

* Muir et al. (2005) and Agrafiotis et al. (2003)

15

Outline

• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study• Compression Challenges • Current Work• Conclusions

16

Mobile Video Phone Study

• 3 Region-of-Interest (ROI) values• 2 Frame rates, frames per second (FPS)• 3 different Bit rates

– 15 kbps, 20 kbps, 25 kbps

• 18 participants (7 women)– 10 Deaf, 5 hearing, 3 CODA*– All fluent in ASL

* CODA = (Hearing) Child of a Deaf Adult

17

Example of ROI

Varied quality in fixed-sized region around the face

• (demo)2x quality in face 4x quality in face

18

Examples of FPS• Varied frame rate: 10 fps and 15 fps• For a given bit rate:

Fewer frames = more bits per frame

• (demo)

19

Questionnaire

20

User Preferences Results

1

2

3

4

5

15 kbps 20 kbps 25 kbps 10 fps 15 fps 0 roi 6 roi 12 roi

Type of Encoding

Av

era

ge

Pa

rtic

ipa

nt

Re

sp

on

se

Bit Rate Frame Rate Region of Interest

21

Implications of results

• A mid-range ROI was preferred– Optimal tradeoff between clarity in face and

distortion in rest of “sign-box”

• Lower frame rate preferred– Optimal tradeoff between clarity of frames and

number of frames per second

• Results independent of bit rate

22

Outline

• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study• Compression Challenges • Current Work• Conclusions

23

Rate, distortion and complexity optimization

H.264 H.264 encoderencoderH.264 H.264

encoderencoder

Inputparameters

Raw video

Compressed video

• H.264 standard provides 50% bit savings over MPEG 2, but with higher complexity.

• Objective: Achieve best possible quality for least encoding time at a given bitrate

24

Time – Complexity Tradeoff

Encoding Time

MSE

25

Encoding/Decoding on the Cell Phone

• Implemented a command-line version of x264 on a cell phone using Windows Mobile Edition 5.0.

26

Encoding performance with and without code optimization

0

2

4

6

8

10

12

highsettings

mediumsettings

lowsettings

lowestsettings

fra

me

s p

er

se

co

nd

(a

ve

rag

e)

Wireless MMXOptimized

Unoptimized

QVGA 320x240

27

Outline

• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study• Compression Challenges• Current Work• Conclusions

28

Dynamic Region-of-Interest

• Skin detection algorithms

• Region-based metric for

bit allocation– Automatically determine

priority for face and hands based on currently available bitrate.

29

Activity Recognition• Can save data and power by detecting:

– Fingerspelling• Increase frame rate for better intelligibility

– Signing• Sign language-specific encoding

– “Just listening”• Less processing and transmission needed

• (demo)

30

User Interface• Leverages users’ prior experience with video

conferencing interfaces (such as Sorenson, HOVRS, etc.)

• Optimized for small screen space

Initial state user interface•Incoming Video Stream•Outgoing Video Stream•Control Toolbar•Toggle Privacy Mode•Toggle Chat View•Video Screen Layout•Toggle Status Bar

31

Building the System

32

MobileASL Team

• Principal Investigators – Richard Ladner, Eve Riskin, and Sheila Hemami (Cornell)

• Graduate Students– Anna Cavender, Rahul Vanam, Neva Cherniavsky,

Jaehong Chon, Dane Barney, Frank Ciaramello (Cornell)

• Undergraduate Students– Omari Dennis, Jessica DeWitt, Loren Merritt

• National Science Foundation

Cyber infrastructure for Advancing Deaf & Hard of Hearing in STEM

Richard Ladner* Jorge Díaz-Herrera^ James J DeCaro^+ E William Clymer^+ Anna Cavender*

University of Washington* Rochester Institute of Technology+National Technical Institute for the Deaf^

34

• Advancing Deaf and Hard of Hearing people in STEM fields through better access to education.

Our Goal

35

Problems• Deaf students pursuing STEM fields need skilled

interpreters and captioners with specific domain knowledge.– The best interpreter may not be at the student’s

locale.

• Deaf students face challenging classroom environments: multiple sources of information are all visual– “Deaf Whiplash”

• Sign language is growing to include STEM vocabulary– Community consensus is required.

36

Enabling Access to STEM Education

37

Enabling ASL to Grow in STEM

38

Summit to Create a Cyber-Community to Advance Deaf and Hard-of-Hearing Individuals in STEM

(DHH Cyber-Community)

• Scheduled for June 2008 – RIT/NTID• Discussion among the many stakeholders:

1. Deaf and hard of hearing students in STEM fields.

2. Faculty and administrators in colleges and universities with a commitment to deaf and hard of hearing students in STEM fields.

3. Interpreters and captioners.

4. Researchers who study sign vocabulary for STEM fields and interpreting and captioning for education.

5. Educational technology researchers.

6. Experts in multimedia and network services that use the national cyberinfrastructure (e.g., AccessGrid).

7. Companies already in the business of providing video relay interpreting (VRI) and real time captioning (RTC).

8. Leaders in organizations who have an interest in advancing deaf and hard of hearing students in STEM fields.

• Contact us if you have ideas for participation.

39

Questions?

Thanks!

MobileASL Webpage:www.cs.washington.edu/research/MobileASL

Richard Ladner:[email protected]

Anna Cavender:[email protected]