
X Encontro de Alunos e Docentes do DCA/FEEC/UNICAMP (EADCA)
X DCA/FEEC/University of Campinas (UNICAMP) Workshop (EADCA)

Campinas, 26 e 27 de outubro de 2017
Campinas, Brazil, October 26-27, 2017

QLIBRAS: A novel database for grammatical facial expressions in Brazilian Sign Language

Emely Pujólli da Silva, Paula Dornhofer Paro Costa

Departamento de Engenharia de Computação e Automação Industrial (DCA)
Faculdade de Engenharia Elétrica e de Computação (FEEC)

Universidade Estadual de Campinas (Unicamp)
Caixa Postal 6101, 13083-970 – Campinas, SP, Brasil

[email protected], [email protected]

Abstract – Individuals with some degree of hearing impairment typically face difficulties in communicating with hearing individuals and during the acquisition of reading and writing skills. Sign language (SL) is a language structured in gestures that, as any other human language, presents variations around the world and that is widely adopted by the deaf. Automatic Sign Language Recognition (ASLR) technology aims to translate sign language gestures into written or spoken sentences of a target language, with the goal of improving the communication between deaf and hearing individuals. An important step towards the improvement of ASLR models is the access to comprehensive databases of non-manual signs. This paper presents our first approach to building a database focused on head movements of grammatical facial expressions of sentences in Brazilian Sign Language (LIBRAS).

Keywords – Automatic Sign Language Recognition, Non-Manual Expressions, Brazilian Sign Language.

1. Introduction
Human languages are systems of communication used to express and manipulate ideas and to create social bonds. There are not only vocal languages but also the so-called sign languages (SLs), which are visual-spatial linguistic systems structured on gestures, used around the world by people with hearing impairments to communicate with members of their group and with others.

Despite sign language capabilities, there is still a strong barrier between deaf and hearing people. This language barrier arises because the Deaf usually do not master spoken and written language, and only a few hearing people can communicate using sign language [18]. Aiming to improve this communication, research efforts have been conducted in Automatic Sign Language Recognition (ASLR). The idea is that ASLR will translate sign language into text or sound, enabling the interaction between deaf and hearing people. In addition, ASLR can be applied as an assistive writing tool for young learners who were born deaf and typically face great difficulties during the acquisition of reading and writing skills [11].

One of the challenges involved in the development of ASLR technology is that sign languages, in the same way as spoken languages, emerged spontaneously, evolved naturally and reflect worldwide sociocultural differences, giving origin to a wide range of variations such as British Sign Language (BSL), American Sign Language (ASL), Japanese Sign Language (JSL), Brazilian Sign Language (LIBRAS) and many others.

However, analogously to spoken languages, in which it is possible to combine phonemes to form words, it is possible to identify a set of parameters that are combined simultaneously to create signs [13]. Such parameters can be divided into manual signals and non-manual signals. Manual signals (MS) correspond to parameters such as the hand shape of the sign, the location of the sign execution in the front body region, and the path and speed of the hands while the sign is made. In contrast, non-manual signals (NMS) correspond to head and shoulder movements and facial features such as eyebrow motion and lip-mouth movements.

During the last decade, many efforts have been made to explore the automatic recognition of MS gestures [5, 16, 18]. However, non-manual signals have a great impact on sign appearance and are an essential component of this type of communication. For this reason, recent approaches have also been exploring NMS recognition [2, 3] and the mechanisms to combine it with MS recognition [14, 17]. However, there is still a lack of databases in this area. To our knowledge, there are only a few databases with non-manual expressions in LIBRAS, and they do not display video images, only the points of the face relevant to the studies conducted [8].

The objective of the present work is to introduce a database with video images of interpreters and deaf people performing examples of NMS. Our object of study is the Brazilian Sign Language (LIBRAS).


Figure 1. Examples of facial expressions in LIBRAS. Images (A) and (B) are examples of Grammatical Facial Expressions for Sentences. Image (C) is an example of Grammatical Facial Expression of Intensity and image (D) is an example of Grammatical Facial Expression of Distinction. All the images displayed above are from the TAS project [7].

The present project is part of the TAS¹ initiative, a multidisciplinary research group composed of deaf individuals, sign interpreters, linguists, engineers and computer scientists that aims to advance the development of assistive technologies for the deaf.

2. Facial Expressions in LIBRAS

In Brazil, LIBRAS has had the status of second official language since 2002, and its grammar has been the subject of intensive study during the last decade. Facial expressions are fundamental and may vary widely in LIBRAS (see Figure 1). As in any sign language, LIBRAS cannot count on the acoustic support of voice intonation when discourse is built. Because of that, facial expressions are granted new functions, presenting themselves as an important element serving as markers of semantic functions. Also, a single facial expression may be combined with many different manual signs, and the meaning of the same manual sign can vary significantly depending on the facial expression that accompanies it.

In LIBRAS, facial expressions that convey an idea of feeling and emotion are called Affective Facial Expressions (AFE). Affective facial expressions can start before a specific sign and end after the sentence has been completed. In other words, AFEs modulate the whole sentence, modifying the full meaning of a sequence of signs. AFEs are adopted, for example, when the signer communicates ideas sarcastically or when he/she is describing a sad event.

¹Brazilian Portuguese acronym for Tecnologias Assistivas para Surdos (Assistive Technologies for the Deaf). For additional references: http://www.tas.fee.unicamp.br

A visual characteristic of AFEs is that they employ an integrated set of facial muscles.

Grammatical facial expressions in LIBRAS are expressions that typically occur at specific points of a sentence or are associated with a specific sign execution [8, 9]. Observing the different properties of grammatical facial expressions, we can categorize them into Grammatical Facial Expressions for Sentences (GES), Grammatical Facial Expressions of Intensity (GEI) and Grammatical Facial Expressions of Distinction (GED).

A grammatical facial expression for sentences defines the type of sentence that is being signed [15]. According to the structure and information of the sentence, it can be classified into: WH-question (WH), Yes/No question (YN), Doubt question (DQ), Topic (T), Negation (N), Assertion (A), Conditional clause (CC), Focus (F) and Relative clause (RC). In LIBRAS, there are GES markers that are expressed by the face and by head movements.
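For later annotation and classification work, the nine GES types above can be collected into a small label map. The snippet below is only a convenience sketch derived from that list; the abbreviations follow the paper, but the dictionary itself is not part of any tooling described in this work.

```python
# Label map for the nine GES types listed above (illustrative only; the
# abbreviations follow the paper, the dictionary is not released QLIBRAS tooling).
GES_CLASSES = {
    "WH": "WH-question",
    "YN": "Yes/No question",
    "DQ": "Doubt question",
    "T": "Topic",
    "N": "Negation",
    "A": "Assertion",
    "CC": "Conditional clause",
    "F": "Focus",
    "RC": "Relative clause",
}
```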

In addition, Grammatical Facial Expressions of Intensity differentiate the meaning of the sign, assuming the role of a quantifier. For example, the same sign associated with the word “beautiful” can have its meaning attenuated to “cute” or intensified to “very beautiful”, depending on the signer’s facial expression.

Finally, without its characteristic Grammatical Facial Expression of Distinction, a sign is incomplete and cannot be distinguished from other signs with the same manual signal. In other words, GEDs help to define the meaning of a sign. For a summary of non-manual signs and their semantic functions, we invite the reader to consult [15].


2.1. Grammatical Facial Expressions for Sentences

In LIBRAS, GES are defined largely by head movements [6, 15]. Therefore, in our pursuit of a model capable of recognizing facial expressions in Brazilian Sign Language, it became indispensable to discuss a method for the recognition of head motion. This problem can be framed as the pose estimation problem studied in computer vision and robotics; despite being interesting, it is quite complex. Moreover, there are few databases and no definitive solution to the problem of head movement classification. This was one of the challenges we encountered in our research, and our solution was to construct a database of head movements in LIBRAS so that we can carefully study the movement over time and during the performance of a sentence.
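As a concrete illustration of head movement treated as a pose estimation problem, the sketch below recovers head rotation angles from a handful of facial landmarks with OpenCV's solvePnP and a generic 3D face model. This is one standard approach given as an assumption, not the method adopted in QLIBRAS; the 3D reference coordinates and the camera approximation are illustrative.

```python
# Sketch of head-pose estimation from facial landmarks (one standard approach,
# not the method used in this work).
import cv2
import numpy as np

# Generic 3D reference points (nose tip, chin, eye corners, mouth corners)
# in an arbitrary model coordinate system.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye, left corner
    (225.0, 170.0, -135.0),    # right eye, right corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
])

def estimate_head_pose(image_points, frame_size):
    """image_points: (6, 2) float array in the same order as MODEL_POINTS.
    Returns (pitch, yaw, roll) in degrees, or None on failure."""
    h, w = frame_size
    focal = w  # rough approximation of the focal length in pixels
    camera_matrix = np.array([[focal, 0, w / 2],
                              [0, focal, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points,
                               camera_matrix, dist_coeffs,
                               flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    rmat, _ = cv2.Rodrigues(rvec)
    # Decompose the rotation matrix into Euler angles (degrees).
    angles, *_ = cv2.RQDecomp3x3(rmat)
    return angles
```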

3. Proposed QLIBRAS Database
Despite the limited availability of internet videos in Brazilian Sign Language, we were able to compose a database using parts of videos with interpreters and deaf persons performing questions in LIBRAS (see Figure 2). In addition to editing and labeling the videos, we also built a dataset matrix composed of facial points tracked on the videos using Dlib [12] and Optical Flow [10].
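A minimal sketch of the kind of tracking pipeline mentioned above is given below, assuming the standard dlib 68-point shape predictor and OpenCV's Lucas-Kanade optical flow; the predictor file name and the exact set of tracked points are assumptions, not details reported in this paper.

```python
# Detect facial landmarks with dlib on the first frame and propagate them
# across frames with Lucas-Kanade optical flow (a sketch, not the exact
# QLIBRAS pipeline).
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def track_landmarks(video_path):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return np.empty((0, 68, 2))
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return np.empty((0, 68, 2))
    shape = predictor(gray, faces[0])
    pts = np.array([[p.x, p.y] for p in shape.parts()],
                   dtype=np.float32).reshape(-1, 1, 2)
    trajectory = [pts.reshape(-1, 2)]
    prev_gray = gray
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Propagate the landmark positions to the next frame.
        pts, _status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        trajectory.append(pts.reshape(-1, 2))
        prev_gray = gray
    cap.release()
    return np.stack(trajectory)  # shape: (frames, 68 points, x/y)
```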

The QLIBRAS database is composed of 50 videos: 20 videos with WH-questions, 20 videos with Yes/No questions and 10 videos with assertion sentences. We label the videos according to these semantic functions because their head movements are distinctive from one another, which makes them a good characteristic for classification. Also, each video contains a different question being performed in LIBRAS.
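One hypothetical way to record these video-level labels is a simple CSV file, as sketched below; the file names and layout are illustrative and do not describe the actual QLIBRAS release format.

```python
# Hypothetical label file for the 50 videos (illustrative layout only).
import csv

LABELS = {"WH": "WH-question", "YN": "Yes/No question", "A": "assertion"}

def write_label_file(entries, path="qlibras_labels.csv"):
    """entries: iterable of (video_file, label_code) pairs."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["video", "label_code", "label_name"])
        for video, code in entries:
            writer.writerow([video, code, LABELS[code]])

write_label_file([("video_001.mp4", "WH"),
                  ("video_021.mp4", "YN"),
                  ("video_041.mp4", "A")])
```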

Our main goal in creating this database is the future construction and training of a machine learning model. With that in mind, we chose videos in which the person initially appears facing the camera, in a position that makes it easier for any algorithm to capture and track the points of the face. Two women and five men participate in the set of videos.

To enlarge the limited selection of videos in LIBRAS, we also use videos in which an interpreter is overlaid on the image, which is the case when a video is being translated to LIBRAS. When the interpreter is inserted over the video, it is placed at the bottom, with a size smaller than half of the image. During our editing it was necessary to resize and crop some of the images, which made the videos vary in length and resolution. Most of the videos were obtained online² under a Creative Commons license. This means that the creator owns the work, but the community can reuse and edit the video [1]. The remaining videos were filmed by a deaf TAS project associate [7].

²The videos for the dataset were obtained from the following YouTube channels [4]: Kitana Dreams, Filmesquevoam, Germano Dutra Jr, Andrei Borges and Grupo Feneis.

Figure 2. Examples of QLIBRAS database videos. Images extracted from the TAS project [7] and YouTube channels [4]: Kitana Dreams, Filmesquevoam, Andrei Borges.

4. Conclusions and Future Work
Facial expressions play a unique role in sign language, so research on non-manual signs is extremely important. A good database is indispensable for a recognition method of grammatical facial expressions in LIBRAS. QLIBRAS is a new database composed of video images and is the principal result of this work.

Our next step is to evaluate the QLIBRAS dataset and apply it to machine learning algorithms. Hopefully, in the future we will use this data in the construction of a model for the automatic recognition of facial expressions in LIBRAS.
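As an illustration of where this could go, the sketch below trains a simple off-the-shelf classifier on per-video head-motion features derived from the tracked landmark trajectories. Everything here, the feature summary, the SVM choice and the scikit-learn pipeline, is an assumption about possible future work, not a model described in this paper.

```python
# Train and evaluate a simple classifier on head-motion features
# (an assumed future-work sketch, not the authors' model).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def motion_features(trajectory):
    """Summarise a (frames, points, 2) landmark trajectory as a fixed-size vector."""
    displacement = np.diff(trajectory, axis=0)  # frame-to-frame motion
    return np.concatenate([displacement.mean(axis=(0, 1)),
                           displacement.std(axis=(0, 1)),
                           trajectory.std(axis=(0, 1))])

def evaluate(trajectories, labels):
    """trajectories: list of landmark trajectories; labels: e.g. "WH", "YN", "A"."""
    X = np.stack([motion_features(t) for t in trajectories])
    y = np.asarray(labels)
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return cross_val_score(model, X, y, cv=5).mean()
```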

Acknowledgement

The research for this paper was financially supported by the Coordination for the Improvement of Higher Education Personnel (CAPES). We would also like to thank Luciana Aguera Rosa, an associate of the TAS project, for participating in the videos.

References
[1] Creative Commons license. http://www.creativecommons.org.br/, 2008.
[2] Epameinondas Antonakos, Anastasios Roussos, and Stefanos Zafeiriou. A survey on mouth modeling and analysis for sign language recognition. In Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International Conference and Workshops on, volume 1, pages 1–7. IEEE, 2015.
[3] Malladihalli S Bhuvan, Vinay Rao, Siddharth Jain, TS Ashwin, Ram MR Guddetti, and S Pramod Kulgod. Detection and analysis model for grammatical facial expressions in sign language. In Region 10 Symposium (TENSYMP), 2016 IEEE, pages 155–160. IEEE, 2016.


[4] Steve Chen, Chad Hurley, and Jawed Karim. YouTube. Official YouTube website, Google Inc. https://youtube.com, 2017.
[5] Helen Cooper, Brian Holt, and Richard Bowden. Sign Language Recognition, pages 539–562. Springer London, London, 2011.
[6] Ronice M de Quadros and Lodenir B Karnopp. Língua de sinais brasileira: estudos lingüísticos. Artmed Editora, 2009.
[7] Francisco Aulísio dos Santos Paiva, José Mario De Martino, Plínio Almeida Barbosa, Ângelo Benetti, and Ivani Rodrigues Silva. Um sistema de transcrição para língua de sinais brasileira: O caso de um avatar. Revista do GEL, 13(3):12–48, 2016.
[8] Fernando A Freitas, Sarajane M Peres, Clodoaldo A de Moraes Lima, and Felipe V Barbosa. Grammatical facial expressions recognition with machine learning. In FLAIRS Conference, 2014.
[9] Fernando de Almeida Freitas. Reconhecimento automático de expressões faciais gramaticais na língua brasileira de sinais. Master's thesis, Universidade de São Paulo (USP), 2011.
[10] Itseez. Open source computer vision library. https://github.com/itseez/opencv, 2015.
[11] Richard Kennaway, John RW Glauert, and Inge Zwitserlood. Providing signed content on the internet by synthesized animation. ACM Transactions on Computer-Human Interaction (TOCHI), 14(3):15, 2007.
[12] Davis E. King. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10:1755–1758, 2009.
[13] Scott K Liddell and Robert E Johnson. American Sign Language: The phonological base. Sign Language Studies, 64(1):195–277, 1989.
[14] Luis Quesada, Gabriela Marín, and Luis A Guerrero. Sign language recognition model combining non-manual markers and handshapes. In Ubiquitous Computing and Ambient Intelligence: 10th International Conference, UCAmI 2016, San Bartolomé de Tirajana, Gran Canaria, Spain, November 29–December 2, 2016, Proceedings, Part I, pages 400–405. Springer, 2016.
[15] Emely P. Silva and Paula D. P. Costa. Recognition of non-manual expressions in Brazilian Sign Language. In 12th IEEE International Conference on Automatic Face and Gesture Recognition. IEEE, 2017.
[16] Thad Starner, Joshua Weaver, and Alex Pentland. Real-time American Sign Language recognition using desk and wearable computer based video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1371–1375, 1998.
[17] Hee-Deok Yang and Seong-Whan Lee. Robust sign language recognition by combining manual and non-manual features based on conditional random field and support vector machine. Pattern Recognition Letters, 34(16):2051–2056, 2013.
[18] José Elías Yauri Vidalón and José Mario De Martino. Brazilian Sign Language Recognition Using Kinect, pages 391–402. Springer International Publishing, Cham, 2016.