Institute for Language and Speech Processing (ILSP) / ATHENA Research and Innovation Centre
www.ilsp.gr
Sign language technology and the promise it holds for corpus linguistics
Eleni [email protected]
Assistive Technology Group
Sign Language Technologies Team (SLT)
www.ilsp.gr/assistive.html
During the 1st SLCN Workshop we dealt with the question:
Is every set of language data a corpus?
4th SLCN Workshop, 3-4 December 2010, Berlin
Language Corpus Definition
“A corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language”
(Source: Sinclair, http://www.ilc.cnr.it/EAGLES)
The definition of a computer corpus in the same document proves crucial for our discussion:
“A computer corpus is a corpus which is encoded in a standardised and homogenous way for open-ended retrieval tasks…”
Corpus classification by Atkins et al. (1991): “a corpus is a well defined subset [of language] that is designed following specific requirements to serve specific purposes”, most crucially the demand for knowledge management, either in the form of information retrieval or in the form of automatic categorisation and text dispatching according to thematic category.
Three types of corpora:
- Text corpora
- Speech corpora
- Sign Language corpora
Natural Language Corpora
- Should contain all possible instances (vocabulary and grammar phenomena) of a language required for the fulfillment of the corpus design purposes.
- Particular issues:
  - adequate coverage
  - adequate data quantities
  - efficient data classification
  - iterative evaluation techniques
Why do we need all this?
Because we develop corpora in order to exploit them in:
- theoretical linguistics research
- human language technologies (HLTs) and tools
Text corpora of oral languages
- Contain many millions of words
- Are available on the web
- Have NLP tools especially developed to make them exploitable
- Provide input to various NLP applications
Text corpora are:
- Tagged
- Lemmatised
- Indexed
- Assigned metadata (although metadata remain an open issue: Meta-Net)
- SEARCHABLE
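What "tagged, lemmatised, indexed, searchable" buys can be shown with a minimal sketch, assuming a toy English corpus in which every token already carries a word form, a lemma and a part-of-speech tag. An inverted index from lemma to positions supports retrieval, and a keyword-in-context (KWIC) concordance is built on top of it. The token format, the sample sentences and the function names are hypothetical, not taken from any particular NLP toolkit.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str


# Hypothetical toy corpus: each sentence is a list of tagged, lemmatised tokens.
CORPUS = [
    [Token("The", "the", "DET"), Token("signers", "signer", "NOUN"),
     Token("were", "be", "AUX"), Token("recorded", "record", "VERB")],
    [Token("A", "a", "DET"), Token("signer", "signer", "NOUN"),
     Token("records", "record", "VERB"), Token("a", "a", "DET"),
     Token("story", "story", "NOUN")],
]


def build_index(corpus):
    """Map each lemma to the (sentence, position) pairs where it occurs."""
    index = defaultdict(list)
    for s_id, sentence in enumerate(corpus):
        for pos, tok in enumerate(sentence):
            index[tok.lemma].append((s_id, pos))
    return index


def kwic(corpus, index, lemma, width=2):
    """Return keyword-in-context lines for every occurrence of a lemma."""
    lines = []
    for s_id, pos in index.get(lemma, []):
        sentence = corpus[s_id]
        left = " ".join(t.word for t in sentence[max(0, pos - width):pos])
        right = " ".join(t.word for t in sentence[pos + 1:pos + 1 + width])
        lines.append(f"{left:>20} [{sentence[pos].word}] {right}".rstrip())
    return lines


index = build_index(CORPUS)
for line in kwic(CORPUS, index, "record"):
    print(line)
```

Because lookup goes through the lemma, `recorded` and `records` are retrieved together; without lemmatisation each form would have to be queried separately.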
Search in Sign Language corpora is restricted to metadata ONLY:
- Storage medium
- (Detailed) genre of narration
- (Detailed) topic
- Signer personal data
- Date of capture
- Other corpus-external information
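The limitation is easy to see in a sketch of what a metadata-only catalogue supports. The record fields below mirror the list above; the field names, the sample records and the `search` helper are hypothetical. Every query can constrain only these corpus-external fields, and nothing in a record points inside the video.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SessionMetadata:
    # Hypothetical fields mirroring the metadata categories listed above.
    video_file: str
    storage_medium: str
    genre: str
    topic: str
    signer_id: str
    capture_date: date


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = [
    SessionMetadata("s01.mp4", "HDD", "narrative", "childhood", "signer_A", date(2009, 5, 12)),
    SessionMetadata("s02.mp4", "DVD", "dialogue", "travel", "signer_B", date(2010, 3, 2)),
    SessionMetadata("s03.mp4", "HDD", "narrative", "travel", "signer_B", date(2010, 6, 20)),
]


def search(catalogue, **criteria):
    """Return videos whose metadata fields equal all given criteria.

    Only metadata attributes can be queried: there is no way to ask for,
    e.g., every occurrence of a given sign, because the content is not annotated.
    """
    return [
        rec.video_file
        for rec in catalogue
        if all(getattr(rec, field) == value for field, value in criteria.items())
    ]


print(search(CATALOGUE, genre="narrative", topic="travel"))
```

Whatever combination of fields is used, the answer is a list of whole video files; the signing inside them stays opaque to the search.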
=> Information is restricted to elements peripheral to the linguistic content:
- No cues about the use of the language
- No statistics on the frequency of (co)occurrences in signing utterances
- No concordances
- No morpho-phonological variation
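By contrast, once the signing stream is annotated with glosses, the statistics listed as missing above take only a few lines of code. The sketch below assumes a hypothetical gloss tier in which each utterance is a sequence of ID-gloss strings; the glosses are invented placeholders, not annotations from any real corpus.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gloss tier: one list of ID-glosses per signing utterance.
UTTERANCES = [
    ["IX-1", "HOUSE", "BIG"],
    ["HOUSE", "IX-3", "GO"],
    ["IX-1", "GO", "HOUSE"],
]


def gloss_frequencies(utterances):
    """Count how often each gloss occurs across all utterances."""
    return Counter(g for utt in utterances for g in utt)


def cooccurrences(utterances):
    """Count unordered gloss pairs that appear within the same utterance."""
    pairs = Counter()
    for utt in utterances:
        # Sorting makes each pair key order-independent, e.g. ("HOUSE", "IX-1").
        for a, b in combinations(sorted(set(utt)), 2):
            pairs[(a, b)] += 1
    return pairs


freq = gloss_frequencies(UTTERANCES)
co = cooccurrences(UTTERANCES)
print(freq["HOUSE"], co[("HOUSE", "IX-1")])
```

Frequency and co-occurrence counts like these underlie the distributional statistics routinely available for text corpora; for SL corpora they presuppose that the video has first been segmented and glossed.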
Oral language corpora in the NLP domain:
- Statistical (linguistic-knowledge-blind) processing
- Tag labeling
- Linguistic-knowledge-rich processing
- Hybrid approaches

What about SL corpora in the NLP domain?
Sign language technologies open the way to true exploitation of SL corpora
- Manual annotation supported by automatic annotation tools
- Automatic segmentation of sign/phrase boundaries
- Labeling
- Tagging

based on Sign Recognition technologies
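Automatic segmentation can be illustrated with one simple cue: the hands tend to slow down or pause around sign boundaries. The sketch below is not the ILSP recognition pipeline; it assumes per-frame 2D coordinates of the dominant hand (as produced by some hand tracker), a frame rate, and hand-picked thresholds, all of which are hypothetical. It only proposes candidate boundaries, which a human annotator would then confirm or correct, in the spirit of manual annotation supported by automatic tools.

```python
import math


def hand_speeds(positions, fps):
    """Per-frame speed (pixels/second) of a hand from consecutive (x, y) positions."""
    steps = [math.hypot(x1 - x0, y1 - y0) * fps
             for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    # Reuse the first step's speed for frame 0 so every frame has a value.
    return steps[:1] + steps


def candidate_boundaries(positions, fps=25, speed_threshold=50.0, min_hold_frames=3):
    """Return candidate boundary times (ms) at the centre of each low-speed stretch.

    A stretch of at least `min_hold_frames` frames whose speed is below
    `speed_threshold` is treated as a pause between signs. Both thresholds
    are illustrative assumptions, not tuned values.
    """
    speeds = hand_speeds(positions, fps)
    boundaries, start = [], None
    for i, v in enumerate(speeds + [math.inf]):  # sentinel closes a trailing run
        if v < speed_threshold and start is None:
            start = i
        elif v >= speed_threshold and start is not None:
            if i - start >= min_hold_frames:
                centre = (start + i - 1) // 2
                boundaries.append(round(centre * 1000 / fps))
            start = None
    return boundaries


# Toy trajectory: the hand moves, pauses for several frames, then moves again.
track = [(0, 0), (10, 0), (20, 0), (30, 0), (30, 0), (30, 0),
         (30, 0), (30, 0), (40, 0), (50, 0)]
print(candidate_boundaries(track))  # [200]
```

Speed alone also fires on holds inside a sign, which is one reason the slides pair segmentation with further detectors and with manual checking.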
- Closer-to-natural sign synthesis
- More accurate formal representation of SLs

These follow from the increased volume of available data, made accessible via:
- Search directly in the content of SL video
- Retrieval of linguistic information
Tag/label-based chunking exploiting, e.g.:
- Detention (D) detector
- Posture (P) detector
- Transition (T) trajectory shape detector
- Transition ballistic trajectory detector
- Limb orientation detector
- Detection of elbow locations
- Detection of hand locations
- Contact detector
- Symmetry detector
- Head pose orientation detector
- Eyebrow detector
- Shoulder detector
- Segmentation
- Signer recognition
- Movement features
- Face tracker
- Signing space calibration
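Tag/label-based chunking can be sketched under the assumption that each video frame has already received one label from detectors such as those listed above (here only `D` for detention, `P` for posture and `T` for transition). Consecutive identical labels are merged into chunks, and runs shorter than a minimum length are treated as detector noise and absorbed into the preceding chunk. The label alphabet, the minimum length and the absorption rule are illustrative assumptions, not the detectors' actual output format.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    label: str
    start: int  # first frame, inclusive
    end: int    # last frame, inclusive


def chunk_labels(frame_labels: List[str], min_len: int = 2) -> List[Chunk]:
    """Merge runs of identical per-frame labels into chunks.

    Runs shorter than `min_len` frames are absorbed into the previous chunk
    (a crude smoothing step, assumed for illustration); a short run at the
    very start is kept as is.
    """
    chunks: List[Chunk] = []
    i = 0
    while i < len(frame_labels):
        j = i
        while j + 1 < len(frame_labels) and frame_labels[j + 1] == frame_labels[i]:
            j += 1
        run = Chunk(frame_labels[i], i, j)
        if chunks and (j - i + 1) < min_len:
            chunks[-1] = chunks[-1]._replace(end=j)
        elif chunks and chunks[-1].label == run.label:
            # Absorbing a short run can leave two same-label neighbours; merge them.
            chunks[-1] = chunks[-1]._replace(end=j)
        else:
            chunks.append(run)
        i = j + 1
    return chunks


for chunk in chunk_labels(list("PPPTTDDDDTPPP")):
    print(chunk)
```

The resulting (label, start, end) triples have the shape of annotation intervals, so they could be offered to an annotator as a pre-filled tier.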
Iterative algorithm for corpus creation, valid also in the case of SL corpora:
- Start from an initial set of sentences.
- Calculate the coverage of the corpus against the target (diphone) coverage.
- Coverage attained? NO: add new sentences to obtain an expanded corpus, then recalculate the coverage. YES: the corpus is complete.
- Manual fine-tuning.
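The loop above can be written as a greedy selection procedure. In this sketch each candidate sentence is represented by the set of target units it contains: diphones for a speech corpus, or, as an assumption for SL corpora, units such as signs or handshapes. At each iteration the candidate adding the most still-uncovered units is added, until the target coverage is reached or no candidate improves it. The greedy choice rule, the toy data and all names are illustrative assumptions; manual fine-tuning remains a human step.

```python
from typing import Dict, List, Set, Tuple


def coverage(selected: List[str], candidates: Dict[str, Set[str]], target: Set[str]) -> float:
    """Fraction of target units covered by the selected sentences."""
    if not target:
        return 1.0
    covered = set().union(*(candidates[s] for s in selected)) & target
    return len(covered) / len(target)


def build_corpus(initial: List[str], candidates: Dict[str, Set[str]],
                 target: Set[str], target_coverage: float = 1.0) -> Tuple[List[str], float]:
    """Greedily add sentences until the target coverage is attained (hypothetical helper)."""
    corpus = list(initial)
    pool = {s: units for s, units in candidates.items() if s not in corpus}
    while coverage(corpus, candidates, target) < target_coverage and pool:
        covered = set().union(*(candidates[s] for s in corpus))
        # Pick the candidate contributing the most uncovered target units.
        best = max(pool, key=lambda s: len((pool[s] & target) - covered))
        if not (pool[best] & target) - covered:
            break  # no remaining candidate improves coverage
        corpus.append(best)  # expanded corpus
        del pool[best]
    return corpus, coverage(corpus, candidates, target)


# Hypothetical candidate sentences, each listed with the diphone-like units it contains.
CANDIDATES = {
    "s1": {"a-b", "b-c"},
    "s2": {"c-d"},
    "s3": {"a-b", "c-d", "d-e"},
    "s4": {"e-f"},
    "s5": {"b-c"},
}
TARGET = {"a-b", "b-c", "c-d", "d-e", "e-f"}

corpus, cov = build_corpus(["s1"], CANDIDATES, TARGET)
print(corpus, cov)  # ['s1', 's3', 's4'] 1.0
```

The `break` covers the case where the target is unreachable with the available material, which in practice is the signal to record or collect new sentences.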
Thank you for your attention!