Sessions: 11.35 - 13.15 Area 1

Session : P01 - Corpora and

Annotation

Chair: Marko Tadić

59 AiTi Aw, Sharifah Mahani Aljunied, Nattadaporn Lertcheva

and Sasiwimon Kalunsima

TaLAPi – A Thai Linguistically Annotated Corpus for Language

Processing

120 Guiyao Ke, Pierre-Francois Marteau and Gildas Menier Variations on quantitative comparability measures and their

evaluations on synthetic French-English comparable corpora

147 Paul Felt, Eric Ringger, Kevin Seppi and Kristian Heal Using Transfer Learning to Assist Exploratory Corpus

Annotation

187 Miguel B. Almeida, Mariana S. C. Almeida, André F. T.

Martins, Helena Figueira, Pedro Mendes and Cláudia Pinto

Priberam Compressive Summarization Corpus: A New

Multi-Document Summarization Corpus for European Portuguese

253 Patrick Schone, Heath Nielson and Mark Ward Corpus and Evaluation of Handwriting Recognition of

Historical Genealogical Records

294 Milena Hnátková, Michal Křen, Pavel Procházka and Hana

Skoumalová

The SYN-series corpora of written Czech

300 Karel Kučera and Martin Stluka Corpus of 19th-century Czech Texts: Problems and Solutions

308 Maik Stührenberg Extending standoff annotation

345 Stefan Höfler and Kyoko Sugisaki Constructing and exploiting an automatically annotated

resource of legislative texts

Session : P02 -

Crowdsourcing

Chair: Alain Couillault

25 Yuan Luo, Thomas Boucher, Tolga Oral, David Osofsky and

Sara Weber

A Study on Expert Sourcing Enterprise Question Collection

and Classification

28 Balamurali A.R Can the Crowd be Controlled?: A Case Study on Crowd

Sourcing and Automatic Validation of Completed Tasks based

on User Modeling

94 Mitesh M. Khapra, Ananthakrishnan Ramanathan, Anoop

Kunchukuttan, Karthik Visweswariah and Pushpak

Bhattacharyya

When Transliteration Met Crowdsourcing : An Empirical

Study of Transliteration via Crowdsourcing using Efficient,

Non-redundant and Fair Quality Control

132 Manjira Sinha, Tirthankar Dasgupta and Anupam Basu Design and Development of an Online Computational

Framework to Facilitate Language Comprehension Research

on Indian Languages

319 Martin Benjamin Collaboration in the Production of a Massively Multilingual

Lexicon

363 Marco Marelli, Stefano Menini, Marco Baroni, Luisa

Bentivogli, Raffaella Bernardi and Roberto Zamparelli

A SICK cure for the evaluation of compositional distributional

semantic models

431 Wajdi Zaghouani and Kais Dukes Can Crowdsourcing be used for Effective Annotation of

Arabic?

471 Héctor Martínez Alonso and Lauren Romeo Crowdsourcing as a preprocessing for complex semantic

annotation tasks

564 Christoph Draxler Online experiments with the Percy software framework -

experiences and some early results

641 Ryan Cotterell and Chris Callison-Burch A Multi-Dialect, Multi-Genre Corpus of Informal Written

Arabic

Session : P03 - Dialogue Chair: Dan Cristea

113 Stefan Ultes, Hüseyin Dikme and Wolfgang Minker First Insight into Quality-Adaptive Dialogue

169 Volha Petukhova, Martin Gropp, Dietrich Klakow, Gregor

Eigner, Mario Topf, Stefan Srb, Petr Motlicek, Blaise Potard,

John Dines, Olivier Deroo, Ronny Egeler, Uwe Meinz, Steffen

Liersch and Anna Schmidt

The DBOX Corpus Collection of Spoken Human-Human and

Human-Machine Dialogues

321 Dietmar Rösner, Rafael Friesen, Stephan Günther and Rico

Andrich

Modeling and evaluating dialog success in the LAST MINUTE

corpus

575 Layla El Asri, Rémi Lemonnier, Romain Laroche, Olivier

Pietquin and Hatim Khouzaimi

NASTIA: Negotiating Appointment Setting Interface

576 Layla El Asri, Romain Laroche and Olivier Pietquin DINASTI: Dialogues with a Negotiating Appointment Setting

Interface

959 Thomas Pellegrini, Vahid Hedayati and Angela Costa El-WOZ: a client-server wizard-of-oz interface

Session : P04 - Phonetic

Databases and Prosody

Chair: Philippe Martin

119 Claire Brierley, Majdi Sawalha and Eric Atwell Tools for Arabic Natural Language Processing: a case study in

qalqalah prosody

299 Johann-Mattis List and Jelena Prokić A Benchmark Database of Phonetic Alignments in Historical

Linguistics and Dialectology

381 Anne Lacheret, Sylvain Kahane, Julie Beliao, Anne Dister, Kim

Gerdes, Jean-Philippe Goldman, Nicolas Obin, Paola

Pietrandrea and Atanas Tchobanov

Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French

870 Jean-Philippe Goldman, Tea Prsir and Antoine Auchlin C-PhonoGenre: a 7-hours corpus of 7 speaking styles in

French: relations between situational features and prosodic

properties

454 Abir Masmoudi, Mariem Ellouze Khmekhem, Yannick Esteve,

Lamia Hadrich Belguith and Nizar Habash

A Corpus and Phonetic Dictionary for Tunisian Arabic Speech

Recognition

716 Yuichi Ishimoto, Tomoyuki Tsuchiya, Hanae Koiso and

Yasuharu Den

Towards Automatic Transformation between Different

Transcription Conventions: Prediction of Intonation Markers

from Linguistic and Acoustic Features

727 Tiberiu Boroș, Adriana Stan, Oliver Watts and Stefan Daniel

Dumitrescu

RSS-TOBI - A Prosodically Enhanced Romanian Speech Corpus

931 Klim Peshkov and Laurent Prévot Segmentation evaluation metrics, a comparison grounded on

prosodic and discourse units

DAY1 Poster Sessions

1048 Bistra Andreeva, William Barry and Jacques Koreman A Cross-language Corpus for Studying the Phonetics and

Phonology of Prominence

1200 Liviu Dinu, Alina Maria Ciobanu, Ioana Chitoran and Vlad

Niculae

Using a machine learning model to assess the complexity of

stress systems

1212 Tanja Schultz and Tim Schlippe GlobalPhone: Pronunciation Dictionaries in 20 Languages

Session : P05 - Speech

Resources

Chair: Martine Adda-Decker TBC

7 Juan Rafael Orozco-Arroyave, Julián David Arias-Londoño,

Jesús Francisco Vargas-Bonilla, María Claudia

Gonzalez-Rátiva and Elmar Nöth

New Spanish speech corpus database for the analysis of

people suffering from Parkinson's disease

32 François Salmon and Félicien Vallet An Effortless Way To Create Large-Scale Datasets For Famous

Speakers

41 Florian Schiel and Thomas Kisler German Alcohol Language Corpus - the Question of Dialect

89 Jetske Klatter, Roeland Van Hout, Henk van den Heuvel,

Paula Fikkert, Anne Baker, Jan De Jong, Frank Wijnen, Eric

Sanders and Paul Trilsbeek

Vulnerability in Acquisition, Language Impairments in Dutch:

Creating a VALID Data Archive

134 Mirjam Ernestus, Lucie Kočková-Amortová and Petr Pollak The Nijmegen Corpus of Casual Czech

182 Carlos Daniel Hernandez Mena and Abel Herrera Camacho CIEMPIESS: A New Open-Sourced Mexican Spanish Radio

Corpus

252 Marie Kopřivová, Hana Goláňová, Petra Klimešová and David

Lukeš

Mapping Diatopic and Diachronic Variation in Spoken Czech:

the Ortofon and Dialekt Corpora

290 Thomas Schmidt The Research and Teaching Corpus of Spoken German – FOLK

312 Niklas Vanhainen and Giampiero Salvi Free Acoustic and Language Models for Large Vocabulary

Continuous Speech Recognition in Swedish


Session : P06 - Endangered

Languages

Chair: Laurette Pretorius TBC

143 Kristiina Jokinen Open-domain Interaction and Online Content in the Sami

Language

438 Tjerk Hagemeijer, Michel Généreux, Iris Hendrickx, Amália

Mendes, Abigail Tiny and Armando Zamora

The Gulf of Guinea Creole Corpora

1046 Dagmar Jung, Katarzyna Klessa, Zsuzsa Duray, Beatrix Oszkó,

Mária Sipos, Sándor Szeverényi, Zsuzsa Várnai, Paul Trilsbeek

and Tamás Váradi

Languagesindanger.eu - including multimedia language

resources to disseminate knowledge and create educational

material on less‑resourced languages

1174 José Pedro Ferreira, Cristiano Chesi, Daan Baldewijns,

Fernando Miguel Pinto, Margarita Correia, Daniela Braga,

Hyongsil Cho, Amadeu Ferreira and Miguel Dias

Casa de la Lhéngua: a set of language resources and natural

language processing tools for Mirandese

1216 Christian Curtis A finite-state morphological analyzer for a Lakota HPSG

grammar

Session : P07 - Evaluation

Methodologies

Chair: Violeta Seretan

52 Adam Kilgarriff, Pavel Rychlý, Milos Jakubicek, Vojtěch Kovář,

Vit Baisa and Lucia Kocincová

Extrinsic Corpus Evaluation with a Collocation Dictionary Task

289 Nancy Underwood, Bartolomé Mesa-Lao, Mercedes García

Martínez, Michael Carl, Vicent Alabau, Jesús González-Rubio,

Luis A. Leiva, Germán Sanchis-Trilles, Daniel Ortíz-Martínez

and Francisco Casacuberta

Evaluating the effects of interactivity in a post-editing

workbench

320 Bogdan Ludusan, Maarten Versteegh, Aren Jansen,

Guillaume Gravier, Xuan-Nga Cao, Mark Johnson and

Emmanuel Dupoux

Bridging the gap between speech technology and natural

language processing: an evaluation toolbox for term

discovery systems

398 Paula Lopez-Otero, Laura Docio-Fernandez and Carmen

Garcia-Mateo

Introducing a Framework for the Evaluation of Music

Detection Tools

427 Bartosz Broda, Bartłomiej Nitoń, Włodzimierz Gruszczyński

and Maciej Ogrodniczuk

Measuring Readability of Polish Texts: Baseline Experiments

829 Jason Utt, Sylvia Springorum, Maximilian Köper and Sabine

Schulte im Walde

Fuzzy V-Measure - An Evaluation Method for Cluster

Analyses of Ambiguous Data

887 Andrea Horbach, Alexis Palmer and Magdalena Wolska Finding a Tradeoff between Accuracy and Rater's Workload

in Grading Clustered Short Answers

935 Petra Barancikova, Rudolf Rosa and Ales Tamchyna Improving Evaluation of English-Czech MT through

Paraphrasing

1198 Chi-kiu Lo and Dekai Wu On the reliability and inter-annotator agreement of human

semantic MT evaluation via HMEANT

Session : P08 - Language

Resource Infrastructures

Chair: Georg Rehm

206 Nelleke Oostdijk and Henk van den Heuvel The evolving infrastructure for language resources and the

role for data scientists

325 Dorte Haltrup Hansen, Lene Offersgaard and Sussi Olsen Using TEI, CMDI and ISOcat in CLARIN-DK

338 Jonathan Chevelu, Gwénolé Lecorvé and Damien Lolive ROOTS: a toolkit for easy, fast and consistent processing of

large sequential annotated data collections

368 Matteo Abrate, Angelo Mario Del Grosso, Emiliano

Giovannetti, Angelica Lo Duca, Damiana Luzzi, Lorenzo

Mancini, Andrea Marchetti, Irene Pedretti and Silvia Piccini

Sharing Cultural Heritage: the Clavius on the Web Project

517 Verena Lyding, Lionel Nicolas and Egon Stemle 'interHist' an interactive visual interface for corpus

exploration

Session : P09 - Machine

Translation

Chair: Jan Hajic TBC

21 Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi Constructing a Chinese–Japanese Parallel Corpus from

Wikipedia

43 Lise Rebout and Philippe Langlais An Iterative Approach for Mining Parallel Sentences in a

Comparable Corpus

103 Dan Tufiș Large SMT data-sets extracted from Wikipedia

107 Juan Luo and Yves Lepage Production of Phrase Tables in 11 European Languages using

an Improved Sub-sentential Aligner

162 Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda

and Satoshi Nakamura

Collection of a Simultaneous Translation Corpus for

Comparative Analysis

205 Sharid Loaiciga, Thomas Meyer and Andrei Popescu-Belis English-French Verb Phrase Alignment in Europarl for Tense

Translation Modeling

610 Bushra Jawaid and Ondrej Bojar Two-Step Machine Translation with Lattices

Session : P10 - Metadata Chair: Victoria Arranz

156 Matej Durco and Menzo Windhouwer The CMD Cloud

332 Fritz Kliche, Andre Blessing, Jonathan Sonntag and Ulrich Heid The e-Identity Exploration Workbench

1022 Damir Cavar and Malgorzata Cavar Visualization of Language Relations and Families: MultiTree

Session : P11 - MultiWord

Expressions and Terms

Chair: Valeria Quochi

263 Pierre André Ménard and Caroline Barriere Linked Open Data and Web Corpus Data for noun compound

bracketing

331 Anita Rácz, István Nagy T. and Veronika Vincze 4FX: Light Verb Constructions in a Multilingual Parallel Corpus

462 Wan Yu Ho, Christine Kng, Shan Wang and Francis Bond Identifying Idioms in Chinese Translations

466 Kara Warburton Narrowing the Gap between Termbases and Corpora in

Commercial Environments

518 Rodrigo Boos, Kassius Prestes and Aline Villavicencio Identification of Multiword Expressions in the brWaC

519 Lis Pereira, Elga Strafella and Yuji Matsumoto Collocation or Free Combination? – Applying Machine

Translation Techniques to identify collocations in Japanese

630 Irina Temnikova, Andrea Varga and Dogan Biyikli Building a Crisis Management Term Resource for Social

Media: The Case of Floods and Protests

Session : P12 - Treebanks Chair: Beatrice Daille

18 Riyaz Ahmad Bhat, Shahid Mushtaq Bhat and Dipti Misra

Sharma

Towards building a Kashmiri Treebank: Setting up the

Annotation Pipeline

42 Shinsuke Mori, Hideki Ogura and Tetsuro Sasada A Japanese Word Dependency Corpus

63 Chris Culy, Marco Passarotti and Ulla König-Cardanobile A Compact Interactive Visualization of Dependency Treebank

Query Results

70 Scott Martens and Marco Passarotti Thomas Aquinas in the TüNDRA: Integrating the Index

Thomisticus Treebank into CLARIN-D

225 Blanca Arias, Nuria Bel, Mercè Lorente, Montserrat

Marimón, Alba Milà, Jorge Vivaldi, Muntsa Padró, Marina

Fomicheva and Imanol Larrea

Boosting the creation of a treebank

382 Montserrat Marimon, Núria Bel, Beatriz Fisas, Blanca Arias,

Silvia Vázquez, Jorge Vivaldi, Carlos Morell and Mercè

Lorente

The IULA Spanish LSP Treebank

303 Per Erik Solberg, Arne Skjærholt, Lilja Øvrelid, Kristin Hagen

and Janne Bondi Johannessen

The Norwegian Dependency Treebank

378 Mojgan Seraji, Carina Jahani, Beáta Megyesi and Joakim Nivre A Persian Treebank with Stanford Typed Dependencies

441 Masood Ghayoomi and Jonas Kuhn Converting an HPSG-based Treebank into its Parallel

Dependency-based Treebank


Session : P13 - Discourse

Annotation,

Representation and

Processing

Chair: Ann Bies

77 Kasia Budzynska, Mathilde Janier, Chris Reed, Patrick Saint-Dizier,

Manfred Stede and Olena Yakorska

A Model for Processing Illocutionary Structures and

Argumentation in Debates

579 Manfred Stede and Arne Neumann Potsdam Commentary Corpus 2.0: Annotation for Discourse

Research

79 Magdalena Rysova Verbs of Saying with a Textual Connecting Function in the

Prague Discourse Treebank

155 Ryu Iida and Takenobu Tokunaga Building a Corpus of Manually Revised Texts from Discourse

Perspective

270 Lanjun Zhou, Binyang Li, Zhongyu Wei and Kam-Fai Wong The CUHK Discourse TreeBank for Chinese: Annotating

Explicit Discourse Connectives for the Chinese TreeBank

280 Thomas Bögel, Jannik Strötgen and Michael Gertz Computational Narratology: Extracting Tense Clusters from

Narrative Texts

330 Susana Bautista and Horacio Saggion Can Numerical Expressions Be Simpler? Implementation and

Demonstration of a Numerical Simplification System for

Spanish

400 Cristina Grisot and Thomas Meyer Cross-linguistic annotation of narrativity for English/French

verb tense disambiguation

Session : P14 - Grammar

and Syntax

Chair: Cristina Bosco

47 Richard Sproat, Bruno Cartoni, HyunJeong Choe, David

Huynh, Linne Ha, Ravindran Rajakumar and Evelyn

Wenzel-Grondie

A Database for Measuring Linguistic Information Content

50 Katerina Rysova and Jiří Mírovský Valency and Word Order in Czech – A Corpus Probe

211 Ludger Zeevaert Mörkum Njálu. An annotated corpus to analyse and explain

grammatical divergences between 14th-century manuscripts

of Njál's saga

346 Roman Schneider GenitivDB – a Corpus-Generated Database for German

Genitive Classification

361 Tibor Kiss, Francis Jeffry Pelletier and Tobias Stadtfeld Building a reference lexicon for countability in English

Session : P15 - Lexicons Chair: Amália Mendes

34 Ismail El Maarouf, Jane Bradbury, Vít Baisa and Patrick Hanks Disambiguating Verbs by Collocation: Corpus Lexicography

meets Natural Language Processing

58 Nabil Hathout, Franck Sajous and Basilio Calderone GLÀFF, a Large Versatile French Lexicon

102 John Richardson, Toshiaki Nakazawa and Sadao Kurohashi Bilingual Dictionary Construction with Transliteration Filtering

127 Krasimir Angelov Bootstrapping Open-Source English-Bulgarian Computational

Dictionary

128 Mathieu Mangeot MotàMot project: conversion of a French-Khmer published

dictionary for building a multilingual lexical system

154 Menzo Windhouwer, Justin Petro and Shakila Shayan RELISH LMF: Unlocking the Full Power of the Lexical Markup

Framework

175 Liviu Dinu and Alina Maria Ciobanu Building a Dataset of Multilingual Cognates for the Romanian

Lexicon

222 Palmira Marrafa, Raquel Amaro and Sara Mendes LexTec – a rich language resource for technical domains in

Portuguese

Session : P16 - Morphology Chair: Benoît Sagot

2 Fadoua Ataa Allah and Siham Boulaknadel Amazigh Verb Conjugator

66 Menno van Zaanen, Gerhard Van Huyssteen, Suzanne

Aussems, Chris Emmery and Roald Eiselen

The Development of Dutch and Afrikaans Language

Resources for Compound Boundary Analysis

116 Rico Sennrich and Beat Kunz Zmorge: A German Morphological Lexicon Extracted from

Wiktionary

207 Attila Novák A New Form of Humor – Mapping Constraint-Based

Computational Morphologies to a Finite-State Representation

262 Veronika Vincze, Viktor Varga, Katalin Ilona Simkó, János

Zsibrita, Ágoston Nagy, Richárd Farkas and János Csirik

Szeged Corpus 2.5: Morphological Modifications in a

Manually POS-tagged Hungarian Corpus

437 Çağrı Çöltekin A set of open source tools for Turkish natural language

processing

501 Magda Sevcikova and Zdenek Zabokrtsky Word-Formation Network for Czech

593 Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed

El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery,

Owen Rambow and Ryan Roth

MADAMIRA: A Fast, Comprehensive Tool for Morphological

Analysis and Disambiguation of Arabic

607 Yvonne Adesam, Malin Ahlberg, Peter Andersson, Gerlof

Bouma, Markus Forsberg and Mans Hulden

Computer-aided morphology expansion for Old Swedish

768 Marcin Woliński Morfeusz Reloaded

Session : P17 - WordNet Chair: Francis Bond

121 Antoni Oliver and Salvador Climent Automatic creation of WordNets from parallel corpora

122 Spandana Gella, Carlo Strapparava and Vivi Nastase Mapping WordNet Domains, WordNet Topics and Wikipedia

Categories to Generate Multilingual Domain Specific

Resources

203 Quentin Pradet, Laurence Danlos and Gaël de Chalendar Adapting VerbNet to French using existing resources

541 Gianluca Lebani, Veronica Viola and Alessandro Lenci Bootstrapping an Italian VerbNet: data-driven analysis of

verb alternations

582 Ahti Lohk, Kaarel Allik, Heili Orav and Leo Võhandu Dense Components in the Structure of WordNet

1071 Yuri Bizzoni, Federico Boschetti, Harry Diakoff, Riccardo Del

Gratta, Monica Monachini and Gregory Crane

The Making of Ancient Greek WordNet

1083 Gerard de Melo Etymological Wordnet: Tracing The History of Words


Session : P18 - Corpora and

Annotation

Chair: Steve Cassidy TBC

199 Angela Costa, Tiago Luís and Luísa Coheur Translation errors from English to Portuguese: an annotated

corpus

360 Verginica Barbu Mititelu, Elena Irimia and Dan Tufiș CoRoLa – The Reference Corpus of Contemporary Romanian

Language

523 Houda Bouamor, Nizar Habash and Kemal Oflazer A Multidialectal Parallel Corpus of Arabic

558 Ahmed Salama, Houda Bouamor, Behrang Mohit and Kemal

Oflazer

YouDACC: the Youtube Dialectal Arabic Comment Corpus

529 Miquel Esplà-Gomis, Filip Klubička, Nikola Ljubešić, Sergio

Ortiz-Rojas, Vassilis Papavassiliou and Prokopis Prokopidis

Comparing two acquisition systems for automatically

building an English–Croatian parallel corpus from

multilingual websites

530 Siim Orasmaa Towards an Integration of Syntactic and Temporal

Annotations in Estonian

552 Louise Deleger, Anne-Laure Ligozat, Cyril Grouin, Pierre

Zweigenbaum and Aurelie Neveol

Annotation of specialized corpora using a comprehensive

entity and relation scheme

594 Ritesh Kumar Developing Politeness Annotated Corpus of Hindi Blogs

606 Adriane Boyd, Jirka Hana, Lionel Nicolas, Detmar Meurers,

Katrin Wisniewski, Andrea Abel, Karin Schöne, Barbora

Štindlová and Chiara Vettori

The MERLIN corpus: Learner language and the CEFR

612 Luz Rello, Ricardo Baeza-Yates and Joaquim Llisterri DysList: An Annotated Resource of Dyslexic Errors

624 Jena D. Hwang, Annie Zaenen and Martha Palmer Criteria for Identifying and Annotating Caused Motion

Constructions in Corpus Data

914 Ann Irvine, Joshua Langfus and Chris Callison-Burch The American Local News Corpus

Session : P19 - Document

Classification, Text

Categorisation

Chair: Damir Cavar

8 Mohamed Morchid, Richard Dufour and Georges Linares A LDA-based Topic Classification Approach from Highly

Imperfect Automatic Transcriptions

104 Juan Soler and Leo Wanner How to Use less Features and Reach Better Performance in

Author Gender Identification

195 Lucie Poláková, Pavlína Jínová and Jiří Mírovský Genres in the Prague Discourse Treebank

291 Stefania Degaetano-Ortlieb, Peter Fankhauser, Hannah

Kermes, Ekaterina Lapshinova-Koltunski, Noam Ordan and

Elke Teich

Data Mining with Shallow vs. Linguistic Features to Study

Diversification of Scientific Registers

402 Mahmoud El-Haj, Paul Rayson, Steve Young and Martin

Walker

Detecting Document Structure in a Very Large Corpus of UK

Financial Reports

470 Noushin Rezapour Asheghi, Serge Sharoff and Katja Markert Designing and Evaluating a Reliable Corpus of Web Genres

via Crowd-Sourcing

498 Ioannis Korkontzelos and Sophia Ananiadou Locating Requests among Open Source Software

Communication Messages

1007 Thamar Solorio, Ragib Hasan and Mainul Mizan Sockpuppet Detection in Wikipedia: A Corpus of Real-World

Deceptive Writing for Linking Identities

Session : P20 - FrameNet Chair: Alessandro Lenci

254 Ildikó Pilán and Elena Volodina Reusing Swedish FrameNet for training semantic roles

455 Marie-Claude L'Homme, Benoît Robichaud and Carlos

Subirats Rüggeberg

Discovering frames in specialized domains

496 Marie Candito, Pascal Amsili, Lucie Barque, Farah Benamara,

Gaël de Chalendar, Marianne Djemaa, Pauline Haas, Richard

Huyghe, Yvette Yannick Mathieu, Philippe Muller, Benoît

Sagot and Laure Vieu

Developing a French FrameNet: Methodology and First

results

Session : P21 - Semantics Chair: Peter Anick TBC

221 Reinhard Rapp Corpus-Based Computation of Reverse Associations

242 Haritz Salaberri, Olatz Arregi and Beñat Zapirain First approach toward Semantic Role Labeling for Basque

267 Tomoko Izumi, Tomohide Shibata, Hisako Asano, Yoshihiro

Matsuo and Sadao Kurohashi

Constructing a Corpus of Japanese Predicate Phrases for

Synonym/Antonym Relations

274 Martin Riedl, Richard Steuer and Chris Biemann Distributed Distributional Similarities of Google Books over

the Centuries

545 Kostadin Cholakov, Chris Biemann, Judith Eckle-Kohler and

Iryna Gurevych

Lexical Substitution Dataset for German

353 Nianwen Xue and Yuchen Zhang Buy one get one free: Distant annotation of Chinese tense,

event type and modality

403 Dan Stefanescu, Rajendra Banjade and Vasile Rus Latent Semantic Analysis Models on Wikipedia and TASA

461 Yuka Tateisi, Yo Shidahara, Yusuke Miyao and Akiko Aizawa Annotation of Computer Science Papers for Semantic

Relation Extraction

574 Moritz Wittmann, Marion Weller and Sabine Schulte im

Walde

Automatic Extraction of Synonyms for German Particle Verbs

from Parallel Data with Distributional Similarity as a

Re-Ranking Feature

233 Gregor Titze, Volha Bryl, Cäcilia Zirn and Simone Paolo

Ponzetto

DBpedia Domains: augmenting DBpedia with domain

information

750 Elena Cabrio, Serena Villata and Fabien Gandon Classifying Inconsistencies in DBpedia Language Specific

Chapters


Session : P22 - Speech

Resources

Chair: Giuseppe Riccardi TBC

171 Thomas Schmidt The Database for Spoken German – DGD2

365 Annika Hämäläinen, Jairo Avelar, Silvia Rodrigues, Miguel

Sales Dias, Artur Kolesiński, Tibor Fegyó, Géza Németh, Petra

Csobánka, Karine Lan and David Hewson

The EASR Corpora of European Portuguese, French,

Hungarian and Polish Elderly Speech

394 Barbara Schuppler, Martin Hagmueller, Juan A.

Morales-Cordovilla and Hannes Pessentheiner

GRASS: the Graz corpus of Read And Spontaneous Speech

432 Hanae Koiso, Yasuharu Den, Ken'ya Nishikawa and Kikuo

Maekawa

Design and development of an RDB version of the Corpus of

Spontaneous Japanese

484 Camille Fauth, Anne Bonneau, Frank Zimmerer, Juergen

Trouvain, Bistra Andreeva, Vincent Colotte, Dominique Fohr,

Denis Jouvet, Jeanin Jügler, Yves Laprie, Odile Mella and

Bernd Möbius

Designing a Bilingual Speech Corpus for French and German

Language Learners: a Two-Step Process

511 Rosemary Orr, Marijn Huijbregts, Roeland van Beek, Lisa

Teunissen, Kate Backhouse and David van Leeuwen

Semi-automatic annotation of the UCU accents speech

corpus

514 Ana Lúcia Santos, Michel Généreux, Aida Cardoso, Celina

Agostinho and Silvana Abalada

A corpus of European Portuguese child and child-directed

speech

537 Anna Polychroniou, Hugues Salamin and Alessandro

Vinciarelli

The SSPNet-Mobile Corpus: Social Signal Processing Over

Mobile Phones

553 Katarzyna Klessa and Dafydd Gibbon Annotation Pro + TGA: automation of speech timing analysis

611 Björn Schuller, Felix Friedmann and Florian Eyben The Munich Biovoice Corpus: Effects of Physical Exercising,

Heart Rate, and Skin Conductance on Human Speech

Production


Session : P23 - Collaborative

Resource Construction

Chair: Christian Chiarcos TBC

14 Włodzimierz Gruszczyński and Maciej Ogrodniczuk Digital Library 2.0: Source of Knowledge and Research

Collaboration Platform

95 Livio Robaldo, Guido Boella, Luigi Di Caro and Andrea Violato Exploiting networks in Law

151 Alex Rudnick, Taylor Skidmore, Alberto Samaniego and

Michael Gasser

Guampa: a Toolkit for Collaborative Translation

758 Billy T.M. Wong, Ian C. Chow, Jonathan J. Webster and

Hengbin Yan

The Halliday Centre Tagger: An Online Platform for

Semi-automatic Text Annotation and Analysis

769 Mauro Dragoni, Alessio Bosca, Matteo Casu and Andi Rexha Modeling, Managing, Exposing, and Linking Ontologies with a

Wiki-based Tool

817 Mathieu Lafourcade and Karën Fort Propa-L: a semantic filtering service from a lexical network

created using Games With A Purpose

940 Frederik Baumgardt, Giuseppe Celano, Gregory R. Crane,

Stella Dee, Maryam Foradi, Emily Franzini, Greta Franzini,

Monica Lent, Maria Moritz and Simona Stoyanova

Open Philology at the University of Leipzig

975 Joshua Elliot, Logan Kearsley, Jason Housley and Alan Melby LexTerm Manager: Design for an Integrated Lexicography

and Terminology System

1016 Jonathan Wright RESTful Annotation and Efficient Collaboration


Session : P24 - Corpora and

Annotation

Chair: Maria Gavrilidou

1094 Zhiyi Song, Stephanie Strassel, Haejoong Lee, Kevin Walker,

Jonathan Wright, Jennifer Garland, Dana Fore, Brian Gainor,

Preston Cabe, Thomas Thomas, Brendan Callahan and Ann

Sawyer

Collecting Natural SMS and Chat Conversations in Multiple

Languages: The BOLT Phase 2 Corpus

656 Daniel Hladek, Jan Stas and Jozef Juhar The Slovak Categorized News Corpus

680 Matus Pleva and Jozef Juhar TUKE-BNews-SK: Slovak Broadcast News Corpus Construction

and Evaluation

675 Irina Temnikova, William A. Baumgartner Jr., Negacy D. Hailu,

Ivelina Nikolova, Tony McEnery, Adam Kilgarriff, Galia

Angelova and K. Bretonnel Cohen

Sublanguage Corpus Analysis Toolkit: A tool for assessing the

representativeness and sublanguage characteristics of

corpora

681 Csaba Oravecz, Tamás Váradi and Bálint Sass The Hungarian Gigaword Corpus

690 Željko Agić and Nikola Ljubešić The SETimes.HR Linguistically Annotated Corpus of Croatian

841 Nikola Ljubešić and Antonio Toral caWaC -- A web corpus of Catalan and its application to

language modeling and machine translation

691 Jerid Francom, Mans Hulden and Adam Ussishkin ACTIV-ES: a comparable, cross-dialect corpus of ‘everyday’

Spanish from Argentina, Mexico, and Spain

714 Vidas Daudaravicius Language Editing Dataset of Academic Texts

777 Suguru Matsuyoshi, Ryo Otsuki and Fumiyo Fukumoto Annotating the Focus of Negation in Japanese Text

1019 Siddharth Jain, Archna Bhatia, Angelique Rein and Eduard

Hovy

A Corpus of Participant Roles in Contentious Discussions


Session : P25 - Machine

Translation

Chair: Hitoshi Isahara TBC

210 Michael Carl, Mercedes Martínez García and Bartolomé

Mesa-Lao

CFT13: A resource for research into the post-editing process

384 Nianwen Xue, Ondrej Bojar, Jan Hajic, Martha Palmer,

Zdenka Uresova and Xiuhong Zhang

Not an Interlingua, But Close: Comparison of English AMRs to

Chinese and Czech

390 Miriam Kaeshammer and Anika Westburg On Complex Word Alignment Configurations

414 Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee,

Ritesh Shah and Pushpak Bhattacharyya

Shata-Anuvadak: Tackling Multiway Translation of Indian

Languages

473 Marco Turchi and Matteo Negri Automatic Annotation of Machine Translation Datasets with

Binary Quality Judgements

676 Violeta Seretan, Pierrette Bouillon and Johanna Gerlach A Large-Scale Evaluation of Pre-editing Strategies for

Improving User-Generated Content Translation

735 Nicolas Pécheux, Alexander Allauzen and François Yvon Rule-based Reordering Space in Statistical Machine

Translation

682 Kunal Sachdeva, Rishabh Srivastava, Sambhav Jain and Dipti

Sharma

Hindi to English Machine Translation: Using Effective

Selection in Multi-Model SMT

Session : P26 - Parallel

Corpora

Chair: Dan Tufiș

1137 Jayendra Rakesh Yeka, Prasanth Kolachina and Dipti Misra

Sharma

Benchmarking of English-Hindi parallel corpora

328 Mircea Petic and Daniela Gîfu Transliteration and alignment of parallel texts from Cyrillic to

Latin

674 Manuela Sanguinetti, Cristina Bosco and Loredana Cupi Exploiting catenae in a parallel treebank alignment

772 Yves Scherrer, Luka Nerima, Lorenza Russo, Maria Ivanova

and Eric Wehrli

SwissAdmin: A multilingual tagged parallel corpus of press

releases

774 Liang Tian, Derek F. Wong, Lidia S. Chao, Paulo Quaresma,

Francisco Oliveira and Lu Yi

UM-Corpus: A Large English-Chinese Parallel Corpus for

Statistical Machine Translation

807 Raphael Rubino, Antonio Toral, Nikola Ljubešić and Gema

Ramírez-Sánchez

Quality Estimation for Synthetic Parallel Data Generation

DAY2 Poster Sessions

846 Raivis Skadiņš, Jörg Tiedemann, Roberts Rozis and Daiga

Deksne

Billions of Parallel Words for Free: Building and Using the EU

Bookshop Corpus

877 Ahmed Abdelali, Francisco Guzman, Hassan Sajjad and

Stephan Vogel

The AMARA Corpus: Building parallel language resources for

the educational domain

1159 Ann Bies, Justin Mott, Seth Kulick, Jennifer Garland and Colin

Warner

Incorporating Alternate Translations into English Translation

Treebank

1199 Shikun Zhang, Wang Ling and Chris Dyer Dual Subtitles as Parallel Corpora

285 Pavel Vondřička Aligning parallel texts with InterText

Session : P27 - Sign

Language

Chair: Thomas Hanke

6 Rosalee Wolfe, John McDonald, Larwan Berke and Marie

Stumbo

Expanding n-gram analytics in ELAN and a case study for sign

synthesis

209 Matti Karppa, Ville Viitaniemi, Marcos Luzardo, Jorma

Laaksonen and Tommi Jantunen

SLMotion - An extensible sign language oriented video

analysis tool

440 Ville Viitaniemi, Tommi Jantunen, Leena Savolainen, Matti

Karppa and Jorma Laaksonen

S-pot - a benchmark in spotting signs within continuous

signing

278 Mayumi Bono, Kouhei Kikuchi, Paul Cibulka and Yutaka Osugi A Colloquial Corpus of Japanese Sign Language: Linguistic

Resources for Observing Sign Language Conversations

371 Leah Geer and Jonathan Keane Exploring factors that contribute to successful fingerspelling

comprehension

585 Jens Forster, Christoph Schmidt, Oscar Koller, Martin

Bellgardt and Hermann Ney

Extensions of the Sign Language Recognition and Translation

Corpus RWTH-PHOENIX-Weather

634 Julie Hochgesang The Use of a FileMaker Pro Database in Evaluating Sign

Language Notation Systems

1138 Mark Dilsizian, Polina Yanovich, Shu Wang, Carol Neidle and

Dimitris Metaxas

A New Framework for Sign Language Recognition based on

3D Handshape Identification and Linguistic Modeling


Session : P28 - Information

Extraction

Chair: Diana Maynard

3 Xavier Tannier Extracting News Web Page Creation Time with DCTFinder

190 Hans-Ulrich Krieger, Christian Spurk, Hans Uszkoreit, Feiyu

Xu, Yi Zhang, Frank Müller and Thomas Tolxdorff

Information Extraction from German Patient Records via

Hybrid Parsing and Relation Extraction Strategies

449 Júlia Pajzs, Ralf Steinberger, Maud Ehrmann, Mohamed

Ebrahim, Leonida Della Rocca, Stefano Bucci, Eszter Simon

and Tamás Váradi

Media monitoring and information extraction for the highly

inflected agglutinative language Hungarian

536 Antje Schlaf, Claudia Bobach and Matthias Irmer Creating a Gold Standard Corpus for the Extraction of

Chemistry-Disease Relations from Patent Texts

590 Felice Dell'Orletta, Giulia Venturi, Andrea Cimino and

Simonetta Montemagni

T2K²: a System for Automatically Extracting and Organizing

Knowledge from Texts

764 Johannes Kirschnick, Alan Akbik and Holmer Hemsen Freepal: A Large Collection of Deep Lexico-Syntactic Patterns

for Relation Extraction

791 Marc Poch, Núria Bel, Sergio Espeja and Felipe Navio Ranking Job Offers for Candidates: learning hidden

knowledge from Big Data

913 Paul Buitelaar, Georgeta Bordea and Barry Coughlan Hot Topics and Schisms in NLP: Community and Trend

Analysis with Saffron on ACL and LREC Proceedings

1009 Andre Blessing and Jonas Kuhn Textual Emigration Analysis (TEA)

Session : P29 - Lexicons

Chair: Nianwen Xue

4 Tristan Miller and Iryna Gurevych WordNet–Wikipedia–Wiktionary: Construction of a Three-

way Alignment

248 Lei Zhang, Michael Färber and Achim Rettinger xLiD-Lexica: Cross-lingual Linked Data Lexica

316 Begum Erten, Cem Bozsahin and Deniz Zeyrek Turkish Resources for Visual Word Recognition

339 Martin Jansche Computer-Aided Quality Assurance of an Icelandic

Pronunciation Dictionary

397 Lars Borin, Jens Allwood and Gerard de Melo Bring vs. MTRoget: Evaluating automatic thesaurus

translation

417 Wushouer Mairidan, Toru Ishida, Donghui Lin and Katsutoshi

Hirayama

Bilingual Dictionary Induction as an Optimization Problem

563 Tommaso Caselli, Laure Vieu, Carlo Strapparava and Guido

Vetere

Enriching the "Senso Comune" Platform with Automatically

Acquired Data

588 Sameh Alansary MUHIT: A Multilingual Harmonized Dictionary

604 Aurelie Neveol, Julien Grosjean, Stéfan Darmoni and Pierre

Zweigenbaum

Language Resources for French in the Biomedical Domain

1021 Pyry Takala, Pekka Malo, Ankur Sinha and Oskar Ahlgren Gold-standard for Topic-specific Sentiment Analysis of

Economic Texts

Session : P30 - Large

Projects and Infrastructural

Issues

Chair: Yohei Murakami

31 Peter Spyns and Remco van Veenendaal A decade of HLT Agency activities in the Low Countries: from

resource maintenance (BLARK) to service offerings (BLAISE)

410 Koenraad De Smedt, Erhard Hinrichs, Detmar Meurers,

Inguna Skadina, Bolette Pedersen, Costanza Navarretta,

Núria Bel, Krister Linden, Marketa Lopatkova, Jan Hajic, Gisle

Andersen and Przemyslaw Lenkiewicz

CLARA: A New Generation of Researchers in Common

Language Resources and Their Applications

814 Lina Henriksen, Dorte Haltrup Hansen, Bente Maegaard,

Bolette Sandford Pedersen and Claus Povlsen

Encompassing a spectrum of LT users in the CLARIN-DK

Infrastructure

452 Maarten Truyens and Patrick Van Eecke Legal aspects of text mining

459 Jan Odijk CLARIN-NL: Major results

795 Auður Hauksdóttir An Innovative World Language Centre : Challenges for the

Use of Language Technology

945 Joseph Mariani, Christopher Cieri, Gil Francopoulo, Patrick

Paroubek and Marine Delaborde

Facing the Identification Problem in Language-Related

Scientific Data Analysis

983 Frank Landsbergen, Carole Tiberius and Roderik Dernison Taalportaal: an online grammar of Dutch and Frisian

Session : P31 - Opinion

Mining and Reviews Analysis

Chair: Manfred Stede

TBC

85 Roman Klinger and Philipp Cimiano The USAGE review corpus for fine grained multi lingual

opinion analysis

258 Christian Haenig, Andreas Niekler and Carsten Wuensch PACE Corpus: a multilingual corpus of Polarity-annotated

textual data from the domains Automotive and CEllphone

293 Patrik Lambert and Carlos Rodriguez-Penagos Adapting Freely Available Resources to Build an Opinion

Mining Pipeline in Portuguese

350 Roser Saurí, Judith Domingo and Toni Badia The NewSoMe Corpus: A Unifying Opinion Annotation

Framework across Genres and in Multiple Languages

356 André Bittar, Luca Dini, Sigrid Maurel and Mathieu Ruhlmann The Dangerous Myth of the Star System

1001 Wiltrud Kessler and Jonas Kuhn A Corpus of Comparisons in Product Reviews

Session : P32 - Social Media

Processing

Chair: Fei Xia

1116 Clare Voss, Stephen Tratz, Jamal Laoudi and Douglas Briesch Finding Romanized Arabic Dialect in Code-Mixed Tweets

53 Fabrizio Gotti, Phillippe Langlais and Atefeh Farzindar Hashtag Occurrences, Layout and Translation: A Corpus-

driven Analysis of Tweets Published by the Canadian

Government

83 Guoyu Tang, Yunqing Xia, Weizhi Wang, Raymond Lau and

Fang Zheng

Clustering tweets using Wikipedia concepts

317 Eshrag Refaee and Verena Rieser An Arabic Twitter Corpus for Subjectivity and Sentiment

Analysis

442 Iñaki Alegria, Nora Aranberri, Pere Comas, Victor Fresno,

Pablo Gamallo, Lluís Padró, Iñaki San Vicente, Jordi Turmo

and Arkaitz Zubiaga

TweetNorm_es: an annotated corpus for Spanish microtext

normalization

834 Nikola Ljubešić, Darja Fišer and Tomaž Erjavec TweetCaT: a tool for building Twitter corpora of smaller

languages

1146 Tatjana Scheffler A German Twitter Snapshot

Session : P33 - Treebanks

Chair: Montserrat Marimón

444 Elżbieta Hajnicz The Procedure of Lexico-Semantic Annotation of Składnica

Treebank

494 Marie Candito, Guy Perrier, Bruno Guillaume, Corentin

Ribeyre, Karën Fort, Djamé Seddah and Eric de la Clergerie

Deep Syntax Annotation of the Sequoia French Treebank

538 Alina Wróblewska and Adam Przepiórkowski Projection-based Annotation of a Polish Dependency

Treebank

694 Željko Agić, Daša Berović, Danijela Merkler and Marko Tadić Croatian Dependency Treebank 2.0: New Annotation

Guidelines for Improved Parsing

766 Rachel Bawden, Marie-Amélie Botalla, Kim Gerdes and

Sylvain Kahane

Correcting and Validating Syntactic Dependency in the

Spoken French Treebank Rhapsodie

860 Kilian A. Foth, Arne Köhn, Niels Beuck and Wolfgang Menzel Because Size Does Matter: The Hamburg Dependency

Treebank

915 Rudolf Rosa, Jan Mašek, David Mareček, Martin Popel, Daniel

Zeman and Zdeněk Žabokrtský

HamleDT 2.0: Thirty Dependency Treebanks Stanfordized

995 Munshi Asadullah, Patrick Paroubek and Anne Vilnat Bidirectionnal converter between syntactic annotations :

from French Treebank Dependencies to PASSAGE

annotations, and back

1145 Mohamed Maamouri, Ann Bies, Seth Kulick, Michael Ciul,

Nizar Habash and Ramy Eskander

Developing an Egyptian Arabic Treebank: Impact of Dialectal

Morphology on Annotation and Tool Development



Session : P34 - Corpora and

Annotation

Chair: Zygmunt Vetulani

TBC

219 Inès Zribi, Rahma Boujelbane, Abir Masmoudi, Mariem

Ellouze, Lamia Belguith and Nizar Habash

A Conventional Orthography for Tunisian Arabic

956 Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Ossama

Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah

Alkuhlani and Kemal Oflazer

Large Scale Arabic Error Annotation: Guidelines and

Framework

763 Shinsuke Mori, Hirokuni Maeta, Yoko Yamakata and Tetsuro

Sasada

Flow Graph Corpus from Recipe Texts

842 Marc Kupietz and Harald Lüngen Recent Developments in DeReKo

843 Shu-Kai Hsieh Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is

Killing Chinese Corpus Linguistics

849 Jannik Strötgen, Thomas Bögel, Julian Zell, Ayser Armiti, Tran

Van Canh and Michael Gertz

Extending HeidelTime for Temporal Expressions Referring to

Historic Dates

852 Thomas Eckart, Erla Hallsteinsdóttir, Sigrún Helgadóttir, Uwe

Quasthoff and Dirk Goldhahn

A 500 Million Word POS-Tagged Icelandic Corpus

916 Shan Wang and Francis Bond Building The Sense-Tagged Multilingual Parallel Corpus

922 Anik Dey and Pascale Fung A Hindi-English Code-Switching Corpus

934 Andrea Abel, Aivars Glaznieks, Lionel Nicolas and Egon Stemle KoKo: an L1 Learner Corpus for German

1000 Vasile Rus, Rajendra Banjade and Mihai Lintean On Paraphrase Identification Corpora

1070 Anne Garcia-Fernandez, Anne-Laure Ligozat and Anne Vilnat Construction and Annotation of a French Folkstale Corpus

1226 Shyam Sundar Agrawal, Abhimanue, Shweta Bansal and

Minakshi Mahajan

Statistical Analysis of Multilingual Text Corpus and

Development of Language Models

1037 Vanessa Loza, Shibamouli Lahiri, Rada Mihalcea and Po-

Hsiang Lai

Building a Dataset for Summarization and Keyword

Extraction from Emails

Session : P35 - Grammar

and Syntax

Chair: Tamás Váradi

639 Emily M. Bender Language CoLLAGE: Grammatical Description with the LinGO

Grammar Matrix

773 Anna Vernerová, Václava Kettnerová and Marketa Lopatkova To Pay or to Get Paid: Enriching a Valency Lexicon with

Diatheses

1060 Georgios Petasis The Ellogon Pattern Engine: Context-free Grammars over

Annotations

1079 Dana Dannells and Normunds Gruzitis Extracting a bilingual semantic grammar from FrameNet-

annotated corpora

1149 Kyoko Ohara Relating Frames and Constructions in Japanese FrameNet

1179 Lars Hellan, Dorothee Beermann, Tore Bruland, Mary Esther

Kropp Dakubu and Montserrat Marimon

MultiVal - towards a multilingual valence lexicon

1214 Emanuele Di Buccio, Giorgio Maria Di Nunzio and Gianmaria

Silvello

A Vector Space Model for Syntactic Distances Between

Dialects

349 Jana Sindlerova, Zdenka Uresova and Eva Fucikova Resources in Conflict: A Bilingual Valency Lexicon vs. a

Bilingual Treebank vs. a Linguistic Theory

Session : P36 - Metaphors

Chair: Walter Daelemans

TBC

241 Samira Shaikh, Tomek Strzalkowski, Ting Liu, George Aaron

Broadwell, Boris Yamrom, Sarah Taylor, Laurie Feldman, Kit

Cho, Umit Boz, Ignacio Cases, Yuliya Peshkova and Ching-

Sheng Lin

A Multi-Cultural Repository of Automatically Discovered

Linguistic and Conceptual Metaphors

419 Brian MacWhinney and Davida Fromm Two Approaches to Metaphor Detection

737 Andrew Gargett and John Barnden Mining Online Discussion Forums for Metaphors

Session : P37 - Named

Entity Recognition

Chair: German Rigau

186 Kareem Darwish and Wei Gao Simple Effective Microblog Named Entity Recognition: Arabic

as an Example

236 Cyril Grouin Biomedical entity extraction using machine-learning based

approaches

276 Darina Benikova, Chris Biemann and Marc Reznicek NoSta-D Named Entity Annotation for German: Guidelines

and Dataset

358 Haibo Li, Masato Hagiwara, Qi Li and Heng Ji Comparison of the Impact of Word Segmentation on Name

Tagging for Chinese and Japanese

391 Dimitrios Kokkinakis, Jyrki Niemi, Sam Hardwick, Krister

Lindén and Lars Borin

HFST-SweNER – A New NER Resource for Swedish

421 Hege Fromreide, Dirk Hovy and Anders Søgaard Crowdsourcing and annotating NER for Twitter #drift

468 Guillaume Jacquet, Maud Ehrmann and Ralf Steinberger Clustering of Multi-Word Named Entity variants: Multilingual

Evaluation

513 Daniela Amaral, Evandro Fonseca, Lucelene Lopes and

Renata Vieira

Comparative Analysis of Portuguese Named Entities

Recognition Tools

549 Cédric Lopez, Frédérique Segond, Olivier Hondermarck,

Paolo Curtoni and Luca Dini

Generating a Resource for Products and Brandnames

Recognition. Application to the Cosmetic Domain

688 Younggyun Hahm, Jungyeul Park, Kyungtae Lim, Youngsik

Kim, Dosam Hwang and Key-Sun Choi

Named Entity Corpus Construction using Wikipedia and

DBpedia Ontology

865 Andrea Glaser and Jonas Kuhn Exploring the utility of coreference chains for improved

identification of personal names

967 Joachim Bingel and Thomas Haider Named Entity Tagging a Very Large Unbalanced Corpus:

Training and Evaluating NE Classifiers

Session : P38 - Question

Answering

Chair: António Branco

12 Peter Exner and Pierre Nugues REFRACTIVE: An Open Source Tool to Extract Knowledge

from Syntactic and Semantic Relations

74 Akira Fujita, Akihiro Kameda, Ai Kawazoe and Yusuke Miyao Overview of Todai Robot Project and Evaluation Framework

of its NLP-based Problem Solving

124 Kirk Roberts, Kate Masterton, Marcelo Fiszman, Halil Kilicoglu

and Dina Demner-Fushman

Annotating Question Decomposition on Complex Medical

Questions

130 Sérgio Curto, Ana C. Mendes, Pedro Curto, Luísa Coheur and

Angela Costa

JUST.ASK, a QA system that learns to answer new questions

from previous interactions

271 Kugatsu Sadamitsu, Ryuichiro Higashinaka and Yoshihiro

Matsuo

Extraction of Daily Changing Words for Question Answering

902 Artem Ostankov, Florian Röhrbein and Ulli Waltinger LinkedHealthAnswers: Towards Linked Data-driven Question

Answering for the Health Care Domain

990 Axel-Cyrille Ngonga Ngomo, Norman Heino, René Speck and

Prodromos Malakasiotis

A tool suite for creating question answering benchmarks


Session : P39 - Speech

Resources

Chair: Henk van den

Heuvel

650 Luca Cristoforetti, Mirco Ravanelli, Maurizio Omologo,

Alessandro Sosi, Alberto Abad, Martin Hagmueller and Petros

Maragos

The DIRHA simulated corpus

695 Roberto Gretter Euronews: a multilingual speech corpus for ASR

709 Sakriani Sakti, Keigo Kubo, Sho Matsumiya, Graham Neubig,

Tomoki Toda, Satoshi Nakamura, Fumihiro Adachi and

Ryosuke Isotani

Towards Multilingual Conversations in the Medical Domain:

Development of Multilingual Medical Data and A Network-

based ASR System

710 Andrej Zgank, Ana Zwitter Vitez and Darinka Verdonik The Slovene BNSI Broadcast News database and reference

speech corpus GOS: Towards the uniform guidelines for

future work

719 Jan Gorisch, Corine Astésano, Ellen Gurman Bard, Brigitte

Bigi and Laurent Prévot

Aix Map Task corpus: The French multimodal corpus of task-

oriented dialogue

739 Carmen Garcia-Mateo, Antonio Cardenal, Xose Luis Regueira,

Elisa Fernández Rei, Marta Martinez, Roberto Seara, Rocío

Varela and Noemí Basanta

CORILGA: a Galician Multilevel Annotated Speech Corpus for

Linguistic Analysis

744 Igor Odriozola, Inma Hernaez, María Inés Torres, Luis Javier

Rodriguez-Fuentes, Mikel Penagarikano and Eva Navas

Basque Speecon-like and Basque SpeechDat MDB-600:

speech databases for the development of ASR technology for

Basque

799 David Tavarez, Eva Navas, Daniel Erro, Ibon Saratxaga and

Inma Hernaez

New bilingual speech databases for audio diarization

748 Tobias Bocklet, Andreas Maier, Korbinian Riedhammer,

Ulrich Eysholdt and Elmar Nöth

Erlangen-CLP: A Large Annotated Corpus of Speech from

Children with Cleft Lip and Palate

789 Evgeny Stepanov, Giuseppe Riccardi and Ali Orkan Bayer The Development of the Multilingual LUNA Corpus for

Spoken Language System Porting


Session : P40 - Lexicons

Chair: Yoshihiko Hayashi

602 Bruno Guillaume, Karën Fort, Guy Perrier and Paul Bédaride Mapping the Lexique des Verbes du Francais (Lexicon of

French Verbs) to a NLP lexicon using examples

633 Satoshi Sato Text Readability and Word Distribution in Japanese

657 Uwe Quasthoff, Dirk Goldhahn, Thomas Eckart, Erla

Hallsteinsdóttir and Sabine Fiedler

High Quality Word Lists as a Resource for Multiple Purposes

672 Þórdís Úlfarsdóttir ISLEX – a Multilingual Web Dictionary

704 Eduard Bejček, Václava Kettnerová and Marketa Lopatkova Automatic Mapping Lexical Resources: A Lexical Unit as the

Keystone

753 Cédric Lopez, Reda Bestandji, Mathieu Roche and Rachel

Panckhurst

Towards Electronic SMS Dictionary Construction: An

Alignment-based Approach

803 Ahmet Aker, Monica Paramita, Marcis Pinnis and Robert

Gaizauskas

Bilingual dictionaries for all EU languages

808 Janine Pimentel Adding a Third Language to a Lexical Resource Describing

Legal Terminology: the assignment of equivalents

844 Tafseer Ahmed Khan Automatic acquisition of Urdu nouns (along with gender and

irregular plurals)

1031 Valeria de Paiva, Livy Real, Alexandre Rademaker and Gerard

de Melo

NomLex-PT: A Lexicon of Portuguese Nominalizations

Session : P41 - Parsing

Chair: Simonetta Montemagni

60 Hen-Hsen Huang, Huan-Yuan Chen, Chang-Sheng Yu, Hsin-Hsi

Chen, Po-Ching Lee and Chun-Hsun Chen

Sentence Rephrasing for Parsing Sentences with OOV Words

62 Cheikh M. Bamba Dione Pruning the Search Space of the Wolof LFG Grammar Using a

Probabilistic and a Constraint Grammar Parser

73 Elena Mitocariu, Daniel-Alexandru Anechitei and Dan Cristea How Could Veins Speed Up The Process Of Discourse Parsing

239 Achim Stein Parsing Heterogeneous Corpora with a Rich Dependency

Grammar

453 Angelina Ivanova and Gertjan van Noord Treelet Probabilities for HPSG Parsing and Error Correction

543 Arda Celebi and Arzucan Özgür Self-training a Constituency Parser using n-gram Trees

1089 Natalia Silveira, Timothy Dozat, Marie-Catherine de

Marneffe, Samuel Bowman, Miriam Connor, John Bauer and

Chris Manning

A Gold Standard Dependency Corpus for English

230 Wolfgang Maier, Miriam Kaeshammer, Peter Baumann and

Sandra Kübler

Discosuite - A parser test suite for German discontinuous

structures

Session : P42 - Part-of-

Speech Tagging

Chair: Krister Linden

510 Timur Gilmanov, Olga Scrivner and Sandra Kübler SWIFT Aligner, A Multifunctional Tool for Parallel Corpora:

Visualization, Word Alignment, and (Morpho)-Syntactic Cross-

Language Transfer

275 Saba Urooj, Sarmad Hussain, Asad Mustafa, Rahila Parveen,

Farah Adeeba, Tafseer Ahmed Khan, Miriam Butt and

Annette Hautli

The CLE Urdu POS Tagset

335 Kareem Darwish, Ahmed Abdelali and Hamdy Mubarak Using Stem-Templates to Improve Arabic POS and

Gender/Number Tagging

362 Gaël de Chalendar The LIMA Multilingual Analyzer Made Free: FLOSS Resources

Adaptation and Correction

544 Bushra Jawaid, Amir Kamran and Ondrej Bojar A Tagged Corpus and a Tagger for Urdu

677 Sigrún Helgadóttir, Hrafn Loftsson and Eiríkur Rögnvaldsson Correcting Errors in a New Gold Standard for Tagging

Icelandic Text

1018 Łukasz Kobyliński PoliTa: A multitagger for Polish

Session : P43 - Semantics

Chair: Marc Verhagen

556 Francesca Frontini, Valeria Quochi, Sebastian Padó, Monica

Monachini and Jason Utt

Polysemy Index for Nouns: an Experiment on Italian using the

PAROLE SIMPLE CLIPS Lexical Database

619 Muntsa Padró, Marco Idiart, Aline Villavicencio and Carlos

Ramisch

Comparing Similarity Measures for Distributional Thesauri

643 Elisa Omodei, Jean-Philippe Cointet and Thierry Poibeau Reconstructing the Semantic Landscape of Natural Language

Processing

754 Olivier Ferret Compounds and distributional thesauri

823 Kyle Richardson and Jonas Kuhn UnixMan Corpus: A Resource for Language Learning in the

Unix Domain

866 Tatiana Erekhinskaya, Meghana Satpute and Dan Moldovan Multilingual eXtended WordNet Knowledge Base: Semantic

Parsing and Translation of Glosses

867 Manel Zarrouk and Mathieu Lafourcade Relation Inference in Lexical Networks ... with Refinements

900 Raquel Amaro Extracting semantic relations from Portuguese corpora using

lexical-syntactic patterns

904 David Jurgens An analysis of ambiguity in word sense annotations

1012 Claire Bonial, Julia Bonn, Kathryn Conger, Jena D. Hwang and

Martha Palmer

PropBank: Semantics of New Predicate Types

1043 Michael Mohler, Marc Tomlinson, David Bracewell and Bryan

Rink

Semi-supervised methods for expanding psycholinguistics

norms by integrating distributional similarity with the

structure of WordNet

1150 Gemma Bel Enguix, Reinhard Rapp and Michael Zock A Graph-Based Approach for Computing Free Word

Associations

1168 Martin Gleize and Brigitte Grau A hierarchical taxonomy for classifying hardness of inference

tasks


Session : P44 - Speech

Recognition and Synthesis

Chair: Denise Di Persio

196 Joris Pelemans, Kris Demuynck, Hugo Van hamme and Patrick

Wambacq

Speech Recognition Web Services for Dutch

383 Maria Goryainova, Cyril Grouin, Sophie Rosset and Ioana

Vasilescu

Morpho-Syntactic Study of Errors from Speech Recognition

System

771 Daniel Luzzati, Cyril Grouin, Ioana Vasilescu, Martine Adda-

Decker, Eric Bilinski, Nathalie Camelin, Juliette Kahn, Carole

Lailler, Lori Lamel and Sophie Rosset

Human annotation of ASR error regions: Is "gravity" a

sharable concept for human annotators?

430 Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman

Mustafawi

Development of a TV Broadcasts Speech Recognition System

for Qatari Arabic

434 Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman

Mustafawi

Automatic Long Audio Alignment and Confidence Scoring for

Conversational Arabic Speech

533 Giampiero Salvi and Niklas Vanhainen The WaveSurfer Automatic Speech Recognition Plugin

715 Matti Varjokallio and Mikko Kurimo A Toolkit for Efficient Learning of Lexical Units for Speech

Recognition

838 Aimilios Chalamandaris, Pirros Tsiakoulis, Sotiris Karabetsos

and Spyros Raptis

Using Audio Books for Training a Text-to-Speech System


Session : P45 - Anaphora

and Coreference

Chair: Costanza

Navarretta

286 Panot Chaimongkol, Akiko Aizawa and Yuka Tateisi Corpus for Coreference Resolution on Scientific Papers

298 Liane Guillou, Christian Hardmeier, Aaron Smith, Jörg

Tiedemann and Bonnie Webber

ParCor 1.0: A Parallel Pronoun-Coreference Corpus to

Support Statistical MT

372 Nobal Niraula, Vasile Rus, Rajendra Banjade, Dan Stefanescu,

William Baggett and Brent Morgan

The DARE Corpus: A Resource for Anaphora Resolution in

Dialogue Based Intelligent Tutoring Systems

726 Christian Girardi, Manuela Speranza, Rachele Sprugnoli and

Sara Tonelli

CROMER: a Tool for Cross-Document Event and Entity

Coreference

729 Arturs Znotins and Peteris Paikens Coreference Resolution for Latvian

850 Nadjet Bouayad-Agha, Alicia Burga, Gerard Casamayor, Joan

Codina, Rogelio Nazar and Leo Wanner

An Exercise in Reuse of Resources: Adapting General

Discourse Coreference Resolution for Detecting Lexical

Chains in Patent Documentation

891 Anders Björkelund, Kerstin Eckart, Arndt Riester, Nadja

Schauffler and Katrin Schweitzer

The Extended DIRNDL Corpus as a Resource for Coreference

and Bridging Resolution

918 Marcos Garcia and Pablo Gamallo Multilingual corpora with coreferential annotation of person

entities

1088 Maciej Ogrodniczuk, Mateusz Kopeć and Agata Savary Polish Coreference Corpus in Numbers


Session : P46 - Information

Extraction and Information

Retrieval

Chair: Dimitrios

Kokkinakis

45 Véronique Moriceau and Xavier Tannier French Resources for Extraction and Normalization of

Temporal Expressions with HeidelTime

99 Zdenka Uresova, Jan Hajic, Pavel Pecina and Ondrej Dusek Multilingual Test Sets for Machine Translation of Search

Queries for Cross-Lingual Information Retrieval in the

Medical Domain

106 Huijing Deng and Grzegorz Chrupała Semantic approaches to software component retrieval with

English queries

250 Hong Li, Sebastian Krause, Feiyu Xu, Hans Uszkoreit, Robert

Hummel and Veselina Mironova

Annotating Relation Mentions in Tabloid Press

337 Shaoda He, Xiaojun Zou, Liumingjing Xiao and Junfeng Hu Construction of Diachronic Ontologies from People's Daily of

Fifty Years

389 Maria Evangelia Chatzimina, Cyril Grouin and Pierre

Zweigenbaum

Use of unsupervised word classes for entity recognition:

Application to the detection of disorders in clinical reports

409 Alan Akbik and Thilo Michael The Weltmodell: A Data-Driven Commonsense Knowledge

Base

645 Marieke van Erp, Gleb Satyukov, Piek Vossen and Marit Nijsen Discovering and Visualising Stories in News

1107 Tomohide Shibata, Shotaro Kohama and Sadao Kurohashi A Large Scale Database of Strongly-related Events in Japanese

218 Steven Bethard, Philip Ogren and Lee Becker ClearTK 2.0: Design Patterns for Machine Learning in UIMA


Session : P47 - Language

Identification

Chair: Michael Rosner

TBC

435 Dirk Goldhahn and Uwe Quasthoff Vocabulary-Based Language Similarity using Web Corpora

732 Thomas Lavergne, Gilles Adda, Martine Adda-Decker and Lori

Lamel

Automatic language identity tagging on word and sentence-

level in multilingual text sources: a case-study on

Luxembourgish

996 Marcos Zampieri and Binyam Gebre VarClass: An Open-source Language Identification Tool for

Language Varieties

1068 Xiao Jiang, Yufan Guo, Jeroen Geertzen, Dora Alexopoulou,

Lin Sun and Anna Korhonen

Native Language Identification Using Large, Longitudinal Data

1183 Liviu Dinu and Alina Maria Ciobanu On the Romance Languages Mutual Intelligibility

Session : P48 - Morphology

Chair: Pavel Smrz

TBC

784 Senka Drobac, Krister Lindén, Tommi Pirinen and Miikka

Silfverberg

Heuristic Hyper-minimization of Finite State Lexicons

793 Claudia Borg and Albert Gatt Crowd-sourcing evaluation of automatically acquired,

morphologically related word groupings

896 Patrick Littell, Kaitlyn Price and Lori Levin Morphological parsing of Swahili using crowdsourced lexical

resources

909 Carla Parra Escartín Chasing the Perfect Splitter: A Comparison of Different

Compound Splitting Tools

1003 Vincent Claveau and Ewa Kijak Generating and using probabilistic morphological resources

for the biomedical domain

1051 Peter Baumann and Janet Pierrehumbert Using Resource-Rich Languages to Improve Morphological

Analysis of Under-Resourced Languages

1073 Ozlem Cetinoglu Turkish Treebank as a Gold Standard for Morphological

Disambiguation and Its Influence on Parsing

1074 Krešimir Šojat, Matea Srebačić, Marko Tadić and Tin Pavelić CroDeriV: a new resource for processing Croatian morphology

1090 Jan Šnajder DerivBase.hr: A High-Coverage Derivational Morphology

Resource for Croatian

1207 Jonathan Washington, Ilnar Salimzyanov and Francis Tyers Finite-state morphological transducers for three Kypchak

languages

Session : P49 -

Multimodality

Chair: Volker Steinbiss

TBC

51 Brigitte Bigi, Tatsuya Watanabe and Laurent Prévot Representing Multimodal Linguistic Annotated data

160 Michael Kipp, Levin Freiherr von Hollen, Michael Christopher

Hrstka and Franziska Zamponi

Single-Person and Multi-Party 3D Visualizations for

Nonverbal Communication Analysis

163 Huseyin Cakmak, Jerome Urbain, Thierry Dutoit and Joelle

Tilmanne

The AV-LASYN Database : A synchronous corpus of audio and

3D facial marker data for audio-visual laughter synthesis

189 Vincent Vandeghinste and Ineke Schuurman Linking Pictographs to Synsets: Sclera2Cornetto

192 Dietmar Schabus, Michael Pucher and Phil Hoole The MMASCS multi-modal annotated synchronous corpus of

audio, video, facial motion and tongue motion data of

normal, fast and slow speech

235 Mathieu Chollet, Magalie Ochs and Catherine Pelachaud Mining a multimodal corpus for non-verbal behavior

sequences conveying attitudes

318 Massimo Moneglia, Susan Brown, Francesca Frontini, Gloria

Gagliardi, Fahad Khan, Monica Monachini and Alessandro

Panunzi

The IMAGACT Visual Ontology. An Extendable Multilingual

Infrastructure for the representation of lexical encoding of

Action

354 Kodai Takahashi and Masashi Inoue Multimodal dialogue segmentation with gesture post-

processing

374 Shannon Hennig, Ryad Chellali and Nick Campbell The D-ANS corpus: the Dublin-Autonomous Nervous System

corpus of biosignal and multimodal recordings of

conversational speech


Session : P50 -

Crowdsourcing

Chair: Cristina Vertan

214 Jean-Philippe Goldman, Adrian Leeman, Marie-José Kolly,

Ingrid Hove, Ibrahim Almajai, Volker Dellwo and Steven

Moran

A Crowdsourcing Smartphone Application for Swiss

German: Putting Language Documentation in the

Hands of the Users

738 Theodosia Togia and Ann Copestake TagNText: A parallel corpus for the induction of

resource-specific non-taxonomical relations from

756 Shinsuke Goto, Donghui Lin and Toru Ishida Crowdsourcing for Evaluating Machine Translation

813 George Kiomourtzis, George Giannakopoulos, Georgios

Petasis, Pythagoras Karampiperis and Vangelis Karkaletsis

NOMAD: Linguistic Resources and Tools Aimed at

Policy Formulation and Validation

1106 Darja Fišer, Aleš Tavčar and Tomaž Erjavec sloWCrowd: A crowdsourcing tool for lexicographic

Session : P51 - Emotion

Recognition and Generation

Chair: Patrick Paroubek

322 Maxim Sidorov, Stefan Ultes and Alexander Schmitt Comparison of Gender- and Speaker-adaptive Emotion

341 Maxim Sidorov, Christina Brester, Wolfgang Minker and

Eugene Semenkin

Speech-Based Emotion Recognition: Feature Selection

by Self-Adaptive Multi-Criteria Genetic Algorithm

334 Nesrine Fourati and Catherine Pelachaud Emilya: Emotional body expression in daily actions

377 Juan-María Garrido, Yesika Laplaza, Benjamin Kolz and

Miquel Cornudella

TexAFon 2.0: A text processing tool for the generation

of expressive speech in TTS applications

591 Giovanni Costantini, Iacopo Iaderola, Andrea Paoloni and

Massimiliano Todisco

EMOVO Corpus: an Italian Emotional Speech Database

741 Virginie Demulier, Elisabetta Bevacqua, Florian Focone, Tom

Giraud, Pamela Carreno, Brice Isableu, Sylvie Gibet, Pierre De

Loor and Jean-Claude Martin

A Database of Full Body Virtual Interactions Annotated

with Expressivity Scores

1222 Sophia Lee, Shoushan Li and Chu-Ren Huang Annotating Events in an Emotion Corpus

Session : P52 - Linked Data

Chair: John Philip McCrae

703 Tomáš Kliegr and Ondřej Zamazal Towards Linked Hypernyms Dataset 2.0:

complementing DBpedia with hypernym discovery

780 Mohamed Sherif, Sandro Coelho, Ricardo Usbeck, Sebastian

Hellmann, Jens Lehmann, Martin Brümmer and Andreas Both

NIF4OGGD - NLP Interchange Format for Open German

Governmental Data

856 Michael Röder, Ricardo Usbeck, Sebastian Hellmann, Daniel

Gerber and Andreas Both

N³ - A Collection of Datasets for Named Entity

Recognition and Disambiguation in the NLP

788 Riccardo Del Gratta, Gabriella Pardelli and Sara Goggi The LRE Map disclosed

1052 Clara Bacciu, Angelica Lo Duca, Andrea Marchetti and

Maurizio Tesconi

Accommodations in Tuscany as Linked Data

1182 David Lewis, Rob Brennan, Leroy Finn, Dominic Jones, Alan

Meehan, Declan O'Sullivan, Sebastian Hellmann and Felix

Sasaki

Global Intelligent Content: Active Curation of Language

Resources using Linked Data


Session : P53 - Machine

Translation

Chair: Mikel Forcada

835 Ondrej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Stranak, Vit

Suchomel, Aleš Tamchyna and Daniel Zeman

HindEnCorp - Hindi-English and Hindi-only Corpus for

Machine Translation

848 Mara Chinea-Rios, Germán Sanchis Trilles, Daniel

Ortiz-Martínez and Francisco Casacuberta

Online optimisation of log-linear weights in interactive

machine translation

964 Kashif Shah, Marco Turchi and Lucia Specia An efficient and user-friendly tool for machine

translation quality estimation

982 Santanu Pal, Sudip Kumar Naskar and Sivaji Bandyopadhyay Word Alignment-Based Reordering of Source Chunks in

PB-SMT

1095 Bruno Laranjeira, Viviane Moreira, Aline Villavicencio, Carlos

Ramisch and Maria José Finatto

Comparing the Quality of Focused Crawlers and of the

Translation Resources Obtained from them

1097 Christian Buck, Kenneth Heafield and Bas van Ooyen N-gram Counts and Language Models from the

Common Crawl

1115 Guillaume Wisniewski, Natalie Kübler and François Yvon A Corpus of Machine Translation Errors Extracted from

Translation Students Exercises

1213 Alexandru Ceausu and Sabine Hunsicker Pre-ordering of phrase-based machine translation

input in translation workflow

1217 Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar,

Benjamin Van Durme and Matt Post

A Wikipedia-based Corpus for Contextualized Machine

Translation

Session : P54 -

Multimodality

Chair: Kristiina Jokinen

TBC

525 Costanza Navarretta and Magdalena Lis Transfer learning of feedback head expressions in

Danish and Polish comparable multimodal corpora

567 Onno Crasborn and Han Sloetjes Improving the exploitation of linguistic annotations in

627 Yoshihiko Hayashi Web-imageability of the Behavioral Features of Basic-

689 Zoraida Callejas, Brian Ravenet, Magalie Ochs and Catherine

Pelachaud

A model to generate adaptive multimodal job

interviews with a virtual recruiter

747 Coline Claude-Lachenaud, Eric Charton, Benoit Ozell and

Michel Gagnon

A multimodal interpreter for 3D visualization and

animation of verbal concepts

912 Philippe Martin New functions for a multipurpose multimodal tool for

phonetic and linguistic analysis of very large speech

947 Mariette Soury and Laurence Devillers Smile and Laughter in Human-Machine Interaction: a

study of engagement

DAY3 Poster Session

1017 Hendrik Buschmeier, Zofia Malisz, Joanna Skubisz, Marcin

Wlodarczak, Ipke Wachsmuth, Stefan Kopp, Petra Wagner

ALICO: a multimodal corpus for the study of active

listening

1053 Przemyslaw Lenkiewicz, Olha Shkaravska, Twan Goosen,

Daan Broeder, Menzo Windhouwer, Stephanie Roth and Olof

Olsson

The DWAN framework: Application of a web

annotation framework for the general humanities to

the domain of language resources

1119 Nicolas Auguin and Pascale Fung Co-Training for Classification of Live or Studio Music

Session : P55 - Ontologies Chair: Monica Monachini

251 Chetana Gavankar, Ashish Kulkarni and Ganesh

Ramakrishnan

Efficient Reuse of Structured and Unstructured

Resources for Ontology Population

686 Maria Pia di Buono and Mario Monteleone From Natural Language to Ontology Population in the

Cultural Heritage Domain. A Computational Linguistics-

781 Alessio Bosca, Matteo Casu, Matteo Dragoni and Nikolaos

Marianos

A Gold Standard for CLIR evaluation in the Organic

Agriculture Domain

851 Bernardo Severo, Cassia Trojahn and Renata Vieira VOAR: A Visual and Integrated Ontology Alignment



Annotation

Chair: Tomaž Erjavec TBC

1023 Goran Glavaš, Jan Šnajder, Marie-Francine Moens and Parisa

Kordjamshidi

HiEve: A Corpus for Extracting Event Hierarchies from

News Stories

1075 Masaya Yamaguchi Building a Database of Japanese Adjective Examples

from Special Purpose Web Corpora

1134 Antonio Toral TLAXCALA: a multilingual corpus of independent news

1143 Nathan Green and Septina Dian Larasati Votter Corpus: A Corpus of Social Polling Language

1151 Roald Eiselen and Martin Puttkammer Developing Text Resources for Ten South African

1153 Paul Felt, Robbie Haertel, Eric Ringger and Kevin Seppi Momresp: A Bayesian Model for Multi-Annotator

Document Labeling

1211 Maciej Ogrodniczuk and Mateusz Kopeć The Polish Summaries Corpus


Extraction and Information

Retrieval

Chair: Feiyu Xu

980 Clément de Groc and Xavier Tannier Evaluating Web-as-corpus Topical Document Retrieval

with an Index of the OpenDirectory

1038 Jordan Schmidek and Denilson Barbosa Improving Open Relation Extraction via Sentence Re-

1058 Pavel Smrz and Jan Kouril Semantic Search in Documents Enriched by LOD-based

1103 Antske Fokkens, Serge Ter Braake, Niels Ockeloen, Piek

Vossen, Susan Legêne and Guus Schreiber

BiographyNet: Methodological Issues when NLP

supports historical research

1156 Tilia Ellendorff, Fabio Rinaldi and Simon Clematide Using Large Biomedical Databases as Gold Annotations

for Automatic Relation Extraction

1170 Yutaka Mitsuishi, Vit Novacek and Pierre-Yves Vandenbussche A Method for Building Burst-Annotated Co-Occurrence

Networks for Analysing Trends in Textual Data

Session : P58 - Lexicons Chair: Kiril Simov

1067 Antonio San Martín and Marie-Claude L'Homme Definition patterns for predicative terms in specialized

lexical resources

1099 Tim vor der Brück, Alexander Mehler and Zahurul Islam ColLex.en: Automatically Generating and Evaluating a

Full-form Lexicon for English

1105 Ajay Dubey, Parth Gupta, Vasudeva Varma and Paolo Rosso Enrichment of Bilingual Dictionary through News

Stream Data

1108 Thomas Francois, Núria Gala, Patrick Watrin and Cédrick

Fairon

FLELex: a graded lexical resource for French foreign

learners

1155 Anabela Barreiro, Fernando Batista, Ricardo Ribeiro, Helena

Moniz and Isabel Trancoso

OpenLogos Semantico-Syntactic Knowledge-Rich

Bilingual Dictionaries

1161 Mona Diab, Mohamed AlBadrashiny, Maryam Aminian,

Mohammed Attia, Heba Elfardy, Nizar Habash and Abdelati

Hawwari

Towards Compiling a large scale three-way Egyptian

Arabic Dictionary

1169 Michael Rosner and Kurt Sultana Automatic Methods for the Extension of a Bilingual

Dictionary using Comparable Corpora

1203 Kevin Black, Eric Ringger, Paul Felt, Kevin Seppi, Kristian Heal

and Deryle Lonsdale

Evaluating Lemmatization Models for Machine-Assisted

Corpus-Dictionary Linkage


Resource Infrastructures

Chair: Martin Wynne

396 Menzo Windhouwer and Ineke Schuurman Linguistic resources and cats: how to use ISOcat, RELcat

and SCHEMAcat

660 Lluís Padró, Zeljko Agic, Xavier Carreras, Blaz Fortuna,

Esteban García-Cuesta, Zhixing Li, Tadej Stajner and Marko

Tadić

Language Processing Infrastructure in the XLike Project

743 Piotr Banski, Nils Diewald, Michael Hanl, Marc Kupietz and

Andreas Witt

Access control by query rewriting: the case of KorAP

775 Rodrigo Agerri, Josu Bermudez and German Rigau IXA pipeline: Efficient and Ready to Use Multilingual

930 Trang Mai Xuan, Yohei Murakami, Donghui Lin and Toru

Ishida

Integration of Workflow and Pipeline for Language

Service Composition

1086 Rafal Rak, Jacob Carter, Andrew Rowley, Riza Theresa Batista-

Navarro and Sophia Ananiadou

Interoperability and Customisation of Annotation

Schemata in Argo

Session : P60 - Metadata Chair: Gil Francopoulo

979 Penny Labropoulou, Christopher Cieri and Maria Gavrilidou Developing a Framework for Describing Relations

among Language Resources

1011 Thorsten Trippel, Daan Broeder, Matej Durco and Oddrun

Ohren

Towards automatic quality assessment of component

metadata

Session : P61 - Opinion

Mining and Sentiment

Analysis

Chair: Gerard de Melo

TBC

188 Chantal van Son, Marieke van Erp, Antske Fokkens and Piek

Vossen

Hope and Fear: How Opinions Influence Factuality

413 Nathan Hartmann, Lucas Avanço, Pedro Balage, Magali

Duran, Maria das Graças Volpe Nunes, Thiago Pardo and

Sandra Aluísio

A Large Corpus of Product Reviews in Portuguese:

Tackling Out-Of-Vocabulary Words

500 Thierry Declerck and Hans-Ulrich Krieger Harmonization of German Lexical Resources for

617 Anne Garcia-Fernandez, Olivier Ferret and Marco Dinarelli Evaluation of different strategies for domain

adaptation in opinion mining

1010 Amel Fraisse and Patrick Paroubek Toward a unifying model for Opinion, Sentiment and

Emotion information extraction


Resources

Chair: Christoph Draxler

858 Michael Stadtschnitzer, Jochen Schwenninger, Daniel Stein

and Joachim Koehler

Exploiting the large-scale German Broadcast Corpus to

boost the Fraunhofer IAIS Speech Recognition System

889 Ilaine Wang, Sylvain Kahane and Isabelle Tellier Macrosyntactic Segmenters of a French Spoken Corpus

906 Iolanda Alfano, Francesco Cutugno, Aurelio De Rosa, Claudio

Iacobini, Renata Savy and Miriam Voghera

VOLIP: a corpus of spoken Italian and a virtuous

example of reuse of linguistic resources

929 George Christodoulides, Mathieu Avanzi and Jean-Philippe

Goldman

DisMo: A Morphosyntactic, Disfluency and Multi-Word

Unit Annotator. An Evaluation on a Corpus of French

Spontaneous and Read Speech

1020 Vera Cabarrão, Helena Moniz, Fernando Batista, Ricardo

Ribeiro, Nuno Mamede, Hugo Meinedo, Isabel Trancoso, Ana

Isabel Mata and David Martins de Matos

Revising the annotation of a Broadcast News corpus: a

linguistic approach

1193 Ana Isabel Mata, Helena Moniz, Fernando Batista and Julia

Hirschberg

Teenage and adult speech in school context: building

and processing a corpus of European Portuguese

1056 Arjan van Hessen, Franciska de Jong, Stef Scagliola and Tanja

Petrovic

Croatian Memories

1081 Ines Rehbein, Sören Schalowski and Heike Wiese The KiezDeutsch Korpus (KiDKo) Release 1.0

1104 Anthony Rousseau, Paul Deléglise and Yannick Estève Enhancing the TED-LIUM Corpus with Selected Data for

Language Modeling and More TED Talks

1176 Jan Strunk, Florian Schiel and Frank Seifart Untrained Forced Alignment of Transcriptions and

Audio for Language Documentation Corpora using


Session : P63 - Computer-

Assisted Language Learning

(CALL)

Chair: Keith Miller TBC

101 Xiaoyun Wang, Jinsong Zhang, Masafumi Nishida and Seiichi

Yamamoto

Phoneme Set Design Using English Speech Database by

Japanese for Dialogue-Based English CALL Systems

247 Lianet Sepúlveda Torres, Magali Sanches Duran and Sandra

Aluísio

Generating a Lexicon of Errors in Portuguese to

Support an Error Identification System for Spanish

340 Veronika Vincze, János Zsibrita, Péter Durst and Martina

Katalin Szabó

Automatic Error Detection concerning the Definite and

Indefinite Conjugation in the HunLearner Corpus

570 Gabriele Pallotti, Francesca Frontini, Fabio Affè, Monica

Monachini and Stefania Ferrari

Presenting a system of human-machine interaction for

performing map tasks

857 Valentín Cardeñoso-Payo, César González-Ferreras and David

Escudero

Assessment of Non-native Prosody for Spanish as L2

using quantitative scores and perceptual evaluation

892 Elena Volodina, Ildikó Pilán, Lars Borin and Therese

Lindström Tiedemann

A flexible language learning platform based on

language resources and web services

971 Renlong Ai and Marcela Charfuelan MAT: a tool for L2 pronunciation errors annotation

1126 Chris Hokamp, Rada Mihalcea and Peter Schuelke Modeling Language Proficiency Using Implicit Feedback

Session : P64 - Evaluation

Methodologies

Chair: Kevin Bretonnel

Cohen

960 Mohamed Ben Jannet, Martine Adda-Decker, Olivier

Galibert, Juliette Kahn and Sophie Rosset

ETER : a new metric for the evaluation of hierarchical

named entity recognition

1027 Olivier Galibert, Jeremy Leixa, Gilles Adda, Khalid Choukri

and Guillaume Gravier

The ETAPE speech processing evaluation

998 Achim Rettinger, Lei Zhang, Daša Berović, Danijela Merkler,

Matea Srebačić and Marko Tadić

RECSA: Resource for Evaluating Cross-lingual Semantic

Annotation

1147 Helen Hastie and Anja Belz A Comparative Evaluation Methodology for NLG in

Interactive Systems

1189 Juris Borzovs, Ilze Ilziņa, Iveta Keiša, Mārcis Pinnis and

Andrejs Vasiļjevs

Terminology localization guidelines for the national

scenario

Session : P65 - MultiWord

Expressions and Terms

Chair: Valia Kordoni

706 Kris Heylen, Stephen Bond, Dirk De Hertog, Ivan

Vulić and Hendrik Kockaert

TermWise: A CAT-tool with Context-Sensitive

Terminological Support

883 Pollet Samvelian, Pegah Faghiri and Sarra El Ayari Extending the coverage of a MWE database for Persian

CPs exploiting valency alternations

920 Behrang Zadeh and Siegfried Handschuh Evaluation of Technology Term Recognition with

1064 Johannes Hellrich, Simon Clematide, Udo Hahn and Dietrich

Rebholz-Schuhmann

Collaboratively Annotating Multilingual Parallel

Corpora in the Biomedical Domain—some MANTRAs

1184 Anca Dinu, Liviu Dinu and Ionut Sorodoc Aggregation methods for efficient collocation detection

1197 Sandra Antunes and Amália Mendes An evaluation of the role of statistical measures and

frequency for MWE identification

Session : P66 - Parsing Chair: Giuseppe Attardi

596 Weston Feely, Mehdi Manshadi, Robert Frederking and Lori

Levin

The CMU METAL Farsi NLP Approach

696 Masood Ghayoomi, Kiril Simov and Petya Osenova Constituency Parsing of Bulgarian: Word- vs Class-

1005 Kiril Simov, Iliana Simova, Ginka Ivanova, Maria Mateva and

Petya Osenova

A System for Experiments with Dependency Parsers

809 Wolfgang Seeker and Jonas Kuhn An Out-of-Domain Test Suite for Dependency Parsing

879 Lauma Pretkalniņa, Artūrs Znotiņš, Laura Rituma and Didzis

Goško

Dependency parsing representation effects on the

accuracy of semantic applications — an example of an

970 Ophélie Lacroix and Denis Béchet Validation Issues induced by an Automatic Pre-

Annotation Mechanism in the Building of Non-

1158 Jianqiang Ma Automatic Refinement of Syntactic Categories in

Chinese Word Structures

Session : P67 - Part-of-

Speech Tagging

Chair: Daniel Flickinger

687 Stephen Wattam, Paul Rayson, Marc Alexander and Jean

Anderson

Experiences with Parallelisation of an Existing NLP

Pipeline: Tagging Hansard

721 Heike Zinsmeister, Ulrich Heid and Kathrin Beck Adapting a part-of-speech tagset to non-standard text:

The case of STTS

755 Antonio Balvet, Dejan Stosic and Aleksandra Miletic TALC-sef A Manually-Revised POS-TAgged Literary

Corpus in Serbian, English and French

801 Cristina Sánchez Marco An open source part-of-speech tagger for Norwegian:

Building on existing language resources

826 Antonio Pareja-Lora, Guillermo Cárcamo-Escorza and Alicia

Ballesteros-Calvo

Standardisation and Interoperation of

Morphosyntactic and Syntactic Annotation Tools for

Session : P68 - Tools,

Systems, Applications

Chair: Yota

Georgakopoulou TBC

185 Peter Fankhauser, Jörg Knappen and Elke Teich Exploring and Visualizing Variation in Language

429 Raphael Winkelmann and Georg Raess Introducing a web application for labeling, visualizing

speech and correcting derived speech signals

621 Maha Althobaiti, Udo Kruschwitz and Massimo Poesio AraNLP: a Java-based Library for the Processing of

Arabic Text

640 Silvia Rodríguez Vázquez, Pierrette Bouillon and Anton

Bolfing

Applying Accessibility-Oriented Controlled Language

(CL) Rules to Improve Appropriateness of Text

Alternatives for Images: an Exploratory Study

824 Jonathan Sonntag and Manfred Stede GraPAT: a Tool for Graph Annotations

862 Vincenzo Galatà, Alberto Benin, Piero Cosi, Giuseppe

Riccardo Leone, Giulio Paci, Giacomo Sommavilla and Fabio

Tesser

Discovering the Italian literature: interactive access to

audio indexed text resources

1102 Horacio Saggion Creating Summarization Systems with SUMMA