

Page 1: Creating a Term Base to Customize an MT System: Reusability of Resources and Tools from the Translator’s Point of View Natalie Kübler Intercultural Centre

Creating a Term Base to Customize an MT System:

Reusability of Resources and Tools from the Translator’s Point of View

Natalie Kübler

Intercultural Centre for Studies in Lexicology

Objectives
  Introducing available resources, tools, and MT in translation training
  Testing customisable MT as a time-saving tool for « industrial » translation
  Using simple tools and immediately available resources to improve MT translation results

Translation training
  Post-graduate students in language industry (LI) and specialised translation (ST):
    Translation, linguistics, localisation, technical writing
    Dreamweaver, Catalyst, HTML, XML, SQL, UNIX, translation memory, etc.
    Semi-professional: every other week with a private company in translation or the language industry
  Corpus linguistics and applications to terminology and translation => project in ST (HOWTO) + LI (analysis + feedback to Systran)

Experiment
  Translating some as yet untranslated Linux HOWTOs, using an MT system
    Subdomain of computing
    Highly specialised texts written by computer experts – and not technical writers – for computer experts
    Translated by French-speaking computer experts
  + Translating computing dictionary entries

Systranet
  Systran’s on-line customisable service
  Domain-specific dictionaries
  User dictionaries:
    Mono- or multitarget
    « Advanced » linguistic information
  On-line source and target text alignment
    Words not in any (Systran’s or user’s) dictionary
    Words in the user’s dictionary

Resources + Tools
  Headwords + equivalents + linguistic information
    On-line technical bilingual glossaries
    On-line term bases
    Comparable and translation technical corpora
    The Web as a corpus
  Term extraction (Terminology Extractor)

Methodology
  Step-one dictionary: extracting term candidates from text
  Creating and coding the step-one dictionary
  First translation using the dictionary
  Step-two dictionary: changing and/or adding linguistic information using Systranet’s alignment and color features + linguistic analysis (feedback)
  Step two: repeated until the dictionary is saturated
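
The "repeat step two until saturated" loop can be sketched in Python as below. This is only an illustration under stated assumptions: "saturated" is read here as "a review round adds or changes no entry", and `find_problems`, `propose_entry`, `unknown_words` and `toy_proposal` are hypothetical stand-ins for the translator's own review of the aligned MT output, not part of Systranet.

```python
from typing import Callable, Dict, Set


def refine_until_saturated(
    text: str,
    dictionary: Dict[str, str],
    find_problems: Callable[[str, Dict[str, str]], Set[str]],
    propose_entry: Callable[[str], str],
    max_rounds: int = 20,
) -> Dict[str, str]:
    """Repeat the step-two review until a round changes nothing."""
    current = dict(dictionary)  # work on a copy, leave the input untouched
    for _ in range(max_rounds):
        problems = find_problems(text, current)
        updates = {term: propose_entry(term) for term in problems}
        changed = {t: v for t, v in updates.items() if current.get(t) != v}
        if not changed:
            break  # saturated: this round added or altered no entry
        current.update(changed)
    return current


# Toy stand-ins (hypothetical): in practice a translator does this review.
TOY_EQUIVALENTS = {"kernel": "noyau", "daemon": "démon"}


def unknown_words(text: str, dictionary: Dict[str, str]) -> Set[str]:
    """Treat every word missing from the dictionary as a problem."""
    return {w for w in text.lower().split() if w not in dictionary}


def toy_proposal(term: str) -> str:
    """Look the term up; fall back to leaving it untranslated."""
    return TOY_EQUIVALENTS.get(term, term)


step_two = refine_until_saturated(
    "the kernel starts the daemon", {"the": "le"}, unknown_words, toy_proposal
)
print(step_two)
```

The `max_rounds` cap is a safety net in case two rounds keep undoing each other's changes.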

Web-based HOWTO glossary
  Several French equivalents
    boot, root disk = disquettes (d') amorce ou de démarrage, racine
    browser = butineur, navigateur, arpenteur
    buffer = tampon
    to build = bâtir
    currently = actuellement
    feedback = comment contacter l'auteur, retour d'information
    A.D.S.L. (noun) = raccordement numérique asymétrique
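
Glossary lines of this `headword=equivalent, equivalent` shape are easy to load into a lookup table. The sketch below is a minimal illustration, assuming one entry per line and a comma between equivalents; the function names are hypothetical. Entries with more than one equivalent are the ones where a single translation still has to be chosen before coding the dictionary.

```python
from typing import Dict, List, Tuple


def parse_glossary_line(line: str) -> Tuple[str, List[str]]:
    """Split 'headword= eq1, eq2' into the headword and its equivalents."""
    headword, _, equivalents = line.partition("=")
    return headword.strip(), [e.strip() for e in equivalents.split(",") if e.strip()]


def load_glossary(lines: List[str]) -> Dict[str, List[str]]:
    """Build a headword -> equivalents table, skipping lines without '='."""
    glossary: Dict[str, List[str]] = {}
    for line in lines:
        if "=" in line:
            headword, equivalents = parse_glossary_line(line)
            glossary[headword] = equivalents
    return glossary


sample = [
    "browser= butineur, navigateur, arpenteur",
    "buffer=tampon",
    "to build= bâtir",
    "feedback=comment contacter l'auteur, retour d'information",
]
glossary = load_glossary(sample)

# Headwords that still need a choice between several equivalents.
ambiguous = sorted(h for h, eqs in glossary.items() if len(eqs) > 1)
print(ambiguous)
```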

Step 1: Terminology Extractor
  French and English dictionaries
  Morphological analysis
  Stop words
  Collocations: sequence of 2 to 10 words repeated at least once
  Non-words
  Concordances
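
The collocation criterion (a sequence of 2 to 10 words repeated at least once, i.e. seen at least twice) can be approximated with plain n-gram counting. This is a rough sketch, not Terminology Extractor's actual algorithm: the tokenizer, the small stop-word list, and the rule that a candidate may not start or end with a stop word are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "of", "on", "at", "for", "to", "and", "or"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def collocations(
    text: str, min_len: int = 2, max_len: int = 10, min_count: int = 2
) -> Dict[str, int]:
    """Count word sequences of min_len..max_len tokens seen min_count+ times."""
    tokens = tokenize(text)
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i : i + n]
            # Assumption: candidates may not begin or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


text = "The Linux box runs a DHCP server. The DHCP server talks to the Linux box."
candidates = collocations(text)
print(candidates)
```

A real extractor would also stop at sentence boundaries and group inflected forms, which the morphological analysis listed on the slide suggests.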

TE non-words
  Debian        Netscape     accellerate
  Permedia      Dennis       XFCE
  RedHat        Dialogs      Corel
  RgbPath       FAQs         anoying
  ServerFlags   Howto        Microdoft
  ServerLayour  README       Linux
  XkbLayout     XkbModel     RealAudio
  Solaris       ISA          degredation
  UI            KDE          GUI
  USB           LeftOf       IRQs
  WindowMaker   ModulePath   NFS
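
A "non-word" list of this kind can be produced by checking each token against a reference word list. The sketch below assumes a plain set of known lower-case word forms (`lexicon`, hypothetical); it does not reproduce Terminology Extractor's dictionaries or morphological analysis. As in the list above, such a filter surfaces product names, acronyms and source-text typos alike, so the output still needs manual sorting.

```python
import re
from typing import List, Set


def find_non_words(text: str, lexicon: Set[str]) -> List[str]:
    """Return tokens absent from the lexicon, in order of first appearance."""
    seen: Set[str] = set()
    non_words: List[str] = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token.lower() not in lexicon and token not in seen:
            seen.add(token)
            non_words.append(token)
    return non_words


# Hypothetical mini-lexicon; a real one would hold a full general dictionary.
lexicon = {"install", "and", "configure", "the", "option"}
found = find_non_words("Install Debian and configure the XkbLayout option", lexicon)
print(found)
```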

TE collocations
  Internet Gateway  3
  { Looking look } at the Network  3
  IP aliasing  3
  name server  4
  ISA { card cards }  3
  Network { Device devices }  4
  latest version  3
  Linux computer  3
  DHCP Server  15
  IP { addresses address }  16
  Linux gateway  3
  Linux box  16
  modules file  3
  card on the Linux box  4
  scripts / ifcfg  3
  DNS { Server servers }  17
  server will start  3
  interface configuration file  3
  { Network networking } { Card Cards }  12

« Le grand dictionnaire terminologique »
  Looking for French equivalents

  ENGLISH               FRENCH
  buffer                mémoire tampon n. f.
  Syn.                  Syn.
  buffer storage        tampon n. m.
  buffer memory         mémoire intermédiaire n. f.
  intermediate memory   zone tampon n. f.

HOWTO translation corpus
  English source – French translation
  WALL: Web-based environment
    Concordances with perl-like regexp
    Paragraph alignment
  French equivalents
  Lexicogrammatical information
  Semantic classes
  « Statistical » information in the domain
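
The idea of regexp concordances over a paragraph-aligned corpus can be illustrated as follows. This is not WALL's implementation: it is a minimal keyword-in-context sketch using Python's `re` syntax (close to, but not identical with, Perl's), and the in-memory list of (English, French) pairs is an assumed stand-in for the aligned corpus.

```python
import re
from typing import List, Tuple

# Assumed corpus layout: one (source paragraph, target paragraph) pair per item.
aligned = [
    ("All the Digital cards will autoprobe for their media",
     "Toutes les cartes Digital effectueront la détection automatique du média"),
    ("Called by the kernel when the card posts an interrupt.",
     "Appelé par le noyau quand la carte déclenche une interruption"),
]


def concordance(
    pairs: List[Tuple[str, str]], pattern: str, width: int = 30
) -> List[Tuple[str, str]]:
    """Return (keyword-in-context line, aligned translation) for each match."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits: List[Tuple[str, str]] = []
    for source, target in pairs:
        for match in regex.finditer(source):
            left = source[max(0, match.start() - width) : match.start()]
            right = source[match.end() : match.end() + width]
            hits.append((f"{left}[{match.group(0)}]{right}", target))
    return hits


hits = concordance(aligned, r"\bcards?\b")
for context, translation in hits:
    print(context)
    print("   ", translation)
```

Showing the aligned French paragraph next to each hit is what lets the translator read off equivalents such as card / carte in context.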

HOWTOs: equivalents
  The daemon […] listens to all messages on each network device
  Le démon […] écoute tous les messages sur chacun des périphériques réseau

  All the Digital cards will autoprobe for their media
  Toutes les cartes Digital effectueront la détection automatique du média

  The latest source distribution can be FTPed from the directory ftp…or Mosaiced from http…
  On peut charger la dernière version sur ftp…et sous Mosaic depuis http…

  Called by the kernel when the card posts an interrupt.
  Appelé par le noyau quand la carte déclenche une interruption

HOWTOs « semantic classes »

can I run 32-bit video games under dosemu

used to run Linux on a 386/16 MHz (

unless you want your modem to answer the phone

The static SLIP server will answer your modem call

WebCorp
  The web as a corpus
  Concordances: buffer, run* * * on
    Updated information
    More elements

buffer
  me des débordements de  buffer (tampon en français). Pour
  com/advisories/bufero.html . Writing  buffer overflow exploits – a tutorial for
  de NOP . débordement de  buffer dans le tas (heap buffer overflow)
  (buffer overflow) . débordement de  buffer sous windows (et oui ;-)) --[

Customized dictionary
  « Advanced » linguistic information, such as:
    Part-of-speech information
      noun, proper noun (product name, country, etc.), verb, adjective, sentence
    Morphological information
      URL (noun) (plural:URLs) / cache (noun)(masculine)
    Lexicogrammatical information
      access (verb)(noprep)=accéder (verb)(prep:à)
    Basic semantic information
      to run (verb)(context:OS)
      Unix (noun) (SEMCAT:OS)
    Idioms
      Your mileage may vary (sentence)

Dictionary Sample
  "AT&T" (company name)
  auto-dial (noun)=numérotation automatique (noun)
  automatic number identification (noun)=identification de l'appelant (noun)
  based (adjective)(noprep)=architecturé (adjective) (prep:autour)
  basic language constructs (noun) (plural)=base de construction du langage (noun) (singular)
  to log in (verb)=se loger (verb)
  to introduce (verb) (context:extensions)=introduire
  to carry (verb)(context:digital data)=transmettre (verb)
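
Entries written in this `source (codes)=target (codes)` notation can be split into a term and its parenthesised codes with a few lines of Python, for instance to check a dictionary for consistency before uploading it. The sketch below relies only on the notation visible in the sample; it is not Systran's parser, and the function names and returned structure are hypothetical.

```python
import re
from typing import Dict, List, Optional

# One parenthesised code such as (verb), (noprep) or (context:digital data).
CODE = re.compile(r"\(([^()]*)\)")


def parse_side(side: str) -> Dict[str, object]:
    """Split 'term (code)(code)' into the bare term and its list of codes."""
    term = side.split("(", 1)[0].strip()
    codes: List[str] = [c.strip() for c in CODE.findall(side)]
    return {"term": term, "codes": codes}


def parse_entry(line: str) -> Dict[str, Optional[Dict[str, object]]]:
    """Parse 'source (codes)=target (codes)'; target is None without '='."""
    source, sep, target = line.partition("=")
    return {
        "source": parse_side(source),
        "target": parse_side(target) if sep else None,
    }


entry = parse_entry("access (verb)(noprep)=accéder (verb)(prep:à)")
print(entry["source"])
print(entry["target"])

# A source-only entry, such as a name that should stay untranslated.
name_only = parse_entry('"AT&T" (company name)')
print(name_only)
```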

With Step-one dictionary

This page contains a simple cookbook for setting up Red Hat 6.X as an internet gateway for a home network or small office network.

Cette page contient un cookbook simple pour le chapeau rouge 6.X d'établissement en tant que Gateway d'Internet pour un réseau à la maison ou le petit réseau de bureau.

With Step-two dictionary

This page contains a simple cookbook for setting up Red Hat 6.X as an internet gateway for a home network or small office network.

Cette page contient des recettes simples pour installer Red Hat 6.X en tant que passerelle Internet pour un réseau domestique ou un petit réseau de bureau.

Error typology
  Morphosyntax: subject-verb or noun-adjective agreement
  Syntax:
    POS ambiguity
    NP: determiners, NP coordination
    Transformations/ellipsis/cleft sentences/PP attachment
  Metacharacters
  « Bugs »

Error examples (1)
  I am not going
  *je n'vais pas => je ne vais pas

  the phase of the light through it
  *la phase du dépassement léger par lui => la phase de la lumière qui les traverse.

  decoded by specific individuals.
  *décodée par les individus spécifiques. => décodée par des individus spécifiques.

  A cable or ADSL connection
  *un câble ou une connexion d’AADSL => Une connexion par câble ou ADSL

Error examples (2)

When a user picks or is assigned a password, it is encoded with a randomly generated value called the salt.

=> *Quand un utilisateur sélectionne ou est généré un mot de passe, il est codé avec une valeur aléatoirement produite appelée le sel.

Conclusion
  Translation results can be significantly improved by creating customised dictionaries
  The tools mentioned here are user-friendly
  But this implies much work in the beginning + translators must have training in linguistics and basic NLP.
  Change of attitude towards MT + various tools, especially in the language-industry-oriented option

More things to be done...
  Merging all dictionaries together into a « Systranet term base »
  Translating more HOWTOs
  Project with Systran: improve user coding…