FLAVIUS Meeting – WP4
June 8, 2010
Giurgiu Bogdan, Wong William



Page 2: Agenda
• LW contributions
• Keys to successful integration
• Complete integration picture
• Translation REST API
• Trustscore™ and Reporting
• REST API Version 2
• Customization through dictionaries
• Customization through training
• FLAVIUS Language Weaver Roadmap
• Questions & Answers

Page 3: Language Weaver’s Contribution

Page 4: Keys to a Successful Partner Integration
1. Ability to integrate with Language Weaver Machine Translation for development and testing
2. Ability to customize baseline engines with dictionaries
3. Ability to customize baseline engines by training a domain- or customer-specific vertical system

Page 5: Complete Picture
[Diagram: REST API, TOD, Reporting and Trustscore™, Dictionary, Training]

Page 6: Sample UI for the Translation Engine

Page 7: Translation REST API
• Simple HTTP-based communication protocol
• Uses standard HTTP calls: POST, GET, DELETE
• REST style, as used by Web 2.0 services such as Amazon and Twitter
• Supported text formats: TXT, HTML, TMX, XLIFF
• Data is encrypted using SSL (via HTTPS)
• Authentication uses a custom HTTP scheme
  – Two additional headers are added to every request:
    • LW_Date – contains a date/time string based on the request time
    • Authorization – contains three strings separated by colons: “LWA:<userid>:<signature>”
  – The unique signature is generated using a keyed HMAC (Hash-based Message Authentication Code) with a SHA-1 (Secure Hash Algorithm) digest
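
A minimal Python sketch of the two extra headers, assuming the signed message is the upper-case HTTP method, the LW_Date value and the request path joined by newlines, as in the C# GenerateSignature sample later in this deck. The function names `lw_signature` and `lw_auth_headers` are hypothetical, and the user ID and API key below are placeholders.

```python
import base64
import hashlib
import hmac


def lw_signature(api_key: str, method: str, http_date: str, uri: str) -> str:
    # Message layout from the C# sample: METHOD, LW_Date and URI joined by newlines.
    message = "\n".join([method.strip().upper(), http_date.strip(), uri.strip()])
    digest = hmac.new(api_key.encode("utf-8"), message.encode("utf-8"), hashlib.sha1).digest()
    # Base64-encode the 20-byte HMAC-SHA1 digest, like Convert.ToBase64String.
    return base64.b64encode(digest).decode("ascii")


def lw_auth_headers(user_id: str, api_key: str, method: str, http_date: str, uri: str) -> dict:
    # The two extra headers every request carries.
    signature = lw_signature(api_key, method, http_date, uri)
    return {"LW_Date": http_date, "Authorization": f"LWA:{user_id}:{signature}"}


# Placeholder credentials; the URI is the request path, not the full URL.
headers = lw_auth_headers("demo-user", "demo-api-key", "POST",
                          "Wed, 03 Mar 2010 14:55:51 GMT", "/v1/translation/eng.fra/lpid=74/")
print(headers["Authorization"])
```

Because the date is part of the signed message, the signature differs whenever the LW_Date string changes, even for the same path.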

Page 8: Translation REST API
• Language Pair: /v1/lpinfo (HTTP GET)
• User: /v1/user (HTTP POST)
• Blocking Translations: /v1/translation/src.tgt/lpid=<id> (HTTP POST)
• Non-Blocking Translations: /v1/translation/src.tgt/lpid=<id> (HTTP POST), then /v1/translation/src.tgt/lpid=<id>/<jobid> (HTTP GET/DELETE)

Page 9: Translation REST API
• Blocking Translation Request
  – HTTP POST to https://lwaccess.languageweaver.com/v1/translation/[src].[tgt]/lpid=[lpid]/[optional-params]/
• Appropriate for small chunks of data (less than 640 bytes)
• Mandatory Input Parameters:
  – [src] – three-letter code for the source language (e.g. “eng” for English)
  – [tgt] – three-letter code for the target language
  – [lpid] – integer denoting the specific language pair system to be used
  – “source_text=” – [string] – URL-escaped version of the input source (POST DATA)
• Optional Input Parameters:
  – input_format=[value] – string declaring the input format. Choose from “html”, “plain”, “xliff”.
  – input_encoding=[value] – string declaring the input encoding. Only “utf8” is supported.
• Sample Calls:
  – Create Blocking Translation Job for Text, Get Language Pair details
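
A hedged Python sketch of assembling a blocking request: it builds the path in the same segment style as the C# sample (`lpid=.../input_format=.../`) and URL-escapes `source_text` as form data. The 640-byte guard mirrors the slide's size guidance; whether the server itself enforces that limit is an assumption, as is the helper name `build_blocking_request`.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

BLOCKING_LIMIT_BYTES = 640  # slide guidance: blocking calls suit chunks under 640 bytes
INPUT_FORMATS = ("html", "plain", "xliff")


def build_blocking_request(src: str, tgt: str, lpid: int, source_text: str,
                           input_format: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (path, form_body) for a blocking POST; the caller prepends the host."""
    if len(src) != 3 or len(tgt) != 3:
        raise ValueError("src and tgt must be three-letter language codes")
    if len(source_text.encode("utf-8")) >= BLOCKING_LIMIT_BYTES:
        raise ValueError("text too large for a blocking call; use the non-blocking API")
    path = f"/v1/translation/{src}.{tgt}/lpid={lpid}/"
    if input_format is not None:
        if input_format not in INPUT_FORMATS:
            raise ValueError(f"unsupported input_format: {input_format}")
        path += f"input_format={input_format}/"
    # URL-escape the text as application/x-www-form-urlencoded data (UTF-8 only).
    body = urlencode({"source_text": source_text}, encoding="utf-8").encode("ascii")
    return path, body


path, body = build_blocking_request("eng", "fra", 74, "Hello World")
print(path)  # /v1/translation/eng.fra/lpid=74/
print(body)  # b'source_text=Hello+World'
```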

Page 10: Translation REST API
• Non-Blocking Translation Request
  – HTTP POST to https://lwaccess.languageweaver.com/v1/translation-async/[src].[tgt]/lpid=[lpid]/[optional-params]/
• Appropriate for large files
• Mandatory/optional input parameters are similar to those of the Blocking Translation
• Sample calls:
  – Create Non-Blocking Translation Job for Text/URL/File
  – Get Language Pair details, Get User Info
  – Followed by HTTP GETs to https://api.languageweaver.com/v1/translation-async/[src].[tgt]/[jobID]/lpid=[lpid]/[optional-params]/
• [jobID] – integer denoting the specific translation submitted with the POST
• Sample calls:
  – GET Non-Blocking Translation Job for Text/URL/File
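
A sketch of the client side of the non-blocking flow, assuming a job is finished once the GET response contains a `translated_text` element (the element the blocking C# sample reads; the slides do not say how an unfinished async job is reported). `fetch` is a hypothetical caller-supplied function that performs the signed HTTP GET, so the polling logic can be exercised without network access.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def poll_translation(fetch: Callable[[str], str], retrieval_url: str,
                     interval_s: float = 2.0, max_attempts: int = 30) -> Optional[str]:
    """Poll a non-blocking job; return the translated text, or None if it never appears.

    `fetch` is a hypothetical caller-supplied function that performs the signed
    HTTP GET on retrieval_url and returns the XML response body.
    """
    for attempt in range(max_attempts):
        root = ET.fromstring(fetch(retrieval_url))
        node = root.find(".//translated_text")  # assumed "job finished" marker
        if node is not None and node.text:
            return node.text.strip()
        if attempt + 1 < max_attempts:
            time.sleep(interval_s)  # wait before the next GET
    return None


# Fake responses stand in for the server: first "not ready", then a result.
replies = iter([
    "<lwresponse><response_data/></lwresponse>",
    "<lwresponse><response_data><translated_text> Bonjour le monde </translated_text></response_data></lwresponse>",
])
print(poll_translation(lambda url: next(replies), "https://example.invalid/job", interval_s=0))
```

A fixed interval keeps the sketch short; a real client might lengthen the wait between attempts for large files.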

Page 11: Translation REST API
• Sample code – C# Example

    using System;
    using System.IO;
    using System.Net;
    using System.Text;
    using System.Xml;

    // Step 1: Construct the path. Append the LPID and/or input_format only if they were supplied
    string szPath = "/v1/translation/" + szSrcLang + "." + szTgtLang + "/";
    if (0 != szLPID.Length)
        szPath = szPath + "lpid=" + szLPID + "/";
    if (0 != szInputFormat.Length)
        szPath = szPath + "input_format=" + szInputFormat + "/";

    // Step 2: Construct the URL
    string szURI = m_szHostName + szPath;
    System.Console.WriteLine(szURI);

    // Step 3: Prepare the POST request; the authentication headers are computed over the path
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(szURI);
    PrepareHttpRequestHeader("POST", szPath, ref request);

Page 12: Translation REST API

    // Step 4: Attach the POST data; source_text must be URL-escaped
    szSourceText = "source_text=" + Uri.EscapeDataString(szSourceText);
    byte[] postDataBytes = Encoding.UTF8.GetBytes(szSourceText);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = postDataBytes.Length;
    using (Stream requestStream = request.GetRequestStream())
    {
        requestStream.Write(postDataBytes, 0, postDataBytes.Length);
    }

    // Step 5: Read the response, disposing the response and reader when done
    string szResponse;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader responseReader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        szResponse = responseReader.ReadToEnd();
    }
    System.Console.WriteLine(szResponse);

    // Step 6: Parse the XML document for the translated text
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(szResponse);
    XmlNodeList nodeList = xmlDoc.GetElementsByTagName("translated_text");
    if (nodeList.Count > 0)
        szTargetText = nodeList[0].InnerText.Trim();

Page 13: Translation REST API – Header Generation
• Sample code – C# Example
  – Generate Header

    // Step 1: Get the current HTTP date
    string szHttpDate = GetHttpDate();

    // Step 2: Generate the signature
    szRequestType = szRequestType.ToUpper();
    string szSignature = GenerateSignature(szRequestType, szHttpDate, szURI);

    // Step 3: Add the two new headers to the request object
    request.Headers.Add("LW_Date", szHttpDate);
    request.Headers.Add("Authorization", "LWA:" + m_szUserID + ":" + szSignature);
    System.Console.WriteLine(szSignature);

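The `GetHttpDate()` helper is not shown on the slides. A plausible Python equivalent, assuming `LW_Date` uses the standard RFC 1123 HTTP-date format in GMT (the exact format the server expects is not stated), could be:

```python
from email.utils import formatdate
from typing import Optional


def get_http_date(timestamp: Optional[float] = None) -> str:
    """RFC 1123 date in GMT for the LW_Date header (format assumed, not confirmed)."""
    # timestamp=None means "now"; usegmt=True yields the "GMT" suffix HTTP uses.
    return formatdate(timeval=timestamp, usegmt=True)


print(get_http_date(0))  # Thu, 01 Jan 1970 00:00:00 GMT
```
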
Page 14: Translation REST API – Header Generation
  – Generate Signature

    // Sign "METHOD\nDATE\nURI" with the API key using HMAC-SHA1 (requires System.Security.Cryptography)
    Encoding u8Encoding = new UTF8Encoding();
    HMACSHA1 hmacsha1 = new HMACSHA1(u8Encoding.GetBytes(m_szAPIKey));
    string szMessage = szRequestType.Trim() + "\n" + szHttpDate.Trim() + "\n" + szURI.Trim();
    string szSignature = Convert.ToBase64String(hmacsha1.ComputeHash(u8Encoding.GetBytes(szMessage.ToCharArray())));
    return szSignature;

Page 15: Translation REST API
• Sample request/response for Create Non-Blocking Translation Job for Text
  e.g. HTTP POST request to https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/lpid=74/

    <?xml version='1.0' encoding='UTF-8'?>
    <lwresponse>
      <service_version>v1</service_version>
      <requested_url>/v1/translation-async/eng.fra/lpid=74/</requested_url>
      <request_type>POST</request_type>
      <request_time>Wed Mar 3 14:55:51 2010</request_time>
      <source_language>eng</source_language>
      <target_language>fra</target_language>
      <response_data type='translation-async_post'>
        <retrieval_url>https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/90079.3bccc5e58d50ce7dcaf950f562ec2303/lpid=74</retrieval_url>
        <job_id>90079</job_id>
        <translation_signature>3bccc5e58d50ce7dcaf950f562ec2303</translation_signature>
        <src>eng</src>
        <tgt>fra</tgt>
        <lpid>74</lpid>
        <input_format>text/plain</input_format>
        <input_encoding></input_encoding>
        <dictionary></dictionary>
        <customizer></customizer>
        <source_text><![CDATA[Hello World]]></source_text>
        <server><version>5.1.2 release ENGFRAU20_5.1.x.0</version></server>
      </response_data>
    </lwresponse>
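
To poll the job, a client needs the `job_id` and `retrieval_url` from this response. A small Python sketch using the standard-library ElementTree parser; the function name is hypothetical, only elements shown in the sample above are read, and values are returned as strings.

```python
import xml.etree.ElementTree as ET


def parse_async_post_response(xml_text: str) -> dict:
    """Pull the fields needed for polling out of a translation-async POST response."""
    data = ET.fromstring(xml_text).find("response_data")
    if data is None:
        raise ValueError("response has no <response_data> element")
    return {
        "job_id": data.findtext("job_id"),
        "retrieval_url": (data.findtext("retrieval_url") or "").strip(),
        "translation_signature": data.findtext("translation_signature"),
        "lpid": data.findtext("lpid"),
    }


sample = ("<?xml version='1.0' encoding='UTF-8'?><lwresponse>"
          "<response_data type='translation-async_post'>"
          "<retrieval_url>https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/"
          "90079.3bccc5e58d50ce7dcaf950f562ec2303/lpid=74</retrieval_url>"
          "<job_id>90079</job_id>"
          "<translation_signature>3bccc5e58d50ce7dcaf950f562ec2303</translation_signature>"
          "<lpid>74</lpid></response_data></lwresponse>")
print(parse_async_post_response(sample)["job_id"])  # 90079
```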

Page 16: Trustscore™ and Reporting
• Internal LW milestone
  – Migration to version 2 of the REST API
• Reporting:
  – Words per minute
  – Number of documents translated
  – Average document length
  – Details about the TrustScore™
  – Other metrics to be defined
• Trustscore™:
  – Scored from 1 to 5
  – Document-level scoring
  – Segment-level scoring is not supported

Page 17: REST API Version 2
• New format
  – Sample of Create Non-Blocking Translation Job for Text:
• https://api.languageweaver.com/v2/language-pair/[lpid]/translation-async/[optional-params]/
• Mandatory and optional parameters are the same as in v1
• Additional calls/functionality related to:
  – Trustscore™
  – Reporting
  – Dictionary
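
The v2 sample moves the language pair ID into the resource path and drops the `[src].[tgt]` segment. A hypothetical Python helper that builds that URL, assuming optional parameters are appended as `name=value/` segments as in v1:

```python
from typing import Sequence

V2_HOST = "https://api.languageweaver.com"  # host used in the v2 sample


def v2_async_translation_url(lpid: int, optional_params: Sequence[str] = (),
                             host: str = V2_HOST) -> str:
    """Build the v2 Create Non-Blocking Translation URL for a language pair ID."""
    path = f"/v2/language-pair/{lpid}/translation-async/"
    # Assumption: optional parameters keep the v1 "name=value/" segment style.
    for param in optional_params:
        path += f"{param}/"
    return host + path


print(v2_async_translation_url(74, ["input_format=html"]))
# https://api.languageweaver.com/v2/language-pair/74/translation-async/input_format=html/
```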

Page 18: Customization through Dictionaries
• Structure
  – One entry per term, one translation per entry
  – Search & replace mechanism that applies unconditionally
• Size
  – Up to 300,000 entries
• Best practice for building one
  – Use CSV files
• Limitations
  – No limitations on the content
  – Recommended use of dictionaries is phrase replacement rather than word replacement
  – Gender variants are not generated automatically
  – Encoding: UTF-8
• Impact on performance
  – No significant impact
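
A minimal Python sketch of the dictionary behaviour described above: a CSV of `source,target` rows (the two-column layout is an assumption) is loaded with one translation per term, then applied as unconditional search and replace in a single pass. Longer entries are tried first so a phrase entry wins over a single-word entry it contains; where in the translation pipeline Language Weaver actually applies the replacement is not stated on the slide.

```python
import csv
import io
import re
from typing import Dict


def load_dictionary(csv_text: str) -> Dict[str, str]:
    """Read "source,target" rows (assumed layout) into a term -> translation map."""
    entries: Dict[str, str] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip():
            entries[row[0].strip()] = row[1].strip()  # one translation per entry
    return entries


def apply_dictionary(text: str, entries: Dict[str, str]) -> str:
    """Replace every dictionary term unconditionally, in a single left-to-right pass."""
    if not entries:
        return text
    # Longest terms first, so a phrase entry beats a single word it contains.
    terms = sorted(entries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    return pattern.sub(lambda match: entries[match.group(0)], text)


entries = load_dictionary("machine translation,traduction automatique\nmachine,machine-outil\n")
print(apply_dictionary("machine translation and machine", entries))
# traduction automatique and machine-outil
```

The sketch matches substrings without checking word boundaries, so a short entry would also fire inside longer words; multi-word phrase entries are less exposed to that.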

Page 19: Customization through Training
[Diagram labels: Parallel Aligned Text; Optional: Regression Text; Optional: Test Text; Evaluation; Data: fix noisy text, more text, text alignment, text segmentation; Product Delivery via TOD; LW Training Compute Cloud]

Page 20: Customization through Training
• Structure:
  – Train on any language pair specified in the FLAVIUS agreement
  – Inputs: TMX parallel segments, optional regression text files, optional test sets for evaluation
  – Outputs:
    • Trained engine
    • BLEU score results on the test set
    • Translated output of the regression text files
    • Metrics from the input training corpus
  – Evaluate the customized engine via TOD deployment
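
Before submitting TMX parallel segments for training, it can help to check locally that each translation unit actually has both languages. A hedged Python sketch using ElementTree; it reads `xml:lang` (TMX 1.4) or the older `lang` attribute and matches codes exactly (case-insensitively), so `en` will not match `en-US`. The function name `read_tmx_pairs` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text: str, src_lang: str, tgt_lang: str) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for every <tu> that has both languages."""
    src, tgt = src_lang.lower(), tgt_lang.lower()
    pairs: List[Tuple[str, str]] = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = "".join(seg.itertext()).strip()
        if segs.get(src) and segs.get(tgt):  # keep only units with text on both sides
            pairs.append((segs[src], segs[tgt]))
    return pairs


sample_tmx = (
    '<tmx version="1.4"><header srclang="en" datatype="plaintext" segtype="sentence" '
    'adminlang="en" creationtool="demo" creationtoolversion="1" o-tmf="demo"/><body>'
    '<tu><tuv xml:lang="en"><seg>Hello World</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Bonjour le monde</seg></tuv></tu>'
    '<tu><tuv xml:lang="en"><seg>Only English</seg></tuv></tu></body></tmx>'
)
print(read_tmx_pairs(sample_tmx, "en", "fr"))  # [('Hello World', 'Bonjour le monde')]
```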

Page 21: FLAVIUS Language Weaver Roadmap

Page 22: Questions & Answers

Page 23: Thank you!
Accelerating the way the world communicates