In Progress Review LAMP Lab Chin-MT project University of Maryland February 18, 1999


Page 1:

In Progress Review

LAMP Lab Chin-MT project

University of Maryland

February 18, 1999

Page 2:

a. Demonstration and overview:

9:30-9:45 Introduction to project, B. Onyshkevych

9:45-9:55 Rationale and Overview of Progress in Development of System Components, A. Weinberg

9:55-10:00 Overview of Demonstration, P. Resnik and W. Shen

10:00-10:30 Demonstration and Questions, P. Resnik and W. Shen

I: NSA - FEBRUARY IPR FOR LAMP LAB NATURAL LANGUAGE - MT EFFORT

Page 3:

I: NSA - FEBRUARY IPR FOR LAMP LAB NATURAL LANGUAGE - MT EFFORT

Cont’d

b. Technical Presentation/Future Directions

10:45- Laboratory Management Issues

20 Min. Parsing - Constructions Covered to Date, New Directions, A. Weinberg, P. Resnik

20 Min. Lexicon - Scalability of Current Components, Creation of Grids, Automatic Acquisition, Mining, B. Dorr

20 Min. Generation - Discussion of Current Algorithm, Future Directions, D. Traum

Page 4:

Faculty:
Dr. Bonnie Dorr, CS, UMIACS
Dr. Philip Resnik, Linguistics, UMIACS
Dr. Amy Weinberg, Linguistics, UMIACS

Postdoctoral Researchers:
Dr. Gina Levow, UMIACS*
Dr. Mari Olsen, UMIACS
Dr. David Traum, UMIACS

Graduate Students:
Joseph Garman, Linguistics
Scott Thomas, CS
Nazer Habash, CS*
Jin Tong, CS
Wade Shen, CS

THE LAMP LABORATORY - MT PROJECT

Page 5:

NSA Visiting Scholars:
Ron Dolan, Library of Congress
John Kovarik, DoD
MaryEllen Okurowski, DoD

Visiting Scholars:
Dekang Lin, 01/99-08/99

THE LAMP LABORATORY - MT PROJECT

Cont’d

Page 6:

Automatically created, high-quality, broad-coverage machine translation.

Example of Word to Word: <Ask David/Phil to provide>

1. Example where generation output:
- perfect
- slightly degraded
- generation degraded by CLCS
- gloss ok

OUR GOAL

Page 7:

Example of CLCS output:

Example of generated string:

OUR GOAL

Page 8:

Work on Chin - MT began -- Oct. 1997

1st Phase:

Development of a small-scale end-to-end system on a representative (159-sentence) corpus of Chinese newspaper (Tsin hua) articles.

WORK ON CHIN - MT

Page 9:

Development of Broad Scale Static Resources:

Lexicon: Optilex 250 entries augmented with appropriate argument structure (thematic role) grids and lexical conceptual structures.

<Bonnie: current coverage of English lexicons - Chinese lexicons>

WORK ON CHIN - MT

Cont’d

Page 10:

Parser: small scale; 217 grammar rules

Multipath REAP

Generation: Add <David Traum>

WORK ON CHIN - MT

Cont’d

Page 11:

Integration with Currently Existing or Simultaneously Built Resources from Other Institutions

- NMSU/Mikrokosmos interface
- ISI/Nitrogen

WORK ON CHIN - MT

Cont’d

Page 12:

SYSTEM COMPONENTS AND COVERAGE

Output: English translated string

Shared ONI & ISI (Nitrogen)

Output: Composed LCS (CLCS) transformed to AMR (<David - Abstract meaning representation>)

Output: argument structure augmented syntactic string

Output: parsed corpus with appropriate argument structure features for lexical conceptual structure (LCS) composition

Output: segmented string with complex names identified as single segments

Input: unsegmented Chinese string

Syntactic recoding and realization: translate LCS-based features to Nitrogen features
Feb: algorithm implemented

English lexical selection
Feb: algorithm implemented
<David - coverage>

Lexical conceptual structure (LCS) composition
June: inefficiently composed LCS for -------- sentences
Feb: ------- handled by LCS composition

Parser
June: 404 fragments - 352 legal parses, 269 correct parses
Feb: 100 out of 150 full sentences with correct parse

Segmenter/name tagger
June: hand segmentation, hand tagging of 150 sentences

F(unctional) structure transducer - input to NMSU semantics
(90 f-structures to NMSU for evaluation - Dec. 1998)

NMSU semantics/ontologies
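
The parser counts on this slide can be turned into rates with simple arithmetic. The sketch below uses only the numbers reported above; the variable and function names are illustrative and not part of the project. Note that the June figures are over fragments and the February figure is over full sentences, so the two are not directly comparable.

```python
# Counts copied from the slide; all names below are illustrative assumptions.
JUNE_FRAGMENTS = 404   # fragments parsed in June
JUNE_LEGAL = 352       # fragments that received a legal parse
JUNE_CORRECT = 269     # fragments that received a correct parse
FEB_SENTENCES = 150    # full sentences parsed in February
FEB_CORRECT = 100      # full sentences with a correct parse


def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


june_legal_rate = rate(JUNE_LEGAL, JUNE_FRAGMENTS)       # legal parses per fragment
june_correct_rate = rate(JUNE_CORRECT, JUNE_FRAGMENTS)   # correct parses per fragment
june_precision = rate(JUNE_CORRECT, JUNE_LEGAL)          # correct among legal parses
feb_correct_rate = rate(FEB_CORRECT, FEB_SENTENCES)      # correct parses per sentence

print(f"June: {june_legal_rate}% legal, {june_correct_rate}% correct "
      f"({june_precision}% of legal parses correct)")
print(f"Feb:  {feb_correct_rate}% of full sentences correctly parsed")
```

On these counts, roughly 87% of June fragments had a legal parse and about two thirds had a correct one; about two thirds of the February full sentences were parsed correctly.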

Page 13:

Intermediate Milestones/Next Steps:

• Full end-to-end integration with NMSU:
a. f-structure to TMR integration.
b. f-structure to AMR-based generation

• Evaluation of LCS as fail-soft mechanism. Comparison of translations produced by LCS/Nitrogen.

• Improvement of coverage/move towards broad-scale coverage of all components:

• Parsing:

- Design/experimentation with extension to Minipar (in cooperation with Dekang Lin, Univ. of Manitoba)

• Lexicon:

- Broad coverage for adjectives and nouns, the latter of which will be automatically subdivided into simple and event-based nominals. Corresponding English refinements. Finish broad coverage for prepositions.

- Finish English verb grid refinement and Chinese grid generation and checking. Speed up by dividing remaining verbs into Levin classes.

- Port verb grids and refine composition algorithm for event-based nominals. Nominal entries will include features from WordNet and will be assigned atomic LCSs. Event-based noun entries in Optilex will be automatically associated with LCSs from their verbal counterparts (abduction derived from abduct).

- Broad coverage and representation refinement for functional elements (numbers, numerals, classifiers). These are assigned LCSs by hand in the current iteration.

- Port verb-based LCS entries into the noun lexicon for English and Chinese.

• Discourse

- Sept 1999:
  - Additional testing and improvement of LCS path. More debugging and testing as the CLCSs become available.
  - Addition of NMSU path, then converting NMSU f-structures to English. The plan for that is to convert either to Nitrogen lattices, or perhaps AMRs, depending on what these f-structures actually look like.
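
The lexicon item above on event-based nominals (abduction derived from abduct) can be pictured as a lookup from a noun to its verbal counterpart. The following is only a toy sketch: the dictionaries, the `Entry` class, and the placeholder LCS strings are all assumptions made for illustration and do not reproduce the project's actual lexicon format or LCS notation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Entry:
    """A toy lexicon entry; `lcs` is a placeholder string, not real LCS notation."""
    word: str
    pos: str
    lcs: str


# Hypothetical verb lexicon with one entry.
VERB_LEXICON: Dict[str, Entry] = {
    "abduct": Entry("abduct", "V", "<LCS of abduct>"),
}

# Hypothetical table linking event-based nouns to their verbal counterparts.
NOMINALIZATIONS: Dict[str, str] = {
    "abduction": "abduct",
}


def noun_entry(noun: str) -> Entry:
    """Build a noun entry: inherit the verb's LCS when a counterpart exists,
    otherwise fall back to an atomic placeholder LCS."""
    verb = NOMINALIZATIONS.get(noun)
    if verb is not None and verb in VERB_LEXICON:
        return Entry(noun, "N", VERB_LEXICON[verb].lcs)
    return Entry(noun, "N", f"ATOMIC:{noun}")


print(noun_entry("abduction"))  # event-based: shares the LCS of "abduct"
print(noun_entry("table"))      # simple nominal: atomic placeholder
```

Keeping the noun-to-verb link in a separate table means a refinement to a verb's LCS carries over to its nominalization without editing the noun lexicon by hand.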

Page 14:

Laboratory management problems:

1. Version control: too many copies of software; code runs on one copy but not the other.

Need to roll back to a previous version of some piece of software, but it's not around unless someone has saved it.

Solution: Installation of Concurrent Versions System (CVS) check-in/check-out software. Static resources and running programs are checked in. They become the “official version”. Automatic consistency checking at “check-in time”. If there are differences from the previous version, permission from the previous check-in is needed to check in the new version or merge.
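
The check-in rule described above (a new version is accepted only if it is based on the current official version, otherwise a merge is needed) can be modelled in a few lines. This is a toy model for illustration only: it is not how CVS is implemented, and the class names and the file name `lexicon.txt` are hypothetical.

```python
from typing import Dict, List, Tuple


class StaleCheckinError(Exception):
    """Raised when a check-in is based on an outdated revision."""


class ToyRepository:
    """Keeps every checked-in revision so that older versions can be recovered."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def checkout(self, path: str) -> Tuple[int, str]:
        """Return (revision number, content) of the current official version."""
        revisions = self._history.get(path, [])
        if not revisions:
            return 0, ""
        return len(revisions), revisions[-1]

    def checkin(self, path: str, base_revision: int, content: str) -> int:
        """Accept the new content only if it was edited from the latest revision."""
        revisions = self._history.setdefault(path, [])
        if base_revision != len(revisions):
            raise StaleCheckinError(
                f"{path}: based on revision {base_revision}, "
                f"but current is {len(revisions)}; merge first"
            )
        revisions.append(content)
        return len(revisions)

    def rollback(self, path: str, revision: int) -> str:
        """Return the content of an earlier revision (1-based)."""
        return self._history[path][revision - 1]


repo = ToyRepository()
rev1 = repo.checkin("lexicon.txt", 0, "v1 entries")

# Two team members check out the same official version.
base_a, _ = repo.checkout("lexicon.txt")
base_b, _ = repo.checkout("lexicon.txt")

rev2 = repo.checkin("lexicon.txt", base_a, "v2 entries from A")

try:
    repo.checkin("lexicon.txt", base_b, "v2 entries from B")
    b_rejected = False
except StaleCheckinError:
    b_rejected = True  # B must merge with A's change before checking in

print(rev1, rev2, b_rejected, repo.rollback("lexicon.txt", 1))
```

Because every accepted revision stays in the history, rolling back no longer depends on someone having saved a private copy.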

Page 15:

State of implementation:
- Chinese/English lexicons under CVS
- next, LCS programs: convert to shorthand/longhand
- then, parser, f-structure, generation programs

Complete by June

Page 16:

Problem 2: Operating and file system problems:

Program works on machine A, not machine B.

All machines switched to Solaris 2.6, and AFS (a new networked file system manager) installed.

AFS provides better management for large programs: shared file speedup, local caching, local control of protections and permissions.

Improved environment will allow us to discourage work from home.

Lower bandwidth for improved communication between members of the team.