1 Working with Natural Language Text: Tools and Techniques Nestor Rychtyckyj Advanced & Manufacturing Engineering Systems Ford Motor Company

Post on 21-Dec-2015

1

Working with Natural Language Text: Tools and Techniques

Nestor Rychtyckyj

Advanced & Manufacturing Engineering Systems

Ford Motor Company

2

Agenda

• Introduction
• Description of problem
  – Why is language so important?
• Dealing with Natural Language Text
• Application Examples
• Machine Translation
• Future Directions
• Conclusions

3

Natural Language Text is “everywhere”

• Internet
• Web sites
• Blogs
• Customer Feedback
• Dealer Feedback
• Lessons Learned
• Corporate Knowledge
• Warranty Claims
• Internal documentation
• Spoken Dialog systems

4

Dealing With Text Information

• Search Engines (Google, askjeeves.com)
• Excel
• Commercial Text Mining Tools (Wordstat, SAS Text Miner, SMART Text Miner, etc.)
• Open Source tools (Wordnet, Senseclusters, etc.)
• Controlled Languages
• Ontologies
• Natural Language Processing
• Semantic Web

5

Present Status

• Mostly key-word based
• Very little intelligence, no background knowledge or context
• Limited natural language dialog interpretation
• Most of the processing is left to the human user
• Difficult to build computer systems that can retrieve information in an “intelligent” manner

6

Future State

• Semantic Web – information on the web is organized using structured tagging based on XML, RDF, OWL, SWRL

• machine-processable data on the web
• standard interface to data
• rich knowledge representations through ontologies
• Allows for the development of systems that can retrieve information in an intelligent manner

7

Semantic Web Architecture

Source: Tim Berners-Lee, 2000

8

Artificial Intelligence (AI)

• The study of how to build human-level intelligence into computer applications

• Uses learning, representation of human knowledge, understanding of language, vision, speech, etc.

• Applies the built-in knowledge using inference and reasoning

• Has been very successful in limited problem domains – less so for general applications

• Integrated into many application areas including manufacturing, planning, search, speech recognition, financial analysis, games, customer analysis, commercial fishing, etc.

9

Current use of AI in Manufacturing at Ford

• AI applications for manufacturing
• Bring appropriate knowledge about manufacturing to the proper people at the right time
• Improve manufacturing efficiency
• Reduce workplace injuries through better up-front ergonomics analysis
• Make assembly build instructions available to operators in other languages
• Develop common framework for representing knowledge and exchanging it between different systems

10

Knowledge Sources in Manufacturing

• Process Build Information
• Required Tooling
• Part Information
• Ergonomics Analysis
• Plant Layout Information
• Assembly Visualization
• Safety Concerns
• Manufacturing “Best Practices”

11

Global Study Process Allocation System (GSPAS)

• The Allocation system used to assign manufacturing processes to plant operation resources.

• Process sheets use STANDARD LANGUAGE (159) verbs

• Like - insert, select, grasp, load …

12

Global Study Process Allocation System (GSPAS)

• Global System to handle Manufacturing Costing, Process and Labor Management for vehicle assembly.

• Standard Language and AI are integral parts of GSPAS.

• Launched in North America and Europe in 1998 to support the Focus program.

• Currently deployed for almost all car and truck manufacturing at Vehicle Operations assembly plants world-wide.

13

Step by Step Instructions

Process sheets specify the operations, tasks, parts and tools required to support the production of a vehicle.

14

Standard Language

• Controlled language where the grammar and syntax are restricted.

• Developed at Ford Body & Assembly to describe the vehicle assembly process.

• Contains information about tools, parts and work required to build a vehicle.

• Contains over 5000 words and 1000 abbreviations that can be used by the process engineers.

• Standard Language is checked by an Artificial Intelligence (AI) system.

15

Examples of Standard Language

1. ALIGN-AND-SEAT DOOR TRIM PROTECTOR
2. FIRMLY PRESS SEALER INTO JOINT TO AFFECT A POSITIVE SEAL
3. APPLY DAUB OF SEALER TO THE JOINT OF THE CENTER FLOOR PAN AND FRONT FLOOR PAN AT ROCKER PANEL
4. PUSH SEAT REARWARD TO EXPOSE FRONT ATTACHMENTS

16

Standard Language Rules

• Imperative form
• Sentence must start with verb clause followed by noun phrase.
• Only one Standard Language (main action) verb per sentence.
• Some prepositions have special meaning (“using”, “with”).
• Size modifiers may follow nouns (“bumper large”).
• Free form allowed for certain verbs (“verify that...”)
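
A few of these rules can be checked mechanically. The following is a minimal sketch, not the checker used in GSPAS: the verb list is a small hypothetical sample drawn from examples elsewhere in this deck, and `check_sentence` is an illustrative name.

```python
# Hypothetical subset of Standard Language verbs, taken from examples in the
# slides. The real verb list is larger and is not reproduced here.
STANDARD_VERBS = {
    "INSERT", "SELECT", "GRASP", "LOAD", "OBTAIN", "LOOSEN",
    "APPLY", "ALIGN", "SECURE", "PUSH", "FEED",
}


def check_sentence(sentence):
    """Return a list of rule violations for one Standard Language sentence."""
    words = sentence.upper().split()
    if not words:
        return ["empty sentence"]
    problems = []
    # Rule: imperative form, so the sentence must start with a verb.
    if words[0] not in STANDARD_VERBS:
        problems.append("does not start with a Standard Language verb: " + words[0])
    # Rule: only one main action verb per sentence.
    verbs = [w for w in words if w in STANDARD_VERBS]
    if len(verbs) > 1:
        problems.append("more than one main action verb: " + ", ".join(verbs))
    # Rule: a noun phrase must follow the verb.
    if len(words) < 2:
        problems.append("no noun phrase after the verb")
    return problems


print(check_sentence("SECURE BUMPER BRACKET TO VEHICLE BODY USING POWER TOOL"))
print(check_sentence("OBTAIN BRACKET AND INSERT BRACKET"))
```

A word-list check like this cannot tell a verb from a noun with the same spelling, and it does not handle compound verbs or leading adverbs; a real checker would rely on the parser and ontology.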

17

Process Sheet Written in Standard Language from CAP (Focus) deck
TITLE: ASSEMBLE IMMERSION HEATER TO ENGINE
10 OBTAIN ENGINE BLOCK HEATER ASSEMBLY FROM STOCK
20 LOOSEN HEATER ASSEMBLY TURNSCREW USING POWER TOOL
30 APPLY GREASE TO RUBBER O-RING AND CORE OPENING
40 INSERT HEATER ASSEMBLY INTO RIGHT REAR CORE PLUG HOSE
50 ALIGN SCREW HEAD TO TOP OF HEATER
TOOL 20 1 P AAPTCA TSEQ RT ANGLE NUTRUNNER
TOOL 30 1 C COMM TSEQ GREASE BRUSH

Resulting Work Instructions Generated by DLMS For Line 20
LOOSEN HEATER ASSEMBLY TURNSCREW USING POWER TOOL
005 GRASP POWER TOOL (RT ANGLE NUTRUNNER) <01M4G1>
010 POSITION POWER TOOL (RT ANGLE NUTRUNNER) <01M4P2>
015 ACTIVATE POWER TOOL (RT ANGLE NUTRUNNER) <01M1P0>
020 REMOVE POWER TOOL (RT ANGLE NUTRUNNER) <01M4P0>
025 RELEASE POWER TOOL (RT ANGLE NUTRUNNER) <01M4P0>

Standard Language Process Sheet
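
The expansion of line 20 into work instructions can be pictured as filling in a template. This is only a sketch under assumptions: the talk does not describe how DLMS generates these steps, the function name is hypothetical, and the bracketed codes from the slide are left out because their meaning is not explained.

```python
# Hypothetical template: the five tool-handling steps shown on the slide for a
# line ending in "USING POWER TOOL". The real DLMS rules are not described.
POWER_TOOL_STEPS = ["GRASP", "POSITION", "ACTIVATE", "REMOVE", "RELEASE"]


def expand_power_tool_line(line, tool_name):
    """Expand one process-sheet line into numbered work instructions."""
    if not line.upper().strip().endswith("USING POWER TOOL"):
        return []  # this sketch only handles the power-tool pattern
    steps = []
    for i, verb in enumerate(POWER_TOOL_STEPS, start=1):
        seq = i * 5  # the slide numbers the steps 005, 010, 015, ...
        steps.append(f"{seq:03d} {verb} POWER TOOL ({tool_name})")
    return steps


for step in expand_power_tool_line(
    "LOOSEN HEATER ASSEMBLY TURNSCREW USING POWER TOOL", "RT ANGLE NUTRUNNER"
):
    print(step)
```

The tool name comes from the TOOL record attached to the same line number on the process sheet.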

18

Natural Language Parsing

Secure bracket using multiple motor nutrunner

[Figure: parse tree of the sentence. Node labels: Verb Phrase, Verb (Secure), Noun Phrase, Noun (Bracket), Prepositional Phrase, Preposition (Using), Noun Phrase, Noun]

19

Process for Natural Language Processing

• Parse the text (sentence by sentence) into parse tree structure

• Bypass/ignore common words (articles, common terms)

• Stemming (get the root of the word)
• Word lookup (synonyms, misspellings, acronyms)
• Word understanding (deeper-level ontologies)
• Controlled languages with automated checking
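
The middle steps (ignoring common words, stemming, word lookup) can be sketched in a few lines. Everything here is an assumption for illustration: the word lists are tiny hypothetical samples, and the suffix stripping is a deliberately naive stand-in for a real stemmer.

```python
import re

# All word lists below are small hypothetical samples for illustration.
STOP_WORDS = {"THE", "A", "AN", "OF", "TO"}
LOOKUP = {"ASY": "ASSEMBLY", "ASSY": "ASSEMBLY"}  # abbreviations, misspellings
SUFFIX_RULES = [("IES", "Y"), ("ING", ""), ("S", "")]  # naive, not a real stemmer


def stem(word):
    """Strip one common suffix, keeping at least three characters of the root."""
    if word.endswith("SS"):
        return word
    for suffix, replacement in SUFFIX_RULES:
        root_len = len(word) - len(suffix)
        if word.endswith(suffix) and root_len >= 3:
            return word[:root_len] + replacement
    return word


def normalize(sentence):
    """Tokenize, drop stop words, apply word lookup, then stem."""
    tokens = re.findall(r"[A-Z0-9]+(?:-[A-Z0-9]+)*", sentence.upper())
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        token = LOOKUP.get(token, token)
        result.append(stem(token))
    return result


print(normalize("Secure the ASY using brackets"))
```

Parsing and ontology-based word understanding are not covered by this sketch; they need a grammar and a knowledge base rather than word lists.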

20

Parsing Information in Standard Language

• Example of Standard Language parsing: “Feed 2 150 mm wire assemblies through hole in liftgate panel”

• (S (VP (VERB FEED)) (NP (SIMPLE-NP (QUANTIFIER 2) (DIM (QUANTIFIER 150) (DIM-UNIT-1 MM)) (ADJECTIVE WIRE) (NOUN ASSEMBLY))) (S-PP (S-PREP THROUGH) (NP (SIMPLE-NP (NOUN HOLE) (N-PP (N-PREP in) (NP (SIMPLE-NP (ADJECTIVE LIFTGATE) (ADJECTIVE OUTER) (NOUN PANEL))))))))
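
A bracketed parse like this is easy to load into a nested structure and query. The sketch below copies the parse from the slide; the function names `read_tree` and `leaves` are hypothetical and are not part of the system described in the talk.

```python
import re

# The parse tree from the slide, laid out over several lines.
PARSE = """
(S (VP (VERB FEED))
   (NP (SIMPLE-NP (QUANTIFIER 2)
                  (DIM (QUANTIFIER 150) (DIM-UNIT-1 MM))
                  (ADJECTIVE WIRE)
                  (NOUN ASSEMBLY)))
   (S-PP (S-PREP THROUGH)
         (NP (SIMPLE-NP (NOUN HOLE)
                        (N-PP (N-PREP in)
                              (NP (SIMPLE-NP (ADJECTIVE LIFTGATE)
                                             (ADJECTIVE OUTER)
                                             (NOUN PANEL))))))))
"""


def read_tree(text):
    """Parse a parenthesized tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a label or a word
        node = []
        while pos < len(tokens) and tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing parenthesis
        return node

    return read()


def leaves(tree, label):
    """Collect the words stored directly under every node with the given label."""
    found = []
    if isinstance(tree, list) and tree:
        if tree[0] == label:
            found.extend(child for child in tree[1:] if isinstance(child, str))
        for child in tree[1:]:
            found.extend(leaves(child, label))
    return found


tree = read_tree(PARSE)
print(leaves(tree, "VERB"), leaves(tree, "NOUN"))
```

Once the tree is in this form, later steps such as ergonomics checks or translation pre-processing can ask for the main verb or the nouns without re-reading the text.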

21

Ontology – used to represent knowledge

• Individuals

• Classes (with hierarchy); think sets

• Properties (w/ hierarchy); not part of class

• Equivalence

• Property characteristics/restrictions

• Complex classes

22

GSPAS Ontology

Thing

Tools Parts Lexical Nodes Operations

HAMMER

Attributes: Size, Part of Speech, Subsystem-id, etc….

Intervening Concept Nodes
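
One way to picture such a hierarchy is as a child-to-parent map with attributes inherited down the tree. This is a toy sketch, not the GSPAS knowledge representation: HAND-TOOL is an invented stand-in for an intervening concept node, and all attribute values are hypothetical.

```python
# Hypothetical fragment of the class hierarchy: child -> parent.
# THING, TOOLS, PARTS, LEXICAL-NODES, OPERATIONS and HAMMER follow the slide;
# HAND-TOOL stands in for an intervening concept node and is an assumption.
PARENT = {
    "TOOLS": "THING",
    "PARTS": "THING",
    "LEXICAL-NODES": "THING",
    "OPERATIONS": "THING",
    "HAND-TOOL": "TOOLS",
    "HAMMER": "HAND-TOOL",
}

# Attribute values are invented; attributes are inherited by classes below.
ATTRIBUTES = {
    "TOOLS": {"part_of_speech": "noun"},
    "HAMMER": {"size": "medium"},
}


def ancestors(concept):
    """Walk up the hierarchy from a concept to the root."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    """True if the concept is the category or sits anywhere below it."""
    return concept == category or category in ancestors(concept)


def attributes(concept):
    """Merge attributes from the root down so specific values override general ones."""
    merged = {}
    for node in reversed([concept] + ancestors(concept)):
        merged.update(ATTRIBUTES.get(node, {}))
    return merged


print(is_a("HAMMER", "TOOLS"), attributes("HAMMER"))
```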

23

GSPAS Knowledge Base

24

Ergonomics Analysis

• Check the assembly work instructions to determine what type of physical action is being described

• Check the assembly work instruction to determine what object is manipulated

• Check the associated parts and tools for part weight and tool properties

• Flag potential ergonomics concerns at the process level and at the work allocation level

• Knowledge can be represented as a business rule
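
A business rule of this kind might look like the following sketch. The verb set and the weight limit are hypothetical values chosen for illustration; the talk does not give the actual ergonomics criteria.

```python
# Hypothetical values for illustration; real ergonomics limits are not given.
LIFTING_VERBS = {"OBTAIN", "LOAD", "POSITION"}
MAX_PART_WEIGHT_KG = 10.0


def ergonomics_flags(instruction, part_weights_kg):
    """Flag an instruction whose verb implies handling a part over the limit."""
    text = instruction.upper()
    words = text.split()
    # Step 1: what type of physical action is described?
    if not words or words[0] not in LIFTING_VERBS:
        return []
    flags = []
    # Steps 2 and 3: which object is manipulated, and how heavy is it?
    for part, weight in part_weights_kg.items():
        if part.upper() in text and weight > MAX_PART_WEIGHT_KG:
            flags.append(
                f"{words[0]} {part.upper()}: {weight} kg exceeds {MAX_PART_WEIGHT_KG} kg"
            )
    return flags


print(ergonomics_flags("LOAD BUMPER TO VEHICLE", {"BUMPER": 14.5}))
```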

25

Machine Translation

• “The Spirit is willing but the flesh is weak”
• "The vodka is tempting, but the meat's a bit suspect".
• “The alcohol is arranged, but the meat is weak.”
• “This kind of spirit is wants, but the flesh and blood is weak.”
• “The spirit is willing, but the flesh is impossible”
• “The spirit puts out the flag and does, the flesh omits but.”

26

Machine Translation

• Use of computers to translate from one language to another

• Examples: Babelfish
• Translation accuracy is highly dependent on the quality of the source text
• Use proper grammar, punctuation, shorter sentences, active voice to improve quality
• Customize translation systems for each application domain

27

Problem Description

• Need to translate assembly build instructions from English to the language used at the assembly plants

• A single vehicle may require several thousand process sheets to describe the assembly process

• A large number of assembly instructions are frequently modified

• Large volume of translations precludes the use of human translators

• Specialized terminology requires technical glossaries
• MT performance can be improved greatly by improving the source text

28

Application Description

• Machine Translation is integrated into the manufacturing process planning system known as GSPAS (Global Study Process Allocation System)

• The translation process is fully automated and does not require human intervention

• Translation occurs automatically after a process sheet is validated by the AI system and before it is released to the assembly plants.

• We currently translate build instructions for 26 different vehicle lines in 5 languages (we also have a separate glossary for Mexican Spanish)

• Data is read in from an Oracle database, processed through the translation system and the output is then written out to the Oracle database

29

Machine Translation

• Source: Process build instructions in English
• Target: Process build instructions in Spanish, German, Portuguese, Dutch & Turkish
• Translate both controlled language and embedded free-form text
• Example: SECURE BUMPER BRACKET {FOR LHS ONLY} TO VEHICLE BODY USING POWER TOOL
• Utilize customized SYSTRAN translation engine, automotive and Ford-specific terminology glossaries and embedded tagging

• Future plans include additional parsing and tagging information to improve translation accuracy
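
Since controlled language and free-form text need different handling, a first step is to separate them. A minimal sketch, assuming that free-form text is always wrapped in braces as in the example; `split_segments` is a hypothetical helper, not part of the SYSTRAN integration.

```python
import re


def split_segments(text):
    """Split an instruction into (kind, text) pairs.

    Text inside braces is treated as free-form; everything else is treated
    as controlled language.
    """
    segments = []
    # A capturing group makes re.split keep the braced parts in the result.
    for part in re.split(r"(\{[^{}]*\})", text):
        part = part.strip()
        if not part:
            continue
        if part.startswith("{") and part.endswith("}"):
            segments.append(("free-form", part[1:-1].strip()))
        else:
            segments.append(("controlled", part))
    return segments


for kind, segment in split_segments(
    "SECURE BUMPER BRACKET {FOR LHS ONLY} TO VEHICLE BODY USING POWER TOOL"
):
    print(kind, "|", segment)
```

Each segment can then be tagged or routed separately before it reaches the translation engine.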

30

Machine Translation Implementation in GSPAS

• Worked with Systran & Apptek to customize their translation software for our requirements.

• Developed technical dictionaries that contain Ford terminology with the correct translation for each language pair.

• Developed and integrated the translation process into GSPAS.

• Developed a system to check and improve the source text prior to translation

31

Translation Statistics

• Language pairs being translated: English/German, English/Spanish, English/Dutch, English/Portuguese, English/Spanish (Mexican), English/Turkish

• Ford specific terminology in Standard Language: over 5000 words, 13,000 noun phrases, over 1000 abbreviations and acronyms.

• Typically translate over 200,000 records each month

• Over 10,000,000 records already translated.

32

GSPAS Translation Process

33

Standard Language Translation Issues

• Sentence structure is not grammatical English (ROBOT APPLY 50 MM TAPE-STRIPE)

• Ford terminology is complex and must be explicitly translated as an entire phrase (INSULATION ASSEMBLY BODY PILLAR)

• Use of abbreviations, misspellings, acronyms (ABS, A.B.S)

• Use of compound verbs (PICK-AND-SPOON)
• Inverted phrase structure with modifiers (BODY PANEL LRG)
• Embedded comments (LOAD BUMPER {LOWER} TO VEHICLE)

34

Standard Language Translation

• Use of slang (“shotgun”)
• Articles are seldom used (HAMMER HAMMER).
• Need to handle “British” English as well as “American” English. (terminology, use, spellings)
• Source text is incorrectly written and not understandable.
• Punctuation is rarely used.
• Standard Language is always evolving and needs to be maintained.

35

Uses of AI Technology

• Apply natural language processing (NLP) along with knowledge representation and reasoning to improve the source text

• Analyze the source text; utilize the ontology to identify terminology

• Convert the source text to a more “translatable” form by adding articles, replacing abbreviations, improving grammar and punctuation

• Utilize XML tagging and ontology lookup to improve the structure of free-form source text

36

Improving Translation Quality

• Process the source text prior to translation (Standard Language pre-processor).

• Add articles before the nouns.
• Adjust the word order to deal with size modifiers coming after nouns.
• Replace acronyms, synonyms with original expanded text (ASY -> ASSEMBLY)
• Verify that punctuation is correct.
• Pre-process the embedded comments to improve translation quality.
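
Three of these steps (expanding abbreviations, moving size modifiers, adding articles) can be sketched as a small rewrite function. This is an illustration under assumptions, not the Standard Language pre-processor itself: only ASY -> ASSEMBLY comes from the slide, and the other table entries and the chunking approach are hypothetical.

```python
# Hypothetical lookup tables. ASY -> ASSEMBLY comes from the slide; the other
# entries are assumptions for illustration.
ABBREVIATIONS = {"ASY": "ASSEMBLY", "LRG": "LARGE"}
SIZE_MODIFIERS = {"LARGE", "SMALL"}
PREPOSITIONS = {"TO", "USING", "INTO", "FROM", "WITH"}


def preprocess(sentence):
    """Rewrite a Standard Language sentence into a more translatable form."""
    # Replace abbreviations with their expanded text.
    words = [ABBREVIATIONS.get(w, w) for w in sentence.upper().split()]

    # Split at prepositions so each noun phrase is handled on its own.
    chunks, current = [], []
    for word in words:
        if word in PREPOSITIONS and current:
            chunks.append(current)
            current = []
        current.append(word)
    if current:
        chunks.append(current)

    output = []
    for chunk in chunks:
        head, phrase = chunk[0], chunk[1:]  # head is the verb or preposition
        # Move a trailing size modifier in front of the noun phrase.
        if len(phrase) > 1 and phrase[-1] in SIZE_MODIFIERS:
            phrase = [phrase[-1]] + phrase[:-1]
        # Add an article before the noun phrase.
        if phrase:
            phrase = ["THE"] + phrase
        output.extend([head] + phrase)
    return " ".join(output)


print(preprocess("LOAD BUMPER LRG TO VEHICLE"))
```

The sketch assumes the first word of each chunk is the verb or a preposition; sentences with embedded comments or compound verbs would need the parser first.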

37

Issues with Machine Translation Quality

• Localization issues (even with technical terminology) – Spanish in Spain, Mexico, Argentina, etc.

• Ensure that the system correctly displays special characters (umlauts, accents, etc.)

• Have additional space available on screen as target languages require more room than English.

38

Conclusions

• Machine Translation is a cost-effective way to translate information with high quality if you are willing to customize the application to your requirements

• Machine Translation is not an “out of the box” solution

• Machine Translation accuracy can be greatly improved by controlling and improving the quality of the source text

39

Where are we going?

• Intelligent search w/ context and understanding
• Sharing of knowledge through ontologies
• Growth of user-defined knowledge (folksonomies)
• Intelligent Dialog Systems – integration of speech recognition w/ intelligent engines (“Sync”)

• Automate the process of information retrieval

40

Questions

?????