Text Analysis Using Automated Language Translators

Text Analysis Using Automated Language Translators CDT John Stanford MAJ Ian McCulloh


Post on 02-Jan-2016


Page 1: Text Analysis Using Automated Language Translators

Text Analysis Using Automated Language Translators

CDT John Stanford

MAJ Ian McCulloh

Page 2: Text Analysis Using Automated Language Translators

Agenda

• Overview and Hypothesis

• Literature Review

• Motivation (Radio Address Case Study)

• Arabic Translation Data

• Conclusions and Recommendations

Page 3: Text Analysis Using Automated Language Translators

Overview and Hypothesis

• Text analysis is a useful tool for gathering intelligence.

• A language barrier makes text analysis harder in non-English-speaking regions.

• Hiring human translators to translate texts into English is slow, expensive, and possibly a security issue.

• Hypothesis: Output from automated machine translators such as the Forward Area Language Converter (FALCon) is difficult for the average person to understand, but is just as useful for text analysis as human-translated text.

Page 4: Text Analysis Using Automated Language Translators

Literature Review

• This project relates to two ARL projects: FALCon and the ARL Dynamic Network Analysis Lab.

• Language can be modeled mathematically as a network of concepts using an adjacency matrix (Sowa, 1984).

• Preprocessing steps such as stemming, deletion, and thesaurus application prepare a text for analysis (Carley and Diesner, 2004).

• AutoMap, being developed by Carnegie Mellon University, inputs texts and outputs adjacency matrices.

• ORA, also being developed by CMU, inputs the adjacency matrices and outputs the mental models (Carley and Reminga, 2004).
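The preprocessing and adjacency-matrix steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not AutoMap's actual algorithm: the delete list, the thesaurus entries (which here also stand in for a real stemmer), and the window size are all hypothetical choices made for the example.

```python
import re

# Hypothetical delete list and thesaurus; a real study would supply its own.
DELETE_LIST = {"the", "a", "an", "of", "to", "and", "in", "is", "on"}
THESAURUS = {"terrorists": "terrorist", "attacks": "attack", "attacked": "attack"}


def preprocess(text):
    """Lowercase, tokenize, apply the thesaurus, and drop delete-list words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [THESAURUS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in DELETE_LIST]


def adjacency_matrix(concepts, window=2):
    """Link each concept to the next `window` concepts and count the links."""
    vocab = sorted(set(concepts))
    index = {c: i for i, c in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for i, left in enumerate(concepts):
        for right in concepts[i + 1 : i + 1 + window]:
            if left != right:  # ignore self-links
                matrix[index[left]][index[right]] += 1
                matrix[index[right]][index[left]] += 1
    return vocab, matrix


concepts = preprocess("The terrorists attacked the city and the attacks continued.")
vocab, matrix = adjacency_matrix(concepts)
for name, row in zip(vocab, matrix):
    print(f"{name:10s} {row}")
```

The matrix is symmetric because the sketch treats links as undirected; a directed variant would update only one cell per pair.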

Page 5: Text Analysis Using Automated Language Translators

Text Analysis Process

Page 6: Text Analysis Using Automated Language Translators

Radio Address Study

• 94 of the President’s weekly radio addresses analyzed

• From after Sep 11th to after the beginning of OIF (15 Sep 2001 to 21 June 2003)

• Concept of ‘violence’ plotted on timeline; high occurrence after Sep 11th and leading up to OIF

[Figure: timeline of the concept ‘violence’ in the radio addresses. Axis dates: 15 SEP 2001, 27 JUL 2002, 21 JUN 2003. Marked events: 12 SEP 2002, George Bush speaks to UN General Assembly; 20 MAR 2003, United States invades Iraq.]
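A minimal sketch of how a concept could be tracked across dated texts is given here. It does not reproduce the study's method: the word list for 'violence' is a hypothetical stand-in for whatever thesaurus was actually used, and the two sample texts are toy strings, not real radio addresses.

```python
import re
from datetime import date

# Hypothetical word list for the concept; the study's actual thesaurus is not given.
VIOLENCE_TERMS = {"war", "attack", "terror", "kill", "weapon"}


def concept_rate(text, terms):
    """Return occurrences of the concept per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in terms)
    return 1000.0 * hits / len(words)


def timeline(addresses, terms):
    """Pair each address date with its concept rate, sorted by date."""
    return sorted((day, concept_rate(text, terms)) for day, text in addresses)


# Toy input: (date, text) pairs standing in for the dated addresses.
sample = [
    (date(2003, 3, 22), "war war peace peace"),
    (date(2001, 9, 15), "attack on our nation"),
]
for day, rate in timeline(sample, VIOLENCE_TERMS):
    print(day.isoformat(), rate)
```

Normalizing by text length keeps long and short addresses comparable; raw counts would work equally well if all texts were about the same length.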

Page 7: Text Analysis Using Automated Language Translators

Arabic Text Analysis

• Arabic translated using CyberTrans, part of the FALCon package.

• 22 Arabic articles from the Department of State’s news site analyzed (US Dept of State, 2006).

Page 8: Text Analysis Using Automated Language Translators

Analysis Results

• Top concepts for the two methods of translation are the same in 16 of the 22 articles.

• The top concept in the human-translated text is among the top three machine-translated concepts for all 22 articles.

• When the methods differ, the human translation isn’t necessarily better.

[Figure: side-by-side comparison of results, labeled Human and Machine.]
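The two comparisons reported on this slide (same top concept, and human top concept within the machine top three) could be computed along the following lines. This is a sketch under assumptions: each article is represented as a non-empty list of already-preprocessed concepts, "top" is taken to mean most frequent, and the sample data is invented for illustration.

```python
from collections import Counter


def top_concepts(concepts, k=1):
    """Return the k most frequent concepts, ties broken alphabetically."""
    counts = Counter(concepts)
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:k]


def compare(pairs):
    """Count articles where the top concepts agree, and articles where the
    human top concept falls within the machine top three."""
    same_top = in_top_three = 0
    for human, machine in pairs:
        human_top = top_concepts(human, 1)[0]
        if human_top == top_concepts(machine, 1)[0]:
            same_top += 1
        if human_top in top_concepts(machine, 3):
            in_top_three += 1
    return same_top, in_top_three


# Invented (human, machine) concept lists for two articles.
pairs = [
    (["iraq", "iraq", "aid"], ["iraq", "aid", "iraq"]),
    (["trade", "trade", "oil"], ["oil", "oil", "trade", "market"]),
]
same_top, in_top_three = compare(pairs)
print(f"same top concept: {same_top} of {len(pairs)}")
print(f"human top in machine top three: {in_top_three} of {len(pairs)}")
```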

Page 9: Text Analysis Using Automated Language Translators

Conclusions and Recommendations

• Automated text analysis makes it fast and economical to look at trends in local publications of strategically significant regions over either time or space.

• Detailed statistical analysis must be done on this data.

• Intelligence agencies that have access to large volumes of REDFOR data should run this kind of text analysis to verify that it works as well on REDFOR data as BLUFOR data.

• FALCon development should continue and possibly be expanded to other languages such as Farsi.
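As one possible starting point for the statistical analysis called for above, the reported agreement of 16 out of 22 articles can be given a confidence interval. The choice of the Wilson score interval and the 95% level are assumptions made for this sketch, not something the presentation specifies.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(16, 22)
print(f"agreement rate 16/22 = {16 / 22:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

With only 22 articles the interval is wide, roughly 0.52 to 0.87, which suggests that a larger sample would be needed before drawing firm conclusions about how often the two translation methods agree.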

Page 10: Text Analysis Using Automated Language Translators

Works Cited

Bush, George. (2001-03). “President Bush’s Radio Addresses by date and topic.” Washington, DC: Office of the Press Secretary. Available from <http://www.whitehouse.gov/news/radio/index.html>.

Carley, Kathleen and Diesner, Jana. (2004). Revealing Social Structure from Texts: Meta-Matrix Text Analysis as a novel method for Network Text Analysis. Causal Mapping for Information Systems and Technology Research: Approaches, Advances, and Illustrations. Harrisburg, PA: Idea Group Publishing.

Sowa, J.F. (1984). Conceptual Structures: Information Processing in Mind and Machine. Reading, MA: Addison-Wesley.

US Dept of State. (2006). “News from Washington.” Washington, DC: Office of the Press Secretary. Available from <http://usinfo.state.gov/usinfo/products/washfile.html>.

Page 11: Text Analysis Using Automated Language Translators

Questions?

Dept of Mathematical Sciences, United States Military Academy

Dynamic Network Analysis Lab, Army Research Lab