TRANSCRIPT
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL
Human Language Technology
Center for Content Extraction
Content Extraction Analytics SIGDEV End-to-End Demo
21 May 2009
Derived From: NSA/CSSM 1-52 Dated: 20070108
Declassify On: 20320108
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320108
Introduction to Content Extraction
• New technologies can find Essential Elements of Information in documents
• The Center for Content Extraction provides "one-stop shopping" for these technologies at NSA
Extraction can benefit SIGDEV from end to end
• Selection
• Translation & Transliteration
• Analysis
• Interpretation/Enrichment
• Retrieval
• Storage & Distribution
STAIRS Partners
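• S (Selection): Marina, CEA
• T (Translation & Transliteration): Cybertrans
• A (Analysis): SNA/Paintball, Synapse
• I (Interpretation/Enrichment): Nymrod, Thundercloud
• R (Retrieval): Journeyman/CPE
• S (Storage & Distribution): GoldenRetriever, SocioPath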
Implementation: CCE Extraction Architecture (LexHound)
Subscription-based customers (extracted report/transcript content):
• Marina (comms tracking)
• Synapse/EKS (link analysis)
• Nymrod (name matching)
Web service on-demand customers (web services):
• LexHound Web Demo
• CAMT (translation)
• TKB (target knowledge base)
• SNA (social network analysis)
• GIS (geo mapping)
• NTOC (terror cell tracking)
• Heresyitch (UC collateral)
• GoldenRetriever (record building)
[Architecture diagram. Inputs: Reports, Transcripts. Components: Ingester, Dispatcher, Task Manager, Extractor(s), Transformer, Renderer, Sender. Result: Output.]
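The diagram names the pipeline components but not exactly how they hand work to one another. As a minimal sketch, assuming a strictly sequential flow (Ingester → Dispatcher → Extractor(s) → Transformer → Renderer → Sender) and two toy regex extractors, the stages could be wired as below. The `Document` record, every function name, and the extractor patterns are hypothetical illustrations, not LexHound's actual interfaces; a real Task Manager would schedule and parallelize the work, which this sketch folds into a single loop.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical record passed between stages (not from the original slide).
@dataclass
class Document:
    doc_id: str
    text: str
    extracted: Dict[str, List[str]] = field(default_factory=dict)


Extractor = Callable[[str], List[str]]
Subscriber = Callable[[str], None]


def ingest(raw_texts: List[str]) -> List[Document]:
    """Ingester: wrap each raw report/transcript in a Document."""
    return [Document(doc_id=f"doc-{i}", text=t) for i, t in enumerate(raw_texts)]


def date_extractor(text: str) -> List[str]:
    """Toy extractor: dates written as M/D/YYYY or MM/DD/YYYY."""
    return re.findall(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)


def acronym_extractor(text: str) -> List[str]:
    """Toy extractor: runs of two or more capital letters."""
    return re.findall(r"\b[A-Z]{2,}\b", text)


def dispatch(doc: Document, extractors: Dict[str, Extractor]) -> Document:
    """Dispatcher: run every registered extractor over the document text."""
    for name, extract in extractors.items():
        doc.extracted[name] = extract(doc.text)
    return doc


def transform(doc: Document) -> Document:
    """Transformer: de-duplicate and sort each extractor's results."""
    doc.extracted = {k: sorted(set(v)) for k, v in doc.extracted.items()}
    return doc


def render(doc: Document) -> str:
    """Renderer: flatten the results into one line of text."""
    parts = [f"{k}={','.join(v)}" for k, v in sorted(doc.extracted.items())]
    return f"{doc.doc_id}: " + "; ".join(parts)


def send(message: str, subscribers: List[Subscriber]) -> None:
    """Sender: deliver the rendered output to every subscriber."""
    for deliver in subscribers:
        deliver(message)


def run_pipeline(raw_texts: List[str],
                 extractors: Dict[str, Extractor],
                 subscribers: List[Subscriber]) -> None:
    """Sequential stand-in for the Task Manager: one document at a time."""
    for doc in ingest(raw_texts):
        send(render(transform(dispatch(doc, extractors))), subscribers)


if __name__ == "__main__":
    outbox: List[str] = []
    run_pipeline(
        ["Report filed 5/20/2009 by the CCE; CCE reviewed it 5/20/2009."],
        {"dates": date_extractor, "acronyms": acronym_extractor},
        [outbox.append],
    )
    print(outbox)
```

In this sketch, adding a new extractor only means adding an entry to the `extractors` dictionary; the Dispatcher loop and the downstream stages do not change, which is one plausible reading of why the architecture separates extraction from rendering and delivery.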
Elaboration: The Central Importance of Storage
• Each of the STAIRS steps exploits stored information
  - Selection dictionaries ("get it")
  - Linguistic glossaries for translation
  - Wikis, etc., for enrichment ("know it")
• Manual record formation is slow and prone to omissions and inconsistencies
  - <200K person targets in TKB
  - Growth ~20K/year
• Automatic extraction accelerates storage
  - >3000K citation records in Nymrod Entity DB
  - Growth ~1000K/year
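The growth figures above can be turned into a direct comparison. The short calculation below is a sketch that treats the slide's approximate bounds ("<200K", "~20K/year", ">3000K", "~1000K/year") as exact point values, which is an assumption made only for illustration; the constant and function names are hypothetical.

```python
# Approximate figures from the slide; the "<", ">" and "~" bounds are
# treated as exact point values for illustration only.
TKB_PERSON_TARGETS = 200_000         # "<200K" manually formed person records
TKB_GROWTH_PER_YEAR = 20_000         # "~20K/year"
NYMROD_CITATIONS = 3_000_000         # ">3000K" machine-extracted citation records
NYMROD_GROWTH_PER_YEAR = 1_000_000   # "~1000K/year"


def rate_ratio(automatic: float, manual: float) -> float:
    """How many times faster the automatic store grows than the manual one."""
    return automatic / manual


def years_to_add(records: int, per_year: int) -> float:
    """Years needed to add `records` entries at a constant yearly rate."""
    return records / per_year


if __name__ == "__main__":
    print(f"Growth ratio: {rate_ratio(NYMROD_GROWTH_PER_YEAR, TKB_GROWTH_PER_YEAR):.0f}x")
    print(f"Store size ratio: {NYMROD_CITATIONS / TKB_PERSON_TARGETS:.0f}x")
    print(f"Manual years to add 1000K records: "
          f"{years_to_add(1_000_000, TKB_GROWTH_PER_YEAR):.0f}")
```

Under these assumptions, automatic extraction adds records about 50 times faster than manual record formation, so the manual process would need roughly 50 years to add what automatic extraction adds in one, and the machine-extracted store is already about 15 times larger.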
Machine vs. Manual Chief-of-State Citations
#   | Name                       | Role                     | Code | Nymrod Citations (machine-extracted) | Last TKB Manual Update
1   | Abdullah Badawi            | Malaysian Prime Minister | COS  | >100 | 10/15/2007
2   | Abdullahi Yusuf            | Somali President         | COS  | >300 | N/A
3   | Abu Mazin (Mahmud 'Abbas)  | PA President             | COS  | >200 | 5/20/2009
4   | Alan Garcia                | Peruvian President       | COS  | >100 | N/A
5   | Aleksandr Lukashenko       | Belarusian President     | COS  | >50  | N/A
6   | Alvaro Colom               | Guatemalan President     | COS  | >200 | N/A
7   | Alvaro Uribe               | Colombian President      | COS  | >700 | N/A
8   | Amadou Toumani Toure       | Malian President         | COS  | >50  | N/A
9   | Angela Merkel              | German Chancellor        | COS  | >300 | N/A
10  | Bashar al-Asad             | Syrian President         | COS  | >800 | N/A
... | ...                        | ...                      | ...  | ...  | ...
122 | Yuliya Tymoshenko          | Ukrainian Prime Minister | COS  | >200 | N/A