TRANSCRIPT
#analyticsx
Copyright © 2016, SAS Institute Inc. All rights reserved.
Unfriending the Enemy; Social Media Analytics and Tradecraft
Tom Sabo; Principal Solutions Architect, SAS
Who was born after 1980?
The baseline
WHAT IS OPEN SOURCE INTELLIGENCE (OSINT)?
Open-source intelligence (OSINT) is intelligence collected from publicly available sources. In the intelligence community (IC), the term "open" refers to overt, publicly available sources (as opposed to covert or clandestine sources); it is not related to open-source software or public intelligence.
Agenda
• Collection and its Challenges
• Tradecraft: Analysis & Dissemination
• Decision to Act
Collection …AND ITS CHALLENGES!
• Filtering Out Noise
• Volume
• Legality
Overcoming the Challenge
VOLUME OF DATA – THE ZETTABYTE ERA
Filtering & Noise
GETTING STARTED & MANAGING THE VOLUME
The Whole Internet
Relevant Data for Mission Problem
Cast a big net or target something small
• Large number of queries = high noise level
• Small number of target-specific queries = missed opportunity
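The trade-off in the two bullets above can be illustrated with a toy example. The sketch below is hypothetical: the posts, relevance labels, and keyword sets are invented for illustration and do not come from the presentation. It compares precision (how much of what comes back is relevant) and recall (how much of the relevant material is found) for a broad and a narrow query set.

```python
# Toy illustration of broad vs. narrow keyword queries.
# All posts, labels, and keywords are invented for this sketch.
POSTS = [
    ("Protest planned downtown near the embassy tonight", True),
    ("Embassy road closed after protest turns violent", True),
    ("Crowd gathering at the consulate gates", True),
    ("Great protest song on the radio today", False),
    ("Road closed downtown for the marathon", False),
    ("New cafe opening near the embassy", False),
]

def run_query(keywords):
    """Return the posts containing at least one keyword (case-insensitive)."""
    return [(text, relevant) for text, relevant in POSTS
            if any(k in text.lower() for k in keywords)]

def precision_recall(keywords):
    """Compute (precision, recall) of a keyword set against the labeled posts."""
    hits = run_query(keywords)
    relevant_hits = sum(1 for _, relevant in hits if relevant)
    total_relevant = sum(1 for _, relevant in POSTS if relevant)
    precision = relevant_hits / len(hits) if hits else 0.0
    recall = relevant_hits / total_relevant
    return precision, recall

BROAD = ["protest", "embassy", "road", "downtown", "crowd"]  # big net
NARROW = ["protest turns violent"]                           # small target

broad_pr = precision_recall(BROAD)
narrow_pr = precision_recall(NARROW)
print("broad:  precision=%.2f recall=%.2f" % broad_pr)
print("narrow: precision=%.2f recall=%.2f" % narrow_pr)
```

On this toy data the broad set finds every relevant post but half of what it returns is noise, while the narrow set returns only relevant material and misses two of the three relevant posts.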
Overcoming the challenge
FILTERING OUT THE NOISE FOR KEY PIECES OF INTELLIGENCE
Overcoming the challenge
INTELLIGENCE OVERSIGHT & YOUR CITIZENS
United States – Section 702 FISA
The Government cannot target anyone under the court-approved procedures for Section 702 collection unless there is an appropriate, and documented, foreign intelligence purpose for the acquisition and the foreign target is reasonably believed to be outside the United States. Section 702 cannot be used to intentionally target any U.S. citizen, or any other U.S. person, or to intentionally target any person known to be in the United States. Section 702 cannot be used to target a person outside the United States if the purpose is to acquire information from a person inside the United States.
Limited Targeting Under the Law
Tradecraft: Analysis and Dissemination
Methodology
Dual approach to OSINT Data
APPROACH: Curated Corpus
NEED
• Monitor a topic within social media that is hard to define
• Mission-related topics need to be researched again and again
• You know what you are looking for
METHODOLOGY
• Use text analytics to develop high-quality curated corpora based on many, many keywords
• Returned data goes through advanced filter rules to eliminate irrelevant data
• Relevant information is extracted and organized
APPROACH: Ad Hoc Search
NEED
• Topics of immediate interest that can’t wait for the full curation assets to be built – need to circumvent collection
• Missions are not ongoing (sudden incidents)
METHODOLOGY
• Run one-time ad hoc queries, for example against Twitter data; the data is returned live to be filtered, analyzed and extracted
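The curated-corpus steps described above (match on many keywords, apply filter rules, extract and organize) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the presenter's implementation: the keyword set, the exclusion patterns, and the sample posts are all hypothetical, and a real corpus would use far larger lists.

```python
import re

# Hypothetical keyword list and exclusion rules, invented for this sketch.
KEYWORDS = {"checkpoint", "convoy", "airstrike"}
EXCLUDE_PATTERNS = [
    re.compile(r"\bvideo game\b", re.IGNORECASE),
    re.compile(r"\bmovie\b", re.IGNORECASE),
]
HASHTAG = re.compile(r"#(\w+)")

def matches_keywords(text):
    """Step 1: keep posts that contain at least one keyword."""
    words = set(re.findall(r"\w+", text.lower()))
    return bool(words & KEYWORDS)

def passes_filters(text):
    """Step 2: drop posts that trigger any exclusion rule."""
    return not any(p.search(text) for p in EXCLUDE_PATTERNS)

def extract(text):
    """Step 3: pull out structured fields (here, only hashtags)."""
    return {"text": text, "hashtags": HASHTAG.findall(text)}

def build_corpus(posts):
    """Run the three steps in order over an iterable of post texts."""
    return [extract(p) for p in posts
            if matches_keywords(p) and passes_filters(p)]

SAMPLE_POSTS = [
    "Convoy spotted near the northern checkpoint #Aleppo",
    "Best airstrike mission in this video game!",
    "Traffic is terrible today",
]
corpus = build_corpus(SAMPLE_POSTS)
print(corpus)
```

The same `build_corpus` function could serve an ad hoc search by passing a one-off keyword set instead of the curated one; the difference between the two approaches lies in how much effort goes into building and maintaining the keyword and filter lists.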
Methodology
Document Corpus Building vs. Ad Hoc Querying
Consider both with Analyst determining approach based on Need
Syria
Abu Bakr al-Baghdadi
Riots in Rio
Paris
TECHNIQUE
OBFUSCATION – DO YOU KNOW WHO IS LOOKING AT YOU?
You want to be able to protect your analysts while they’re exploiting OSINT. With certain obfuscation techniques, they can hide in plain sight.
• Creation of safe UNCLASS portals whereby analysts can view all relevant information.
• Licensing of data
• Government? Industry?
• Non-attribution/Misattribution
TECHNIQUE
LANGUAGE ISSUES
Gorilla war
guerrilla war, guerrilla warfare | جنگ چریکی | jang-e cherīkī
world war | جنگ جهانی | jang-e jahānī
chemical warfare | جنگ شیمیایی | jang-e shīmīyāyī
English, Native Language & Transliteration.
Language Translation
• Do you use machine translation for automation?
• Or prefer to keep your human linguists in the loop?
Misspellings, Emoticons, Abbreviations and Slang.
:) 8) B) :] 8] B] :[ 8[ B[ :( 8( B(
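One way to cope with the variants listed above is to map every surface form (English, native script, transliteration, common misspelling) to a single canonical term before analysis. The sketch below is a hypothetical illustration: the lookup table reuses the three terms from the slide, while the fuzzy-match cutoff of 0.75 and the emoticon polarity labels are assumptions made for this example.

```python
import difflib

# Canonical term -> known surface forms (terms taken from the slide).
TERM_VARIANTS = {
    "guerrilla war": ["guerrilla war", "guerrilla warfare",
                      "جنگ چریکی", "jang-e cherīkī"],
    "world war": ["world war", "جنگ جهانی", "jang-e jahānī"],
    "chemical warfare": ["chemical warfare", "جنگ شیمیایی",
                         "jang-e shīmīyāyī"],
}

# Flatten to variant -> canonical for direct lookup.
LOOKUP = {variant.lower(): canonical
          for canonical, variants in TERM_VARIANTS.items()
          for variant in variants}

# Assumed polarity labels for a few emoticons (not from the source).
EMOTICONS = {":)": "positive", ":]": "positive",
             ":(": "negative", ":[": "negative"}

def normalize_term(term, cutoff=0.75):
    """Map a term to its canonical form, tolerating misspellings.

    Exact matches are tried first; otherwise difflib picks the closest
    known variant, if any scores at or above the (assumed) cutoff.
    """
    key = term.strip().lower()
    if key in LOOKUP:
        return LOOKUP[key]
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None

def emoticon_tags(text):
    """Return the polarity label of each known emoticon token in the text."""
    return [EMOTICONS[tok] for tok in text.split() if tok in EMOTICONS]

print(normalize_term("Gorilla war"))
print(normalize_term("جنگ جهانی"))
print(emoticon_tags("road is open :)"))
```

Fuzzy matching of this kind catches near-miss spellings such as "Gorilla war", but it is not a substitute for translation; whether machine translation or a human linguist resolves the harder cases remains the design question the slide raises.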
TECHNIQUE
SENSEMAKING
• Text Mining
• Clustering/Grouping Similar Documents
• Topic Discovery
• Predictive Modeling/Machine Learning
• Natural Language Processing (NLP)
• Sentiment
• Categorization/Organize Information
• Entity/Knowledge Extraction
• Fact/Relationship Extraction
• Search and Indexing
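As one concrete instance of the techniques listed above, grouping similar documents can be sketched with bag-of-words vectors and cosine similarity. This is a deliberately simple, standard-library-only illustration, not the method used in the presentation: the stopword list, the similarity threshold of 0.3, and the sample documents are all assumptions.

```python
import math
import re
from collections import Counter

# Small assumed stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "in", "on", "of", "and", "is", "to", "near", "at"}

def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS]
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, [document indices])
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

DOCS = [
    "Airstrike reported near the northern checkpoint",
    "Flooding closes the river bridge",
    "Second airstrike hits northern checkpoint",
    "River bridge still closed after flooding",
]
groups = cluster(DOCS)
print(groups)
```

Here the two airstrike reports and the two flooding reports end up in separate groups. Production text mining would typically add term weighting and a more robust clustering algorithm, but the underlying idea of comparing document vectors is the same.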
• Wow them right here – use a screenshot or jump straight into a demo – tell this right away. Then show them how we constructed it, along with a lot of other insight.
TECHNIQUE
SENTIMENT ANALYSIS
Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.
Know When You Need Advanced Analysis.
• Detection of complex sentiments:
• Is this person tweeting/writing under duress? Anxiety? Anger?
• Are they trying to persuade?
• What is the veracity of what they’re saying?
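For contrast with the complex cases above, the simplest form of sentiment analysis is a lexicon lookup with basic negation handling. The sketch below is a hypothetical baseline: the word list, polarity values, and negation rule are invented for illustration and are not drawn from any SAS product.

```python
import re

# Hypothetical mini-lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {
    "safe": 1, "calm": 1, "relieved": 1,
    "afraid": -1, "angry": -1, "dangerous": -1, "trapped": -1,
}
NEGATORS = {"not", "no", "never"}

def sentiment_score(text):
    """Sum word polarities, flipping a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

def label(text):
    """Collapse the numeric score into positive / negative / neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label("We are safe and calm now"))
print(label("The road is not safe, people are afraid"))
print(label("Meeting at noon"))
```

A baseline like this captures only surface polarity. It cannot tell whether an author is writing under duress, trying to persuade, or telling the truth, which is where the more advanced analysis named on the slide becomes necessary.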
[Chart labels: # Complaints, sentiment]
TECHNIQUE
SOURCE VERSUS CONTEXT MAPPING
You don’t have to geotag your tweets to give away your location.
SOURCE: The location from which the data came.
CONTEXT: The location mentioned in the content.
#ISIS/#Daesh saying #Khanasir-#Ithraya road not fully cleared yet #Aleppo #Syria
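The source/context distinction can be made concrete with a small sketch that keeps the two kinds of location separate. It uses a cleaned-up form of the example tweet; the gazetteer of place names and the `geotag` field are hypothetical and stand in for whatever reference data and post metadata a real pipeline would have.

```python
import re

# Hypothetical gazetteer of known place names (lowercase).
GAZETTEER = {"khanasir", "ithraya", "aleppo", "syria"}

def split_source_context(post):
    """Separate where a post came from and which places it mentions.

    source:  taken from post metadata (the assumed `geotag` field),
             which may be absent.
    context: hashtags in the text that match a gazetteer entry.
    """
    hashtags = [h.lower() for h in re.findall(r"#(\w+)", post["text"])]
    context = [h for h in hashtags if h in GAZETTEER]
    return {"source": post.get("geotag"), "context": context}

POST = {
    "text": ("#ISIS/#Daesh saying #Khanasir-#Ithraya road not fully "
             "cleared yet #Aleppo #Syria"),
    "geotag": None,  # the author did not geotag the tweet
}
result = split_source_context(POST)
print(result)
```

Even with no geotag, the text alone yields four place names, which is the point of the slide: content can give away location without any source metadata.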
Decision to act
COLLECTION AND ANALYSIS MUST HOLD UP IN COURT
The Intelligence Dilemma – Pareto Principle
Search & Discovery
Identifying relevant information, searching various data sources, “formatting” data for a specific tool, processing, applying “analytical techniques” within a tool, typically ad-hoc and manual in nature
Actionable Analysis
Applying specific tradecraft, vetting of information
20% / 80%
Methodology
DARK MATTER
DARK MATTER
THREE KEY CAPABILITIES
OSINT ANALYSIS & EXPLOITATION
1. Real Time and Document Corpus Building for OSINT Analysis
2. Easy Visualizations to Interrogate Data
3. Unstructured Data Insights Delivered in an Obfuscated, Interactive Web Portal
Resources
• White Paper (January 2014): Text Analytics for Government
• Conclusions Paper (September 2013): Applications of Text and Social Media Analytics
• SAS Global Forum Paper (March 2014): Uncovering Trends in Research with Text Analytics with Examples from Nanotechnology and Aerospace Engineering
• SAS Global Forum Paper (April 2015): Show me the Money! Text Analytics for Decision-Making in Government Spending
• SAS Global Forum Paper (April 2016): Extending the Armed Conflict Location and Event Data Project with SAS Text Analytics
Tom Sabo; [email protected]
@mrtomsab