Test and Evaluation/Science and Technology Program
Rapid Data Analyzer for Net-Centric Systems Test (RDAN)
33rd Annual International Test and Evaluation Symposium
October 4, 2016
Mr. Andrew Shaffer (Technical Lead)
Applied Research Laboratory, The Pennsylvania State University
This project was funded by the Test Resource Management Center
(TRMC) Test and Evaluation/Science & Technology (T&E/S&T)
Program through the U.S. Army Program Executive Office for
Simulation, Training, and Instrumentation (PEO STRI) under
Contract No. W900KK-13-C-0015.
Distribution Statement A. Approved for public release; distribution is unlimited.
Outline
• RDAN System Overview
• Test & Evaluation Need
• Tools & Background
• RDAN System Architecture
• RDAN Functional Operators
• Key Science and Technology Innovations
• Innovations in the RDAN Architecture
• Potential Use Cases & Applications
• Summary and Future Work
• Points of Contact
RDAN System Overview
RDAN applies automated analysis using cloud computing
technologies to reduce the time from Data to Decision
• RDAN collects and analyzes high-volume data from multiple data sources
- Unstructured text (TRL 5)
- Structured data (TRL 5)
- Image & video (TRL 4)
- Voice - Future
- Test & Training Enabling Architecture (TENA) - Future
• RDAN automatically analyzes and indexes data using cloud technologies to
support rapid search and analysis operations on large data sets
• RDAN provides custom parallel algorithms and architecture to reduce time
from Data to Decision from hours/days/weeks to seconds
Test & Evaluation Need
Modern distributed T&E events generate large amounts of unstructured data that are hard to analyze
[Diagram labels: High volume data (Unstructured, Structured, Image/Video & Voice Data); Diverse data formats collected using separate T&E systems; Current analysis techniques; Decision; Data to Decision extremely slow (hrs/days/wks)]
• High Volume Data Collection
– Unable to efficiently collect and analyze large unstructured data (i.e. text, voice, chat, image/video) and structured data across multiple sources and environments
– Lack of capability to quickly review past historical data limits value of collected test records
• Adaptability and Scalability
– Difficult to scale T&E system as data grows
• Analysis tools
– Data is often manually processed with slow response time from Data to Decision
– Manual processing introduces human error
– T&E systems lack automation tools to analyze large volumes of unstructured text, structured data, image/video data, and voice data
– T&E systems lack ability to perform deep analysis on test event as it occurs
Slow response time from Data to Decision
Tools & Background
Private big data processing cloud architecture is
optimized for processing large volumes of data
[Figure: ARL Test Cloud System]
• RDAN leverages open-source software packages
- Hadoop/Accumulo software stack is freely available and
continually being updated by the community
• RDAN is optimized to securely process big data
- Software framework designed for scalability & fault tolerance
- Storage of data on compute nodes eliminates I/O bottlenecks
- Large clusters with inexpensive commodity components
support massive aggregate I/O, CPU, and network capacity
- System hardware and network can be tuned for workload
- Graphics Processing Units (GPUs) can be added to support
compute-intensive workloads
- Private cloud architecture secures data storage and processing
RDAN System Architecture
[Architecture diagram. Labels recovered from the figure:
Engines: Data Conversion Engine; Indexing Engine; Query & Analysis Engine; User Interface Engine
Processing blocks: Data Ingest; Convert & Tag Src; Automated Annotation; Filter & Extract; Unstruct Text; Index Generator; Analysis & Fusion; Graphical User Interface (Web-Based)
Libraries and management: Filter Library; Filter Mgt; Query Rule Mgt; Semantics Library
Index structures: Multilevel Index; Wildcard Dictionary; Document Token Count; Auxiliary Indexes
Storage: Data Store; Index Files; Indexes; Data (GFI)
Links and messages: 10 gigE; Basic Query / Management Command; Query; High-Level Query; Results; API
Data label: Self-Describing Canonical-Format Data with Extracted Entities & Metadata
Custom components: Custom Analysis; Custom User Interface
User: Tester / Analyst
Legend: RDAN Core System; Use-Case Specific]
RDAN’s purpose is to develop new technologies to rapidly process high-volume unstructured and structured data
API: Application Programming Interface
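As a rough illustration of the conversion and indexing stages named on this slide, the sketch below wraps a raw text line in a self-describing record (field names and types travel with the values) and adds its tokens to a small inverted index. Everything in it is hypothetical: the field layout, the `to_canonical` and `index_record` helpers, and the regex-based entity extraction are assumptions made for illustration, not RDAN's actual format or code.

```python
import json
import re
from collections import defaultdict

# Hypothetical entity pattern (IPv4-looking strings); the slides do not
# describe RDAN's real annotators.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def to_canonical(source, record_id, text):
    """Wrap raw text in a self-describing record: each field names its own type."""
    return {
        "id": record_id,
        "source": source,
        "fields": [
            {"name": "body", "type": "text", "value": text},
        ],
        "entities": [
            {"type": "ip_address", "value": ip} for ip in IP_PATTERN.findall(text)
        ],
    }


def index_record(index, record):
    """Add each lowercase token of every text field to an inverted index
    (token -> set of record ids)."""
    for field in record["fields"]:
        if field["type"] == "text":
            for token in re.findall(r"[a-z0-9.]+", field["value"].lower()):
                index[token].add(record["id"])


index = defaultdict(set)
records = [
    to_canonical("chat", 1, "Link to 10.0.0.5 is down"),
    to_canonical("audit_log", 2, "Login failure from 10.0.0.5"),
]
for rec in records:
    index_record(index, rec)

print(json.dumps(records[0], indent=2))
print(sorted(index["10.0.0.5"]))  # record ids that mention this address
```

Because the indexer only looks at each field's declared type, a new source can add fields without any change to `index_record`, which is the property the self-describing format is meant to provide.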
RDAN Functional Operators
RDAN supports a diverse set of query and analysis tools that
can be combined to support automated analysis
[Diagram labels: Indexes & Data; Index-Level Iterators; Logical Iterators (with AND and OR node labels); Utility Iterators; Client-Side Analysis; User Interface. Processing tiers: All Nodes Processing (RDAN Cluster); Single Node Processing (Client Computer)]
- Multilevel index
- Wildcard dictionary
- Auxiliary index structures
- Data blocks
- N-Gram iterator
- Term iterator
- Logical (AND/AND-N/OR/NOT) iterators (can be composed to form trees of arbitrary depth and complexity)
- Field selection iterator
- HDFS/Accumulo file writing iterator
- Node-level sorting iterator
- Node-level aggregation operators
- Top-k query optimizer
- Relevance ranking normalizer
- Generate result snapshots
- Global sorting iterator
- Global aggregation operators
- Clustering & outlier detection
- Source association
- Semantics rule evaluation
- Enter & manage queries and semantics rules
- Display results (dashboard, timeline, record list, etc.)
- Query preprocessing (e.g. wildcard query expansion)
- Accumulo API
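The composition of logical iterators into trees can be pictured with a minimal sketch over sorted document-id streams. This is not RDAN's implementation or the Accumulo iterator API: the `Term`, `And`, `Or`, and `Not` classes, the in-memory `index` dictionary, and the sample ids are all hypothetical, and the AND-N operator is left out because the slides do not define it.

```python
import heapq


class Term:
    """Leaf iterator: yields the sorted document ids for one indexed term."""

    def __init__(self, index, term):
        self.doc_ids = sorted(index.get(term, ()))

    def __iter__(self):
        return iter(self.doc_ids)


class And:
    """Yields ids present in every child (intersection of sorted streams).
    Assumes at least one child."""

    def __init__(self, *children):
        self.children = children

    def __iter__(self):
        iters = [iter(c) for c in self.children]
        current = [next(it, None) for it in iters]
        while all(doc is not None for doc in current):
            target = max(current)
            if all(doc == target for doc in current):
                yield target
                current = [next(it, None) for it in iters]
            else:
                # Advance only the streams that are behind the largest id.
                for i, it in enumerate(iters):
                    while current[i] is not None and current[i] < target:
                        current[i] = next(it, None)


class Or:
    """Yields ids present in any child (sorted merge, duplicates dropped)."""

    def __init__(self, *children):
        self.children = children

    def __iter__(self):
        last = None
        for doc in heapq.merge(*self.children):
            if doc != last:
                yield doc
                last = doc


class Not:
    """Yields ids from `universe` that the child does not produce.
    For simplicity this materializes the child's ids in a set."""

    def __init__(self, universe, child):
        self.universe = universe
        self.child = child

    def __iter__(self):
        excluded = set(self.child)
        return (doc for doc in self.universe if doc not in excluded)


# Hypothetical posting lists: term -> ids of documents containing it.
index = {
    "radio": [1, 2, 4, 7],
    "failure": [2, 3, 4, 9],
    "timeout": [5, 7],
    "test": [2, 5],
}
all_docs = range(1, 10)

# (failure OR timeout) AND radio AND NOT test
query = And(
    Or(Term(index, "failure"), Term(index, "timeout")),
    Term(index, "radio"),
    Not(all_docs, Term(index, "test")),
)
print(list(query))  # -> [4, 7]
```

Since every node exposes the same iteration interface, a node can take any other node as a child, which is what allows trees of arbitrary depth.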
Key Science & Technology Innovations
[Diagram labels: Diverse data formats collected using a single system; New automated analysis and data fusion; Data to Decision is rapid (seconds)]
• Novel Data Ingestion Architecture
– Custom parallel algorithms provide scalability, high
throughput, and low latency for storage, indexing,
and analysis using the latest cloud technologies
• Pipelined Indexing Architecture
– New data structures and algorithms significantly
improve indexing throughput while maintaining low
query latency
• Extensible Canonical Data Format
– Self-describing data format allows new sensors and analysis modules to be added to the system without modifying the system architecture
• Flexible Search and Analysis Tools
– New semantics rules allow analysts to search and
analyze data using high-level constructs
Rapid response time from Data to Decision
RDAN allows testers and analysts to perform near real-time analysis of complex distributed T&E events
[Diagram labels: High volume data (Unstructured, Structured, Image/Video & Voice Data); Decision]
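One way to picture searching with high-level constructs is a rule table that expands a named concept into lower-level text patterns. The sketch below is only an assumption about what such a rule could look like: the rule names, phrases, sample records, and the `compile_rule` and `evaluate` helpers are invented for illustration and do not come from RDAN.

```python
import re

# Hypothetical rule table: each high-level concept maps to literal phrases.
RULES = {
    "comms_loss": ["link down", "no carrier", "signal lost"],
    "auth_problem": ["login failure", "access denied"],
}


def compile_rule(name):
    """Expand a rule name into one case-insensitive regular expression."""
    alternatives = "|".join(re.escape(phrase) for phrase in RULES[name])
    return re.compile(alternatives, re.IGNORECASE)


def evaluate(rule_names, records):
    """Return {rule name: [ids of matching records]} for the given rules."""
    compiled = {name: compile_rule(name) for name in rule_names}
    hits = {name: [] for name in rule_names}
    for record_id, text in records:
        for name, pattern in compiled.items():
            if pattern.search(text):
                hits[name].append(record_id)
    return hits


# Invented sample records: (record id, text).
records = [
    (1, "10:02 Radio 3 reports LINK DOWN"),
    (2, "10:03 Login failure for user op7"),
    (3, "10:04 Status nominal"),
    (4, "10:05 Signal lost on channel 2; access denied at gateway"),
]

print(evaluate(["comms_loss", "auth_problem"], records))
# -> {'comms_loss': [1, 4], 'auth_problem': [2, 4]}
```

The analyst asks for `comms_loss` rather than listing every phrase, and adding a new phrase to the table changes every query that uses the rule.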
Innovations in the RDAN Architecture
[Diagram labels: Distributed Test Data (Unstructured, Structured, Image/Video & Voice Data); Data Conversion; RDAN Processing; Sensor Data, Metadata, and Indexes; Analysis Tools; Web-based GUI; Tester / Analyst; RDAN Architecture]
Support diverse sensors at high data rates
• Self-describing canonical data format
• High-throughput/low-latency indexes
Reduce time from data to decision
• Reduce decision from hrs/wks to secs/mins
Review large current & historical data sets during test events
• Flexible GUI supports iterative processing & drill-down analysis
Automate analysis & support semantics rules to reduce human error
• Diverse query and analysis operators
• Framework for text analytics
Utilize h/w efficiently
• Low cost COTS h/w
• Requires less h/w than Accumulo-only systems
Support near real-time ingestion & indexing
Convert diverse data sources to self-describing canonical format
• Multi-field automated annotation
• High-throughput filtering & indexing
Implement scalable high performance image and video processing algorithms
RDAN enables rapid data to decision analysis at low cost
Potential Use Cases & Applications
PACE
R
RDAN
IA/Cyber T&E Exercises
Distributed T&E Events
Image & Video
Rapidly analyze large cyber audit logs to prioritize vulnerabilities for resolution
Provide near real-time feedback about test events to improve test range utilization
Rapidly collect, filter, store, and analyze diverse high volume data collected during T&E events
Automate analysis of large volumes of image and video data collected during T&E events
RDAN can support multiple T&E needs
Summary & Future Work
• RDAN is a prototype end-to-end system that builds on secure private cloud technologies to automatically analyze large volumes of unstructured, structured, image/video and voice data
– Reduces the time from Data to Decision from hours/days/weeks to seconds
– Offers interactive and automated analysis of both live and recorded data
– Scalable and highly configurable to support multiple Test and Evaluation programs
• RDAN mitigates risk by providing near real-time analysis of data of different types, structures, and sizes
– Near real-time analysis of test event saves time and money
– Automated processing of data minimizes human errors
– Supports review of live data and collected historical data during test
• Proposed future developments
– Further mature RDAN system technologies
– Upgrade RDAN image and video processing algorithms
– Increase system performance using GPUs
– Identify additional use cases to utilize RDAN technologies
Currently seeking transition sponsors for future funding to
support further maturation of RDAN
Points of Contact
Mr. Bruce Einfalt
Principal Investigator
814-863-4142
Mr. Andrew Shaffer
Technical Lead
814-863-0312
Mr. Manuel Gonzalez-Rivero
Image Processing Lead
814-865-9583