fornet: a distributed forensic network kulesh shanmugasundaram
TRANSCRIPT
ForNet: A Distributed Forensic ForNet: A Distributed Forensic NetworkNetwork
Kulesh Shanmugasundaramhttp://isis.poly.edu/projects/fornet/
What Mechanisms Do We Have? Logging mechanisms and audit trails
Alerts/Logs from security components: Logs only perceived security threats
Host Logs: First thing to get disabled upon intrusion An insider would rather use a host without any
logs Mobility, wireless networks create new problems
Packet Logs: Usually at the edges hence blinded easily Can’t keep data for long… E.g: Infinistream, NFR, NetWitness, SilentRunner
What do we need? A reliable mechanism for analysis and
attribution
An effective response model Current response model is mostly manual Response time is in days or weeks Digital evidence disappears quickly
Tools that complement security components Need some safety net to fall back Forensics is that net
Our Solution… Securely collect, store, disseminate,
and process synopsis of network traffic.
In other words a device analogous to a surveillance camera for the network. Even better – a co-operating network of such surveillance cameras.
Goal of Project ForNet: development of tools, techniques, and infrastructure to aid rapid investigation and identification of cyber crimes
ForNet: A Unified, End-to-End Approach to Network Forensics
SynApp: equipped routers or hosts. Primary function is to create synopses of network traffic. May have limited query processing and storage component as well.
Forensic Server: Responsible for archiving synopses, query processing & routing, enforcing monitoring, security policies, for the domain.
ForNet Domain: A domain covered by single monitoring and privacy policies.
Design Goals and Trade-Offs
1. “Complete” & “Correct” Evidence2. Longevity 3. Succinctness4. Accessibility of Evidence5. Security & Privacy of Evidence4. Ubiquity5. Incremental Deployment6. Modular 7. Scaleable Design
Research Challenges
Networking , Distributed
SystemsTheory
Systems
Legal Issues
Security
• High-speed packet capture• Storage systems• Databases• GUI
• Legally admissible evidence• Chain of custody
• Privacy
• Secure architecture • Attacks on ForNet
• Use of ForNet
• Design of Synopses• Quantitative Analysis
• Identification of events• Architecture of ForNet• Query Routing• Fault Tolerance
Components of ForNet
Architectural components can be grouped under following functions:
1. Data Collection2. Data Retention3. Data Dissemination
SynAppMonitoring Policy
Query LanguagePanorama
Forensic ServerPrivacy Policy
Architecture of a SynApp
Network Stream
Network Filter
Buffer Manager
Synopsis Controller
HBF
Neoflow
Histograms
Syn
op
sis
En
gi n
e
Synopses Sent ToForensic Server
ConfigurationManager
MonitoringPolicy
SynopsesLibrary
Synopses in ForNet
Data Type
Link Link Content Mappings Aggregates
Potential
Evidence
Protocol Layer IPs,ports, protocol TOS, TTL
Packet packet sizes inter-arrival times
Type of payload
audio, video…
IM Keywords Content
MAC to IP DNS to IP VirtualHost to IP AS-Name to IP
#Packets Delays etc.
Synopses
Connection record Neoflow Header-dump
Neoflow-NG*
HBF Packet-dump
MAC Records DNS Records HTTP Records*
BGP Records*
Histogram SCBF*
* Currently being implemented
Architecture of Forensic Server
ArchiveManager
QueryParser
PrivacyManager
QueryPlanner
XMLGenerator
HB
F
Neofl
ow
DN
S R
ecord
s
His
tog
ram
BG
P T
ab
les
ConfigurationManager
Query FromAn Analyst
Response ToThe Query
Scan
Dete
cti
on
Fin
d Z
om
bie
s
Ste
pp
ing
ston
es
SecurityManager
Forensic Server Forensic Server has the following functions:
Data archiving Query processing Policy management and enforcement
Data Archiving Periodically receives data from SynApps & archives
them on disk.
Forensic Server (cont…) Query Processor:
XML-based query language No relational databases (uses Berkeley DB)
Does coalescence of synopses Given a query can locate appropriate synopses Generate and execute a query plan to answer the query
Distributed query processing Currently being implemented to talk to nearby forensic
servers to answer/corroborate queries/evidence
Policy Enforcer: Two types of policies in ForNet:
Monitoring Policy– informs user of what is being monitored
Privacy Policy – informs user of how/with whom data is shared etc.
A user interface for:
Building queries Data mining Visualization
Architecture of Panorama
CommunicationManager
XML Parser
Plug-inManager
UserInterfaceManager
Chart Generator
Network Graphs
3D Link Maps
Summary Views
Data Cleansing
Miners
Queries,Various Commands to
Forensic Server
Capturing Evidence on The Internet Internet is a gigantic state machine:
To support forensics we need to know its precise state at any given time!
Therefore, we need to keep track of state transitions
State transitions can be characterized by:1. Links/connections between nodes2. Link content3. Protocol mappings and aliases4. Aggregates generated by state transitions
Archiving raw traffic for this information is an overkill!
What is a Synopsis?
Data structures and algorith
ms for
representing a set
of elements
succinctly with predefined loss
in
information and has the abilit
y to
recall the original set of elements
with a preset accuracy.
Properties of a Good Synopsis Contains enough data to answer certain classes of queries
Who sent payload “xyz”? What did host bug.poly.edu send?
Contains enough data to quantify confidence of its answers
I’m 99.37% sure bug.poly.edu sent “XYZ”
Have small memory footprint and easy to update Need 20GB/day to keep 1TB/day of raw network data Need to compute two hashes per packet
Resource requirements are tunable Can only afford 3GB/day, adjust the accuracy to accommodate
this.
Advantages of Using Synopses
Succinct representation of raw data makes it possible to transfer network data to disks
Sharing/transferring raw data over network is impossible but synopsis can be moved to remote sites
Query processing would be expensive with raw data What’s the frequency of traffic to port 80 in the past week?
(raw data vs. a histogram)
Easily adaptable to various resource requirements Can adopt the size, processing requirements of a Bloom Filter
based on various hardware resources and network load
Can retain potential evidence for months!
Allows for cascading different techniques!
Enterprise(100Mbps)
Subnet(50Mbps)
Edge(1Gbps)
Core(40Gbps)
CR-BFTopology maps
SCBFSimple CRTopology maps
NeoflowHBFDNS mappings
Payload SamplingMAC-layer maps
Cascading of Synopses
Packet Digests & Bloom Filters Snoeren et. al. used it successfully in SPIE for
single packet traceback (“Hash-Based IP Traceback”)
Space Efficient: 16-bits per packet (m/n=16) and 8 hashes (k=8)
false positive (FP) = 5.74 x 10-4 No false negatives!
However, suppose we don’t have packets. We only have some excerpts of payload Don’t know where the excerpt was aligned in the packet
Extend Bloom Filters to support excerpt/substring matching
Block-based Bloom Filter
p0 p1 p2 … … p(n/s)P =
create s-byte blocks of payload
P = (p0|0) (p1|s) (p2|2s) … … (p(n/s)|n/s)
append blocks’Offset (in payload)
Insert each block into a Bloom Filter
P =
s
BBF = (p0|0) (p1|s) (p2|2s) (p2|3s) … (p(n/s)|n/s)
Q =
s
q0 q1 q2 … … q(l/s)Q =
create s-byte blocksof query string
Try all possible offsets
(q0|0)
X
(q0|s)
q0=p1
(q1|2s)
q1=p2
(q2|3s)
q2=p3
“q0q1q2” was seen in a payload at offset ‘s’
H1(q0|0)=1H2(q0|0)=1H3(q0|0)=0
P1 = A B R A C A
P2 = C D A B R A
For query strings: “AD”, “CB”, “DR”, “AA” etc. BBF falsely identifies them as seen in the payload!
BBF =(A|0) (B|s) (R|2s) (A|3s) (C|4s) (A|5s)
(C|0) (D|s) (A|2s) (B|3s) (R|4s) (A|5s)
(A|0) (B|s) (R|2s) (A|3s) (C|4s) (A|5s)
(C|0) (D|s) (A|2s) (B|3s) (R|4s) (A|5s)
“Offset Collisions”
Because BBF cannot distinguish between P1 and P2
Hierarchical Bloom Filter
• An HBF is basically a set of BBF for geometrically increasing sizes of blocks.
(p0|0) (p1|s) (p2|2s) … … (p(n/s)|n/s)level-0
P =
(p0p1|0) (p2p3|2s) (p(n/s-1)p(n/s)|(n/s-1))level-1
(p0p1p2p3|0)level-2
Hie
rarc
hic
al
Blo
om
Fil
ter
Hierarchical Bloom Filter
(q0|0) (q1|s) (q2|2s) (q3|3s) … (q(n/s)|l/s)level-0
Q =
(q0q1|0) (q2q3|2s) (q(l/s-1)q(l/s)|(l/s-1))level-1
(q0q1q2q3|0)level-2
• Querying is similar to BBF. • Matches at each level can be confirmed a level above.
Adapting an HBF for ForNet So far an HBF can attest for the
presence of a bit-string in payloads
We need to tie this bit-string to a source and/or destination hosts
Our Approach: Similar to tying an offset to a block/bit-
string In addition to inserting (block||offset) also
insert (block||offset||hostid) Hostid could be (srcIP||dstIP)
Coalescence of Synopses HBF requires:
Source IP, destination IP, excerpt But where do we get Source IP, Destination IP
Connection Record/Neoflow Given two time intervals can give us list of source, destination
IPs Coalesce these synopses
Who sent “DEADBEEF”between time T1 and T2?
QueryProcessor
Connectionrecord/Neoflow
HBF
Give all connections between time T1 and T2?
Did (source, destination) IP send“DEADBEEF” between time T1 and T2?
(source, destination) IPs of connectionsbetween time T1 and T2?
YES/NO
ForNet in Intranet Usage Investigation based on payload characteristics
Determine victims of worm, trojans and other malware. Detection of potential victims of phising and spyware Source of IP theft
Investigation based on connection characteristics Detection of zombies Detection of malware based on connection pattern
Investigation based on aggregate characteristics Insider abuse Network resource abuse
Etc Etc …
ForNet Deployed on Internet Investigation based on payload characteristics
Traceback based on partial content of single packets
Source of malware, worms, etc.
Investigation based on connection characteristics Stepping stone detection
Investigation based on aggregate characteristics Attack attribution
Current Status Implemented a PC based SynApp device for
placement within an intranet. Connection records, HBF, DNS, NewFlow, Mappings
Implemented Forensics Server with simple querying capabilities.
Current Forensic Server has 1.3TB of storage with over 3 months worth of data from the edge-router and two subnets
Normal bandwidth consumption of network is about a 1 – 2 TB/day
Synopses reduces this traffic to about 20GB/day A 4TB Forensic Server will take over operations in July
Implemented Panorama (GUI Client)
Tracking MyDoom Recorded all email traffic for a week
Using HBF and raw traffic Was not aware of MyDoom during this collection
When signatures became available we used them to query the system
To find hosts that are infected in our network How the hosts were infected
Some statistics: 679 hosts originated at least one copy of the virus
52 of which were in our network These hosts sent out copies of the virus to 2011
hosts outside our network boundary
Neoflow-NG Keeps track of everything Neoflow has
plus: Individual packet sizes
Be able to recall the packet sizes in the same sequence
Inter-arrival times of packets Be able to recall the arrival times of individual
packets
Content type of flow Be able to recall types of content carried by flows Audio, Video, Plain-Text, Encrypted, Compressed etc.
Zombie Detection Are they any zombies in Poly network?
And if so, how do we identify them? Criterion: Hosts that portscan and use IRC
at likely zombies. Tabulate the amount of IRC activity for
each host. Tabulate the amount of portscan activity
for each host. For each host: ZOMBIE_LIKELIHOOD =
PORTSCANS * IRC_FLOWS Sort all hosts by their
ZOMBIE_LIKELIHOOD. List the top 200 hosts.
Work currently in progress… Identification of useful network events
Network is the virtual crime scene that holds evidence in the form of network events
Developing efficient synopses Handling connection oriented & connectionless
traffic Techniques for payload attribution (esp. IRC/IM) Automatic cascading of various synopsis techniques
Analysis and mining techniques Zombie detection Stepping stone detection
Work currently in progress… Integration of information from synopses
across networks Development of a protocol for secure
communication of various ForNet components
Query processing and storage of synopses A query language transparent of various underlying
synopsis techniques A query manage system to interpret the language
for the underlying database Various storage and garbage collection strategies
for collected-SynApps Storage and query processing infrastructure for
Forensics Servers
Research Challenges Integration of information from synopses across
networks Real power of ForNet is realized when information from
SynApps is fused to answer queries Development of a protocol for secure communication of
various ForNet components
Query processing and storage of synopses A query language transparent of various underlying
synopsis techniques A query management system to interpret the language
for the underlying database
Analysis A frame-work to compare effectiveness/trade-offs of
synopses
Attacks on ForNet Resource Exhaustion:
Flood the network with random bits of data
Malicious Transformation: Create packets of length (blocksize – 1)
Stuffing: Stuff every other block with application dependent escape
characters For smaller blocks we can try to guess for larger blocks it is not
possible!
Exploiting Collisions: Hash collisions
Very unlikely for strong hash functions We use a random seed for every HBF so it makes it more difficult
Packet collisions A possibility in Block-based Bloom Filters but not in HBFs
Streaming transformations: Encryption, compression