
CyberProbe: Towards Internet-Scale Active Detection of Malicious Servers

Antonio Nappa, M. Zubair Rafique, Juan Caballero

Zhaoyan Xu, Guofei Gu

Research Interests

• Malware Analysis & Defense

• Software Security

• Vulnerabilities & Exploits

• Network Security

• IDS

• Forensics

• Memory

• Program Binary Analysis

Figure: cyberattacks by cybercriminals, hacktivists, and governments, spanning cybercrime and targeted attacks

Malicious Servers

• Malicious Server Types

– Exploit servers: malware distribution

– C&C servers: control malware

– Payment servers: monetization

– Redirectors: anonymity

– …

• Some operations use P2P, where peers provide server-like functionality

Operations & Server Types

Can we find the servers of an operation?

How many servers in each operation?

Where are the servers hosted?

Malicious Servers in the Cloud

• Malicious servers moving to the Cloud

– 60% of Exploit Servers [Nappa13]

• VPS hosting predominantly abused

• Replace dead servers with new ones

• Servers don’t live forever

– Exploit server median lifetime = 16 hours

• Many servers needed!

Dynamic Server Infrastructures

• Honeypots

• Spamtraps

• IDS

• Limitations

– Limited view

– Slow

Server Detection Techniques

• Run malware samples

• Honeyclient farms

– Google Safe Browsing

– Microsoft Forefront

• Limitations

– Limited view

– Specific to one server type

– Expensive

Passive (honeypots, spamtraps, IDS) vs. active (running malware, honeyclient farms)

Active Probing

• General

– Any server type and P2P bots

• Scalable: Internet-scale

• Fast: the whole Internet in a few hours

• Easy to deploy

• Cheap

Active Probing: Benefits

• Active probing approach for detecting malicious servers

• Adversarial fingerprint generation technique

• Implement the approach in CyberProbe

• Use CyberProbe to find malicious servers

– 151 servers in 24 localized/Internet-wide scans

– 75% of servers unknown to public databases

– 7000+ P2P supernodes

• Identify the provider locality property

Contributions

Outline

• Intro

• Approach

• Adversarial Fingerprint Generation

• Scanning

• Evaluation

CyberProbe in a nutshell

Figure: Adversarial Fingerprint Generation takes malicious traffic, benign traffic, and interaction with seed servers, and outputs fingerprints. Scanning takes a fingerprint, a port, and target ranges, and outputs malicious servers.

Goal: # malicious servers detected > # seed servers

• A fingerprint targets a server family

– Server family = operation + server type

– Possibly multiple fingerprints for the same server family

• A fingerprint comprises:

– A probe construction function

– A classification function = Snort signature

Fingerprints

Clickpayz1

Probe: GET /td?aid=e9xmkgg5h6&said=26427

Signature:

content:"302"; http_stat_code;

content:"\r\n\r\nLoading…";
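A fingerprint pairs a probe construction function with a classification function. The Python sketch below models the Clickpayz1 example. The `Fingerprint` class, the helper names, and the extra Host/Connection headers in the probe are assumptions made for illustration, and plain substring checks stand in for a real Snort engine.

```python
from dataclasses import dataclass
from typing import Callable


def http_status(raw: bytes) -> int:
    """Parse the status code from a raw HTTP response ("HTTP/1.1 302 Found" -> 302)."""
    parts = raw.split(b"\r\n", 1)[0].split()
    try:
        return int(parts[1])
    except (IndexError, ValueError):
        return -1  # malformed or non-HTTP response


@dataclass(frozen=True)
class Fingerprint:
    name: str
    build_probe: Callable[[str], bytes]  # probe construction: target IP -> request bytes
    classify: Callable[[bytes], bool]    # classification: raw response -> match?


def clickpayz1_probe(target_ip: str) -> bytes:
    # Request line from the Clickpayz1 example; the Host value and the
    # Connection header are assumptions of this sketch.
    return (
        "GET /td?aid=e9xmkgg5h6&said=26427 HTTP/1.1\r\n"
        f"Host: {target_ip}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def clickpayz1_classify(raw: bytes) -> bool:
    # Both signature tokens must match: status 302, and the end of the headers
    # followed by "Loading". The slide's trailing "…" is left out because it
    # may be a truncation rather than literal bytes.
    return http_status(raw) == 302 and b"\r\n\r\nLoading" in raw


CLICKPAYZ1 = Fingerprint("clickpayz1", clickpayz1_probe, clickpayz1_classify)

if __name__ == "__main__":
    reply = b"HTTP/1.1 302 Found\r\nLocation: /x\r\n\r\nLoading..."
    print(CLICKPAYZ1.classify(reply))  # True
```

The probe and the classifier are kept separate so that the same scanner loop can run any fingerprint: it sends `build_probe(ip)` and records the hosts where `classify(reply)` holds.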


• Fingerprint generation requires interacting with remote seed servers

– Collect requests and responses

• Remote servers controlled by attacker

• Make fingerprinting inconspicuous

– Minimize traffic

– Use inconspicuous probes

Replay traffic!

AFG: Architecture

Figure: malicious traffic → RRP extraction → replay against the seed servers → clustering → signature generation (which also uses benign traffic) → fingerprints

AFG: Malicious Traffic

Figure: RRP extraction turns malicious traffic into request-response pairs (RRPs)

• Replay the requests to the servers in the traces

– Through a VPN: anonymity, IP diversity

• Remove benign responses

– Errors, no response

– Responses similar to the one for a random resource

AFG: Replay

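Figure: the replayer sends the original request (GET /td?aid=e9xmkgg5h6&said=26427) and a request for a random resource (GET /asdfg.html) to evil.com (78.1.2.3). Both return 200 OK. If the two responses are similar, the server is likely sinkholed or parked.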
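The random-resource check can be sketched as follows. The sketch uses Python's `difflib` as the similarity measure and a 0.9 cut-off. Both are assumptions, since the slide does not say how "similar" is measured. The function names are hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical cut-off: responses at least this similar count as "the same page".
SIMILARITY_THRESHOLD = 0.9


def similarity(a: bytes, b: bytes) -> float:
    """Similarity ratio in [0, 1] between two response bodies."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def keep_replayed_rrp(status: Optional[int], body: Optional[bytes],
                      random_body: Optional[bytes]) -> bool:
    """Return True if a replayed response still looks like the malicious one.

    status, body: reply to the replayed request (None if there was no reply)
    random_body:  reply to a request for a random resource, e.g. /asdfg.html
    """
    if status is None or body is None:
        return False  # no response
    if status >= 400:
        return False  # error response
    if random_body is not None and similarity(body, random_body) >= SIMILARITY_THRESHOLD:
        return False  # same page for any path: likely parked or sinkholed
    return True


if __name__ == "__main__":
    parked = b"<html>This domain may be for sale</html>"
    print(keep_replayed_rrp(200, parked, parked))  # False
```

Comparing against a random path catches servers that answer every request with the same page. A plain status-code check would miss these, because both requests in the figure return 200 OK.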

• Cluster RRPs by request similarity

– HTTP: method, path, parameters

– Non-HTTP: packet size, content

• Probe construction function

– Identify TARGET, SET fields

AFG: Clustering

Figure: clustering groups the replayed RRPs into RRP clusters. A probe construction function is derived from each cluster.
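The following is a minimal sketch of HTTP request clustering and of a probe construction function. It assumes that requests are keyed by method, path, and the set of parameter names. The slide does not spell out how TARGET and SET fields are filled. Here the Host header is treated as a TARGET field (set to the scanned address), and parameters are treated as SET fields (values copied from the first request in the cluster). Both choices are hypothetical. Non-HTTP clustering by packet size and content is not shown.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit

ClusterKey = Tuple[str, str, Tuple[str, ...]]


def request_key(method: str, url: str) -> ClusterKey:
    """Cluster key: HTTP method, path, and sorted parameter names."""
    parts = urlsplit(url)
    names = sorted(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
    return (method.upper(), parts.path, tuple(names))


def cluster_requests(requests: List[Tuple[str, str]]) -> Dict[ClusterKey, List[str]]:
    """Group (method, url) pairs whose keys are identical."""
    clusters: Dict[ClusterKey, List[str]] = defaultdict(list)
    for method, url in requests:
        clusters[request_key(method, url)].append(url)
    return dict(clusters)


def probe_function(key: ClusterKey, urls: List[str]) -> Callable[[str], str]:
    """Probe construction function for one cluster (hypothetical field handling)."""
    method, path, _ = key
    # SET fields: parameter values copied from the first request in the cluster.
    query = urlencode(parse_qsl(urlsplit(urls[0]).query, keep_blank_values=True))
    request_line = f"{method} {path}?{query} HTTP/1.1" if query else f"{method} {path} HTTP/1.1"

    def build(target: str) -> str:
        # TARGET field: the Host header is set to the address being scanned.
        return f"{request_line}\r\nHost: {target}\r\n\r\n"

    return build


if __name__ == "__main__":
    reqs = [("GET", "/td?aid=e9xmkgg5h6&said=26427"), ("GET", "/td?said=1&aid=abc")]
    for key, urls in cluster_requests(reqs).items():
        print(probe_function(key, urls)("192.0.2.10"))
```

Sorting the parameter names makes `/td?said=1&aid=abc` and `/td?aid=…&said=…` fall into the same cluster even though their parameter order differs.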

• Response parts unique to malicious traffic

• Token-set signatures

– Snort, Suricata

• Tokenize fields

– If the protocol is known

• Multiple signatures per cluster

AFG: Signature Generation

Figure: signature generation takes the RRP clusters and benign traffic and outputs signatures.
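A token-set signature is a set of response tokens that appear in a cluster's malicious responses but not in benign traffic, emitted as Snort/Suricata content matches. The sketch below is simplified. The tokenizer (status code, whole header lines, body words of 4+ bytes) and the function names are assumptions. It also produces one signature per cluster, while the slide allows several.

```python
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, bytes]  # (kind, value); kind is "stat", "hdr", or "body"


def tokenize(raw: bytes, min_len: int = 4) -> Set[Token]:
    """Split an HTTP response into status code, header lines, and body words."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    toks: Set[Token] = set()
    status = lines[0].split()
    if len(status) >= 2:
        toks.add(("stat", status[1]))
    toks.update(("hdr", line) for line in lines[1:] if line)
    toks.update(("body", word) for word in body.split() if len(word) >= min_len)
    return toks


def token_set_signature(cluster: Iterable[bytes], benign: Iterable[bytes]) -> List[Token]:
    """Tokens present in every response of the cluster and in no benign response."""
    token_sets = [tokenize(r) for r in cluster]
    if not token_sets:
        return []
    common = set.intersection(*token_sets)
    for r in benign:
        common -= tokenize(r)
    return sorted(common)


def snort_escape(value: bytes) -> str:
    """Printable bytes stay literal; other bytes and the characters " ; \\ | become |hex| runs."""
    out: List[str] = []
    run: List[str] = []
    for b in value:
        if 0x20 <= b < 0x7F and chr(b) not in '";\\|':
            if run:
                out.append("|" + " ".join(run) + "|")
                run = []
            out.append(chr(b))
        else:
            run.append(f"{b:02X}")
    if run:
        out.append("|" + " ".join(run) + "|")
    return "".join(out)


def to_snort_options(signature: List[Token]) -> str:
    """Render the token set as Snort content options (the status code uses http_stat_code)."""
    opts = []
    for kind, value in signature:
        opt = f'content:"{snort_escape(value)}";'
        if kind == "stat":
            opt += " http_stat_code;"
        opts.append(opt)
    return " ".join(opts)
```

Subtracting the benign tokens is what keeps generic tokens such as a common `Server:` header or a frequent status code out of the signature. Such tokens would otherwise match many benign servers during an Internet-wide scan.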


• Localized scans

– Some ranges more likely due to locality

1. Localized-reduced

– BGP Route for Seed Server

2. Localized-extended

– All ranges with same description

3. Internet-wide

– Use BGP ranges

Scan Ranges

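Example: google.com resolves to 173.194.41.231, which falls in the BGP route 173.194.0.0/16 (localized-reduced range). The route description is "Google Inc."; all routes with that description (173.194.0.0/16, 8.8.8.0/24, 8.8.4.0/24, 8.6.48.0/21, 8.35.200.0/21, …) form the localized-extended range.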

Full          Unreserved    Allocated     BGP
4.3B (100%)   3.7B (86%)    3.7B (86%)    2.6B (60%)
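The two localized range types can be sketched with Python's standard `ipaddress` module. The route list reuses the prefixes from the google.com example and leaves out whatever the "…" stands for. The extra 198.51.100.0/24 entry is a hypothetical unrelated route added for contrast. Matching descriptions by exact string equality is an assumption.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]  # (BGP prefix, route description)

ROUTES: List[Route] = [
    (ipaddress.ip_network("173.194.0.0/16"), "Google Inc."),
    (ipaddress.ip_network("8.8.8.0/24"), "Google Inc."),
    (ipaddress.ip_network("8.8.4.0/24"), "Google Inc."),
    (ipaddress.ip_network("8.6.48.0/21"), "Google Inc."),
    (ipaddress.ip_network("8.35.200.0/21"), "Google Inc."),
    # Hypothetical unrelated route (documentation range), for contrast.
    (ipaddress.ip_network("198.51.100.0/24"), "Example Hosting"),
]


def localized_reduced(seed_ip: str, routes: List[Route]) -> Optional[Route]:
    """Most specific BGP route containing the seed server's address."""
    ip = ipaddress.ip_address(seed_ip)
    matches = [r for r in routes if ip in r[0]]
    return max(matches, key=lambda r: r[0].prefixlen, default=None)


def localized_extended(seed_ip: str, routes: List[Route]) -> List[ipaddress.IPv4Network]:
    """All routes that share the description of the seed's route."""
    reduced = localized_reduced(seed_ip, routes)
    if reduced is None:
        return []
    return [net for net, desc in routes if desc == reduced[1]]


def address_count(nets: List[ipaddress.IPv4Network]) -> int:
    return sum(net.num_addresses for net in nets)


if __name__ == "__main__":
    print(localized_reduced("173.194.41.231", ROUTES)[0])
    print(address_count(localized_extended("173.194.41.231", ROUTES)))
```

For the listed prefixes, the extended range covers 70,144 addresses, compared with 65,536 for the reduced /16 alone. Both are tiny next to the 2.6B addresses of an Internet-wide BGP scan.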

• Horizontal scanner

– SYN scan to find live servers on the target port

• AppTCP scanner

– Probes live servers with the fingerprint

• UDP scanner

– Does not require a horizontal scan

Scanners

• Scan rate

– One scanner can saturate a 1-10 Gbps link; scans can be distributed across scanners

– Rate limited to ≤ 60,000 packets/s and ≤ 400 connections/s

• Scan order

– LCG for horizontal/UDP, shuffle for AppTCP

• Whitelisting

– 512 MB bit array (one bit per IPv4 address), O(1) lookup

• Output

– Pcap / result for AppTCP/UDP

– IP list for horizontal

Scanning Properties
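The slide names an LCG for the horizontal/UDP scan order and a 512 MB bit array for whitelisting. The sketch below combines the two. The LCG constants are the widely used Numerical Recipes values. The class and function names are hypothetical. Treating set bits as "do not scan" is also an assumption, because the slide does not say in which direction the list is applied.

```python
import ipaddress
from typing import Iterator


def lcg_order(n: int, seed: int = 0, a: int = 1664525, c: int = 1013904223) -> Iterator[int]:
    """Yield every integer in [0, n) exactly once, in pseudo-random order.

    The LCG runs modulo m, the smallest power of two >= n. With c odd and
    a - 1 divisible by 4 it has full period m (Hull-Dobell), so one pass
    visits every residue; values >= n are skipped.
    """
    m = 1
    while m < n:
        m <<= 1
    x = seed % m
    for _ in range(m):
        x = (a * x + c) % m
        if x < n:
            yield x


class BitArrayWhitelist:
    """One bit per address in `network`; 0.0.0.0/0 needs 2**32 bits = 512 MiB."""

    def __init__(self, network: str = "0.0.0.0/0") -> None:
        self.net = ipaddress.ip_network(network)
        self.bits = bytearray((self.net.num_addresses + 7) // 8)

    def _index(self, ip: str) -> int:
        i = int(ipaddress.ip_address(ip)) - int(self.net.network_address)
        if not 0 <= i < self.net.num_addresses:
            raise ValueError(f"{ip} is outside {self.net}")
        return i

    def add(self, ip: str) -> None:
        i = self._index(ip)
        self.bits[i >> 3] |= 1 << (i & 7)

    def __contains__(self, ip: str) -> bool:
        # O(1): one subtraction, one shift, one mask.
        try:
            i = self._index(ip)
        except ValueError:
            return False
        return bool(self.bits[i >> 3] & (1 << (i & 7)))


def scan_order(target: str, skip: BitArrayWhitelist, seed: int = 0) -> Iterator[str]:
    """Addresses of `target` in LCG order, leaving out those marked in `skip`."""
    net = ipaddress.ip_network(target)
    base = int(net.network_address)
    for i in lcg_order(net.num_addresses, seed):
        ip = str(ipaddress.ip_address(base + i))
        if ip not in skip:
            yield ip
```

A full-period LCG needs only the current state to produce the next target. It spreads consecutive probes across unrelated networks and never keeps a list of pending addresses in memory. Building the default 0.0.0.0/0 bit array allocates the full 512 MiB, so smaller networks are convenient for testing.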

Ethical Considerations

• Scan as politely as possible

• Rate-limit scanners

• One fingerprint at a time

• Set up forward and reverse DNS entries for the scanners

• Set up a webpage on the scanners explaining the experiment

• Remove the ranges of providers that request it

• Manually check fingerprints


Fingerprint Generation Results

Type         Source      Families  Pcaps   RRPs    RRPs replayed  Seeds  Fingerprints
Malware      VirusShare  152       918     1,639   193            19     18
Malware      MALICIA     9         1,059   764     602            2      2
Honeyclient  MALICIA     6         1,400   42,160  9,497          5      2
Honeyclient  UrlQuery    1         4       11      11             1      1

• 23 fingerprints for 13 families (1 UDP, 22 HTTP)

• Families: 3 exploit kits, 10 malware families

• Challenges

– No seed server, families with many traces, no replay

• 11 localized scans

– 9 found previously unknown servers

• 11 Internet-wide scans

– 14 hours (4 scanners), 24 hours (3 scanners)

• 151 servers found

• From 15 seeds: 10x amplification (151 / 15)

HTTP Scans Summary

Coverage Comparison

CyberProbe   VirusTotal  URLQuery  VxVault   MDL
151 (100%)   40 (26%)    23 (15%)  1 (0.7%)  1 (0.7%)

~4x coverage improvement over VirusTotal (151 vs. 40)

Operations

Operation     Fingerprints  Seeds  Servers  # Providers  Provider locality
bestav        3             6      23       7            3.3
bh2-adobe     1             1      13       7            1.8
bh2-ngen      1             1      2        2            1.0
blackrev      1             1      2        2            1.0
clickpayz     2             2      51       6            8.5
doubleighty   1             1      18       9            2.0
kovter        2             2      9        4            2.2
ironsource    1             1      7        4            1.7
optinstaller  1             1      18       4            2.0
soft196       1             1      8        4            2.0
TOTAL         14            15     151      47           3.2 (avg.)
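The sketch below assumes that provider locality is the number of servers divided by the number of distinct hosting providers. Under that assumption it reproduces the bestav, clickpayz, and TOTAL rows of the table. The server-to-provider mappings are synthetic, built only to match those rows' counts.

```python
from typing import Dict


def provider_locality(server_to_provider: Dict[str, str]) -> float:
    """Servers per distinct hosting provider (assumed definition)."""
    providers = set(server_to_provider.values())
    return len(server_to_provider) / len(providers) if providers else 0.0


# (operation, servers, providers, reported locality) taken from the table.
ROWS = [("bestav", 23, 7, 3.3), ("clickpayz", 51, 6, 8.5), ("TOTAL", 151, 47, 3.2)]

for name, n_servers, n_providers, reported in ROWS:
    # Synthetic mapping with the same counts: servers assigned round-robin.
    mapping = {f"server{i}": f"provider{i % n_providers}" for i in range(n_servers)}
    print(name, round(provider_locality(mapping), 1), reported)
```

A locality well above 1, as for clickpayz, means that an operation reuses the same providers. This is the property that makes localized scans of a seed server's provider productive.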

• Affiliate pay-per-install

– Winwebsec, Urausy, others

• 29 servers

– 11 C&C servers

– 16 payment servers

– 2 web servers for affiliates

• 4 hosting providers (C&C, payment)

– A: 6 payment + 5 C&C

– B: 9 payment + 4 C&C

– C: 2 C&C

– D: 1 payment

Example Operation: BestAV

• Blackhole2-ngen

– 2-3 servers running simultaneously since October 2012

• Blackhole2-adobe

– 13 servers

– 3 known to VirusTotal (VT); 2 more appeared on VT 4 days later, 1 more 13 days later

• Doubleighty

– 18 servers

– Visited 9 with a honeyclient; 7 exploited it

– One month later, another server started exploiting

Exploit Server Operations

P2P bots Scan Results

Type  Date   Port       Fingerprint  Targets  SC  Rate    Time  Found
R     03/19  UDP/16471  zeroaccess   40,448   1   10      1.2h  55 (0.13%)
I     05/03  UDP/16471  zeroaccess   2.6B     4   50,000  3.6h  7,884 (0.0003%)

SC = number of scanners; Rate = packets per second; Found = supernodes found (share of targets)
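The scan times follow from simple arithmetic if the Rate column is read as packets per second per scanner and each target receives one probe. Both readings are assumptions. The function name is hypothetical.

```python
def scan_hours(targets: float, scanners: int, pps_per_scanner: float) -> float:
    """Idealized scan time: one probe per target, load split evenly across scanners."""
    return targets / (scanners * pps_per_scanner) / 3600


# Internet-wide zeroaccess scan: 2.6B targets, 4 scanners, 50,000 pps each.
print(round(scan_hours(2.6e9, 4, 50_000), 1))  # 3.6
# Localized zeroaccess scan: 40,448 targets, 1 scanner, 10 pps.
print(round(scan_hours(40_448, 1, 10), 2))     # 1.12
```

The Internet-wide estimate matches the reported 3.6h. The localized estimate is slightly below the reported 1.2h because the formula ignores any overhead beyond sending the probes.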

Related Work

Scanning

• Leonard et al., IMC ’10

• Heninger et al., USENIX Security ’12

• ZMap

Fingerprinting

• FiG

• PeerPress

Signature Generation

• Honeycomb, Autograph, EarlyBird, Polygraph, Hamsa

• Botzilla, Perdisci et al., Firma

• Active probing approach for detecting malicious servers

• Adversarial fingerprint generation technique

• Implement the approach in CyberProbe

• Use CyberProbe to find malicious servers

– 151 servers in 24 localized/Internet-wide scans

– 75% of servers unknown to public databases

– 7000+ P2P supernodes

• Identify the provider locality property

Conclusion

MALICIA Project

• Malware in Cybercrime

• 5 Publications

• Dataset released

• Collaborators:

http://malicia-project.com

http://www.ucsd.edu/

http://www.icsi.berkeley.edu/icsi/

http://www.w-hs.de/

http://www.gmu.edu/

http://research.google.com/

http://www.berkeley.edu/index.html
