Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution

Zhiqiang Lin (1), Xuxian Jiang (2), Dongyan Xu (1), Xiangyu Zhang (1)

(1) Purdue University, (2) George Mason University

February 12th, 2007

The 15th Annual Network and Distributed System Security Symposium



Motivation

Protocol reverse engineering: a process to recover protocol specifications, e.g., fields and their relationships.

Applications:

• Network-based intrusion detection: DoS attacks, port scans, computer systems

• Network management: correctly recognize and monitor traffic

• Fuzz testing: a software testing technique

• …

Challenges

    0x0040: cd46 4745 5420 2f6e 6577 732e 6874 6d6c
    0x0050: 2048 5454 502f 312e 300d 0a55 7365 722d
    0x0060: 4167 656e 743a 2057 6765 742f 312e 3130
    0x0070: 2e32 2028 5265 6420 4861 7420 6d6f 6469
    0x0080: 6669 6564 290d 0a41 6363 6570 743a 202a
    0x0090: 2f2a 0d0a 486f 7374 3a20 3132 392e 3137
    0x00a0: 342e 3838 2e37 310d 0a43 6f6e 6e65 6374
    0x00b0: 696f 6e3a 204b 6565 702d 416c 6976 650d
    0x00c0: 0a0d 0a

• Multiple fields in a single message

• Non-static size of fields

• Complex relationships among protocol fields: sequential, parallel, hierarchical
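To see what the raw capture hides, it can be decoded back into text. The following is a minimal Python sketch, not part of the original slides. It assumes the dump uses a tcpdump-style layout (an offset, then groups of hex digits) and that the two bytes before "GET" belong to a lower-layer header. The name `decode_hex_dump` is illustrative only.

```python
HEX_DUMP = """\
0x0040: cd46 4745 5420 2f6e 6577 732e 6874 6d6c
0x0050: 2048 5454 502f 312e 300d 0a55 7365 722d
0x0060: 4167 656e 743a 2057 6765 742f 312e 3130
0x0070: 2e32 2028 5265 6420 4861 7420 6d6f 6469
0x0080: 6669 6564 290d 0a41 6363 6570 743a 202a
0x0090: 2f2a 0d0a 486f 7374 3a20 3132 392e 3137
0x00a0: 342e 3838 2e37 310d 0a43 6f6e 6e65 6374
0x00b0: 696f 6e3a 204b 6565 702d 416c 6976 650d
0x00c0: 0a0d 0a
"""


def decode_hex_dump(dump: str) -> bytes:
    """Concatenate the hex groups of every 'offset: groups' line into raw bytes."""
    raw = bytearray()
    for line in dump.splitlines():
        if not line.strip():
            continue
        _offset, _, groups = line.partition(":")
        raw += bytes.fromhex("".join(groups.split()))
    return bytes(raw)


raw = decode_hex_dump(HEX_DUMP)
# Assumption: everything before the method name is not part of the HTTP message.
payload = raw[raw.index(b"GET"):]
lines = payload.decode("ascii").split("\r\n")
for line in lines:
    print(repr(line))
```

The decoded message contains a request line followed by several header fields of different lengths, which is the kind of structure that is hard to recover from the bytes alone.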

Challenges

A BNF Specification of HTTP Request (RFC2616)

    HTTP-Request = Request-Line
                   (( general-header | request-header | entity-header ) CRLF)*
                   CRLF
                   [ message-body ]

    Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Note: SP and CRLF are separators

(Figure callouts on the specification: Parallel, Sequential, Hierarchical)

• Hierarchical relation: A field can be further divided into multiple sub-fields.

• Sequential relation: Captures the ordering between adjacent fields in a protocol.

• Parallel relation: The positions of two or more fields are exchangeable in the protocol specification.
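The three relations can be seen in how a parser might consume a request. This is a minimal Python sketch under stated assumptions: `parse_request` is a hypothetical helper, not code from AutoFormat, and it handles only well-formed requests.

```python
def parse_request(message: str):
    """Split an HTTP request into its request-line sub-fields and its headers."""
    head, _, _body = message.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")

    # Hierarchical: the Request-Line field is divided into three sub-fields.
    # Sequential: their order is fixed (Method, Request-URI, HTTP-Version).
    method, uri, version = request_line.split(" ")

    # Parallel: header fields may appear in any order, so index them by name.
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(": ")
        headers[name.lower()] = value
    return (method, uri, version), headers


first = "GET /news.html HTTP/1.0\r\nAccept: */*\r\nHost: 129.174.88.71\r\n\r\n"
second = "GET /news.html HTTP/1.0\r\nHost: 129.174.88.71\r\nAccept: */*\r\n\r\n"
print(parse_request(first) == parse_request(second))
```

Swapping the two header lines does not change the parsed result, whereas swapping the sub-fields of the request line would.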

Related Work

• Network trace: Protocol Informatics, Discoverer [W. Cui et al., Security'07]

• Binary analysis: Polyglot [J. Caballero et al., CCS'07], Automatic Network Protocol Analysis [G. Wondracek et al., NDSS'08]

Observation

Code snippet in http.c (null-httpd-0.5.0):

    int read_header(int sid) {
        ...
        sgets(line, sizeof(line)-1, conn[sid].socket);
        ...
        /* The request line is split into three sub-fields. */
        if (sscanf(line, "%[^ ] %[^ ] %[^ ]",
                   conn[sid].dat->in_RequestMethod,
                   conn[sid].dat->in_RequestURI,
                   conn[sid].dat->in_Protocol) != 3)
            ...
        while (strlen(line) > 0) {
            ...
            /* Each header is matched by name, wherever it appears. */
            if (strncasecmp(line, "Cookie: ", 8) == 0)
                strncpy(conn[sid].dat->in_Cookie, (char *)&line+8,
                        sizeof(conn[sid].dat->in_Cookie)-1);
            if (strncasecmp(line, "Host: ", 6) == 0)
                strncpy(conn[sid].dat->in_Host, (char *)&line+6,
                        sizeof(conn[sid].dat->in_Host)-1);
            ...
            if (strncasecmp(line, "User-Agent: ", 12) == 0)
                strncpy(conn[sid].dat->in_UserAgent, (char *)&line+12,
                        sizeof(conn[sid].dat->in_UserAgent)-1);
        }
        ...
    }

• The REQUEST LINE field is divided into METHOD, REQUEST URI and HTTP VERSION.

• Cookie, Host and User-Agent are parallel fields.

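AutoFormat -- Basic Idea

(Figure: the input bytes "G E T / n e w s …" are mapped from their execution context to protocol fields. Bytes handled in one context form one field; bytes handled in another context form another field.)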

System Overview

Context-aware Execution Monitor

Input: GET /news.html

Log (columns: input offset | input byte | EIP | call stack)

    0  | 'G'  | 0x4BA56A2 | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
    1  | 'E'  | 0x4BA56A2 | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
    2  | 'T'  | 0x4BA56A2 | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
    …
    24 | '\n' | 0x4BA56A2 | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
    …
    0  | 'G'  | 0x1F7F3   | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
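The logging idea can be imitated in a few lines of Python. This is only a toy sketch: AutoFormat instruments a binary, whereas here a decorator maintains a call stack by hand and a wrapper class records every byte access. All names (`traced`, `MonitoredInput`, `find_line_end`, `get_word`, `read_request`) are hypothetical, and the EIP column is omitted.

```python
import functools

CALL_STACK = []   # names of the traced functions currently executing
LOG = []          # records of (input offset, input byte, call stack)


def traced(fn):
    """Keep CALL_STACK in sync while fn runs (stands in for real instrumentation)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        CALL_STACK.append(fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            CALL_STACK.pop()
    return wrapper


class MonitoredInput:
    """Wraps the received message and logs the context of every byte access."""

    def __init__(self, data: bytes):
        self.data = data

    def __len__(self):
        return len(self.data)

    def read(self, offset: int) -> int:
        LOG.append((offset, chr(self.data[offset]), "->".join(CALL_STACK)))
        return self.data[offset]


@traced
def find_line_end(msg, start):
    # Scans the whole line, so it touches every byte up to and including '\n'.
    i = start
    while i < len(msg) and msg.read(i) != ord("\n"):
        i += 1
    return i


@traced
def get_word(msg, start, end):
    # Scans one word, so it touches only the bytes up to the first space.
    i = start
    while i < end and msg.read(i) != ord(" "):
        i += 1
    return i


@traced
def read_request(msg):
    line_end = find_line_end(msg, 0)
    method_end = get_word(msg, 0, line_end)
    return method_end, line_end


message = MonitoredInput(b"GET /news.html HTTP/1.0\r\n")
method_end, line_end = read_request(message)
for record in LOG[:3] + LOG[-4:]:
    print(record)
```

As in the Apache log, the same input offsets show up under different call stacks: once while the whole line is scanned and once while only the first word is scanned.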

Protocol Field Identifier

Analyze the log file:

• Step 1: Build the protocol field tree from the logged data.

• Step 2: Refine the tree using three heuristics.

• Step 3: Output the result.

Example: Apache log data (columns: input offset | input byte | EIP | call stack)

    0  | 'G'  | 0x4BA56A2 | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
    1  | 'E'  | 0x4BA56A2 | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
    2  | 'T'  | 0x4BA56A2 | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
    …
    24 | '\n' | 0x4BA56A2 | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
    …
    24 | '\n' | 0x26187   | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core
    23 | '\r' | 0x26322   | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core
    0  | 'G'  | 0x1F7F3   | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
    1  | 'E'  | 0x1F7F3   | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
    2  | 'T'  | 0x1F7F3   | main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
    …

(Figure: fields derived from these records: "GET /news.html HTTP/1.0\r\n", "\r\n", "GET")

Step 1: Building Protocol Field Tree

(Figure: the root node holds the whole message, "GET /news.html HTTP/1.0\r\nUser-Agent: Wget/1.10.2 (Red Hat modified)\r\nAccept: */*\r\n…"; nodes below it include "GET /news.html", "GET" and "HTTP/1.0".)

• The root contains the offsets of all input data.

• A parent node contains the offsets of its children.
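The two tree properties above suggest a simple construction, sketched here in Python. This is an illustration under assumptions, not the published algorithm: offsets are grouped by execution context, and each group is attached under the smallest existing node whose offsets contain it. The context names (`read_line`, `get_method`, `get_uri`, `get_version`) are hypothetical.

```python
from collections import defaultdict

MESSAGE = "GET /news.html HTTP/1.0\r\n"


def build_field_tree(log, message_len):
    """Group logged offsets by context, then nest the groups by containment."""
    offsets_by_context = defaultdict(set)
    for offset, context in log:
        offsets_by_context[context].add(offset)

    root = {"context": "root", "offsets": frozenset(range(message_len)), "children": []}
    # Insert larger offset sets first so every container exists before its contents.
    ordered = sorted(offsets_by_context.items(), key=lambda kv: (-len(kv[1]), min(kv[1])))
    for context, offsets in ordered:
        parent = root
        while True:
            container = next(
                (c for c in parent["children"] if offsets <= c["offsets"]), None)
            if container is None:
                break
            parent = container
        parent["children"].append(
            {"context": context, "offsets": frozenset(offsets), "children": []})
    return root


def render(node, message, depth=0):
    """Return one indented line per node, children ordered by first offset."""
    lo, hi = min(node["offsets"]), max(node["offsets"])
    lines = ["  " * depth + f'{node["context"]}: {message[lo:hi + 1]!r}']
    for child in sorted(node["children"], key=lambda c: min(c["offsets"])):
        lines.extend(render(child, message, depth + 1))
    return lines


# Hypothetical log: (offset, context) pairs for the request line only.
log = [(i, "read_line") for i in range(25)]
log += [(i, "get_method") for i in range(0, 3)]
log += [(i, "get_uri") for i in range(4, 14)]
log += [(i, "get_version") for i in range(15, 23)]

tree = build_field_tree(log, len(MESSAGE))
print("\n".join(render(tree, MESSAGE)))
```

In this toy run the root has a single child that covers the same bytes, and the spaces between the three sub-fields belong to no child. A raw tree of this kind still needs refinement.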

Step 1: Building Protocol Field Tree

(Figure: the field tree built from the log for "GET /news.html HTTP/1.0\r\n". Node labels: "GET /news.html HTTP/1.0\r\n", "GET /news.html", "GET", "/", "news.html", "HTTP/1.0", "H", "TTP/1.0", "\r", "\n". Several labels appear more than once.)

Problems with this initial tree:

• Overly fine-grained fields

• Redundancy in fields

• Missing SPACE before "/n"

Step 2: Refinement (Tokenization)

(Figure: the field tree for "GET /news.html HTTP/1.0\r\n" before and after tokenization. Before: "/" and "news.html", "H" and "TTP/1.0", "\r" and "\n" are separate nodes. After: they are merged into "/news.html", "HTTP/1.0" and "\r\n".)

Merge two child nodes if their content can form one token (a heuristic for text-based protocols).
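The merge rule can be sketched as follows in Python. The notion of "one token" used here is an assumption made for the example: two sibling spans are merged when they are adjacent in the message and are both whitespace-only or both whitespace-free. Spans are `(start, end)` pairs with an exclusive end, and `merge_tokens` is a hypothetical name.

```python
MESSAGE = "GET /news.html HTTP/1.0\r\n"


def token_class(text: str) -> str:
    """Classify a piece of text as whitespace-only, whitespace-free, or mixed."""
    if text.isspace():
        return "space"
    if not any(ch.isspace() for ch in text):
        return "word"
    return "mixed"


def merge_tokens(message, spans):
    """Merge adjacent sibling spans whose combined content forms a single token."""
    merged = []
    for start, end in sorted(spans):
        if merged:
            prev_start, prev_end = merged[-1]
            prev_cls = token_class(message[prev_start:prev_end])
            cls = token_class(message[start:end])
            if prev_end == start and prev_cls == cls and cls != "mixed":
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged


# Overly fine-grained siblings: "GET", "/", "news.html", "H", "TTP/1.0", "\r", "\n"
fine = [(0, 3), (4, 5), (5, 14), (15, 16), (16, 23), (23, 24), (24, 25)]
tokens = merge_tokens(MESSAGE, fine)
print([MESSAGE[s:e] for s, e in tokens])
```

"HTTP/1.0" and "\r" are adjacent but are not merged, because one is a word and the other is whitespace. "GET" and "/" are not merged because the space between them is not covered by either span.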

Step 2: Refinement (Redundant Node Deletion)

(Figure: the field tree for "GET /news.html HTTP/1.0\r\n" before and after redundant node deletion. Before: labels such as "/news.html" and "HTTP/1.0" are repeated along single-child chains. After: the tree keeps "GET /news.html HTTP/1.0\r\n" with the fields "GET", "/news.html", "HTTP/1.0" and "\r\n".)

An internal node is redundant if it has only one child.
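A recursive pass is enough to apply this rule. The sketch below is a minimal Python illustration with a hypothetical node layout (a dict with `label` and `children`); a node with exactly one child is replaced by that child.

```python
def node(label, *children):
    """Build a tree node; a node without children is a leaf."""
    return {"label": label, "children": list(children)}


def delete_redundant(n):
    """Return a copy of the tree in which no internal node has exactly one child."""
    children = [delete_redundant(c) for c in n["children"]]
    if len(children) == 1:
        return children[0]  # the single child takes the place of its parent
    return {"label": n["label"], "children": children}


tree = node(
    "GET /news.html HTTP/1.0\r\n",
    node(
        "GET /news.html HTTP/1.0\r\n",
        node("GET"),
        node("/news.html", node("/news.html")),
        node("HTTP/1.0", node("HTTP/1.0", node("HTTP/1.0"))),
        node("\r\n"),
    ),
)

pruned = delete_redundant(tree)
print(pruned["label"].encode(), [c["label"] for c in pruned["children"]])
```

Processing the children before the parent collapses chains of any length in a single pass.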

Step 2: Refinement (Node Insertion)

(Figure: the field tree for "GET /news.html HTTP/1.0\r\n" before and after node insertion. The inserted nodes cover offsets that no existing child accounted for, e.g., the SPACE separators.)

Insert a new child node into the parent if the offsets of its children, taken together, do not match the offsets of the parent.
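One way to read this rule is as gap filling, sketched below in Python. The assumption is that a field is a contiguous `(start, end)` span with an exclusive end; `insert_missing` is a hypothetical name.

```python
MESSAGE = "GET /news.html HTTP/1.0\r\n"


def insert_missing(parent, children):
    """Return the children plus new spans for every parent offset they leave uncovered."""
    result = []
    cursor = parent[0]
    for start, end in sorted(children):
        if start > cursor:
            result.append((cursor, start))  # gap before this child
        result.append((start, end))
        cursor = max(cursor, end)
    if cursor < parent[1]:
        result.append((cursor, parent[1]))  # gap after the last child
    return result


parent = (0, len(MESSAGE))
children = [(0, 3), (4, 14), (15, 23), (23, 25)]
complete = insert_missing(parent, children)
print([MESSAGE[s:e] for s, e in complete])
```

For the request line, the two inserted spans are the single spaces that separate the method, the URI and the version.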

Step 3: Output the Result

(Figure: the refined tree for "GET /news.html HTTP/1.0\r\n" with the fields "GET", "/news.html", "HTTP/1.0" and "\r\n", shown in two panels labeled "Parallel & Sequential" and "Hierarchical"; nodes carry the numbers 1 to 4.)

Parallel:

• Collect the execution history of each node.

• For a parent, if its child nodes share a similar history, mark it.

Sequential:

• Pre-order traversal of the tree, which lists the leaf nodes and the parents of multiple parallel nodes.
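Both output rules can be sketched in Python. Several choices here are assumptions made only for illustration: an execution history is modeled as a set of context names, "similar" is taken to mean a Jaccard similarity of at least 0.8 between every pair of children, and the traversal stops at a parent marked parallel instead of descending into it. The function names and the history values are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two histories, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mark_parallel(node, threshold=0.8):
    """Mark a parent as parallel when all of its children share a similar history."""
    kids = node["children"]
    node["parallel"] = len(kids) > 1 and all(
        jaccard(x["history"], y["history"]) >= threshold
        for x, y in combinations(kids, 2))
    for kid in kids:
        mark_parallel(kid, threshold)


def sequential_fields(node):
    """Pre-order traversal that lists leaves and parents of parallel children."""
    if not node["children"] or node["parallel"]:
        return [node["label"]]
    fields = []
    for kid in node["children"]:
        fields.extend(sequential_fields(kid))
    return fields


def make(label, history, *children):
    return {"label": label, "history": set(history), "children": list(children)}


request_line = make(
    "Request-Line", {"sscanf"},
    make("GET", {"copy_method"}),
    make("/news.html", {"copy_uri"}),
    make("HTTP/1.0", {"copy_protocol"}))
headers = make(
    "headers", {"strncasecmp", "strncpy"},
    make("User-Agent", {"strncasecmp", "strncpy"}),
    make("Accept", {"strncasecmp", "strncpy"}),
    make("Host", {"strncasecmp", "strncpy"}))
root = make("message", {"read_header"}, request_line, headers)

mark_parallel(root)
print(sequential_fields(root))
```

The header fields are handled by the same code and therefore end up under a parent marked parallel, while the sub-fields of the request line are listed in order.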

Evaluation

• Implemented on top of Valgrind-3.2.3 (for the context-aware execution monitor)

• Also applies to QEMU, PIN

• Benchmark: 30 messages from six known protocols and one unknown protocol

• Evaluation metric: Re, the ratio of exact match

    Re = |A ∩ W| / |W|

    A: set of fields identified by AutoFormat
    W: set of fields identified by Wireshark
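The metric is a plain set computation. A minimal Python sketch follows, assuming a field is represented as a `(start, end)` offset pair; that representation, the function name and the sample values are illustrative and are not taken from the evaluation.

```python
def exact_match_ratio(autoformat_fields, wireshark_fields):
    """Re = |A ∩ W| / |W|; returns None when W is empty and the ratio is undefined."""
    a, w = set(autoformat_fields), set(wireshark_fields)
    if not w:
        return None
    return len(a & w) / len(w)


# Made-up example: two of the four reference fields are matched exactly.
W = {(0, 3), (4, 14), (15, 23), (23, 25)}
A = {(0, 3), (4, 14), (15, 25)}
print(exact_match_ratio(A, W))
```

A field counts only if both of its boundaries agree with the reference, so a span that covers two reference fields at once does not contribute.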

Overall Result

Averages: Re(F) = 88.5%, Re(H) = 98.0%, Re(P) = 100.0%, Re = 93.4%

• Re(F): Re for finest-grained fields

• Re(H): Re for hierarchical fields

• Re(P): Re for parallel fields

(Table callout: 100% match with Wireshark)

* (-) => |P| for Wireshark = 0

Discussion

• Dynamic trace dependency: AutoFormat does not detect message formats that are not present in the execution trace.

• Byte granularity: AutoFormat does not detect protocol fields at the bit level.

• Protocol state machine: AutoFormat does not correlate multiple messages of the same protocol session.

• Obfuscated binaries: AutoFormat does not handle this type of input.

Conclusion

• The paper also includes Slapper worm messages as part of a second set of experimental results.

• AutoFormat: a tool for automatic protocol format extraction.

• Key insight: a protocol implementation is programmed to recognize the protocol format and usually contains protocol field-specific execution context. This context can be leveraged to infer the hierarchical structure of protocol fields, and even their BNF structures.

Thank you

For more information:

{zlin, dxu, xyzhang}@[email protected]

Q & A