
Structural Analysis of PE Files

Instructor: Davide Maiorca

Web Security and Malware Analysis

M.Sc. in Computer Engineering, Cybersecurity and Artificial Intelligence

University of Cagliari, Italy

http://pralab.diee.unica.it Web Security and Malware Analysis - Davide Maiorca

Changes to Lectures and Exam - COVID-19

• Due to the COVID-19 emergency, the course will undergo some changes

• 3 lecture slots for 9 weeks, plus 1 slot in the last week
– Each slot lasts 40/45 minutes, plus a 10-minute break

– The practical part will be distributed across three slots

• Deadlines for the assignments are canceled. All the assignments must be sent before taking the final oral test (dates TBD)

• All final oral tests will take place after all courses are completed (reasonably, June/July)
– Those who passed the first intermediate test will take the exam as if it were the second intermediate test (same rules)

– The way the exam will be taken (remotely/in-person) is yet TBD

• Stay safe. This is what matters the most at this moment. Don’t go out, follow the rules.


Compiling a Program

Figure: Source (.c) → Compiler → Object File (.o) → Linker → PE Executable (.exe)


PE File

• Acronym for «Portable Executable»

• A data structure that contains the information needed to load a program into memory and execute it

• Used by executables, object code, and dynamic-link libraries (.dll)

• Organized in a clear structure
– Headers

– Sections

• PE files also contain information that is not necessarily related to the executable code
– Icons

– Images

– Audio resources

– Etc.


PE File Structure

Figure: PE file layout, from the start of the file: DOS MZ Header, DOS Stub, Image NT Header, Image File Header, Image Optional Header, Section Table, Sections

• Sections contain the actual data (executable code, non-executable data, and other information) that is loaded into memory

• Multiple programs can be used to analyze the structure of a PE file

• We will use the following tools:
– A hex editor (e.g., Hex Edit)
– A visualizer of the PE file structure (CFF Explorer VIII and PEView)


DOS Header and Stub

• Together, they contain the message (and the code that prints it) shown when the file is executed from a DOS prompt, without Windows running
– «This program cannot be run in DOS mode»

• It is actually a true «program in the program», written in 16-bit assembly
– Composed of a header and a stub

• Header
– Made of the first 64 bytes of the file

– Each group of bytes constitutes a field of a data structure that specifies the characteristics of the program

– For example, the first two bytes are the magic number of the 16-bit program

– Its last field (e_lfanew) contains the file offset of the PE header, i.e., the real starting point of the PE file

• Stub

– Variable size

– Contains the assembly instructions that are used to print the message


DOS Header – Hex Representation (Hex Edit)

• The DOS Header is a data structure made of multiple two-byte words (the first three are circled as an example) and a final 4-byte dword

• The first two bytes represent the DOS magic number (MZ)

• The last 4 bytes are the pointer to the real PE header

Magic Number: 0x5a4d

The PE header starts at 0x00000108 (careful about the byte order: in Little Endian the least significant byte comes first, so the bytes of a number are read from last to first)
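The two fields discussed above can be read with a few lines of Python. This is a minimal sketch using the standard struct module; it assumes the file is already loaded into a bytes object, and the 64-byte header in the example is synthetic (only the magic number and the final dword are filled in). The field names e_magic and e_lfanew follow the Windows IMAGE_DOS_HEADER definition.

```python
import struct

DOS_HEADER_SIZE = 64      # the DOS header occupies the first 64 bytes of the file
E_LFANEW_OFFSET = 0x3C    # offset of the final 4-byte field (pointer to the PE header)
MZ_MAGIC = 0x5A4D         # 'M' (0x4D) and 'Z' (0x5A) read as a little-endian word


def parse_dos_header(data: bytes) -> tuple[int, int]:
    """Return (e_magic, e_lfanew) from the DOS header at the start of `data`."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("data too short for a DOS header")
    # '<' = little-endian, 'H' = 2-byte word, 'I' = 4-byte dword
    (e_magic,) = struct.unpack_from("<H", data, 0)
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    if e_magic != MZ_MAGIC:
        raise ValueError("missing MZ magic number")
    return e_magic, e_lfanew


# Synthetic 64-byte header: only the magic number and e_lfanew are filled in
header = bytearray(DOS_HEADER_SIZE)
header[0:2] = b"MZ"
header[0x3C:0x40] = bytes([0x08, 0x01, 0x00, 0x00])   # bytes as shown in a hex editor

magic, pe_offset = parse_dos_header(bytes(header))
print(hex(magic), hex(pe_offset))   # 0x5a4d 0x108
```

Note how the bytes 08 01 00 00 become 0x108 only because they are read from last to first.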


DOS Header – Hex Representation (Hex Edit + CFF Explorer)

Word: 2 bytes; Dword: 4 bytes; Qword: 8 bytes


DOS Stub – Hex Representation (Hex Edit)

• The DOS Stub is a small program that essentially prints a string

• The first bytes are 16-bit assembly instructions that print the string (and then terminate the program)

• They are followed by the string «This program…»

Endianness does not affect strings: their bytes are read exactly in the order in which they appear
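The difference between strings and multi-byte numbers can be checked directly. In this minimal Python sketch (the four bytes are an arbitrary example), the same bytes are decoded once as ASCII text and once as a little-endian dword; the last line also prints the sizes of the word, dword, and qword types.

```python
import struct

raw = b"This"   # bytes 54 68 69 73, as shown by a hex editor

# Read as a string, the bytes keep the order in which they appear
print(raw.decode("ascii"))                       # This

# Read as a little-endian dword, the last byte becomes the most significant one
(as_dword,) = struct.unpack("<I", raw)
print(hex(as_dword))                             # 0x73696854

# The same rule applies to words, e.g. the DOS magic number
print(hex(struct.unpack("<H", b"MZ")[0]))        # 0x5a4d

# Sizes of word ('H'), dword ('I'), and qword ('Q')
print(struct.calcsize("<H"), struct.calcsize("<I"), struct.calcsize("<Q"))   # 2 4 8
```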


PE Header

• Data structure composed of three sub-data structures

• IMAGE_NT_HEADER
– Contains the PE signature

• IMAGE_FILE_HEADER
– Contains information about the target CPU

– Contains the number of sections

– Contains information about debugging symbols

• IMAGE_OPTIONAL_HEADER
– Describes essential information about the position of the code

– For example, the beginning of the assembly instructions and the beginning of non-instruction data

– Contains a special entry, at the end of the header, called DATA_DIRECTORY, which is an array of additional information needed by the executable
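The PE signature and the IMAGE_FILE_HEADER that follows it have a fixed layout, so they can be parsed in one step. The sketch below is a minimal Python illustration assuming the 20-byte IMAGE_FILE_HEADER layout from the PE/COFF specification (Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics); the snake_case field names and the synthetic byte buffer are illustrative choices, not taken from the slides.

```python
import struct
from typing import NamedTuple

PE_SIGNATURE = b"PE\x00\x00"      # 4-byte signature at the offset stored in e_lfanew
FILE_HEADER_FORMAT = "<HHIIIHH"   # IMAGE_FILE_HEADER: 20 bytes, little-endian


class FileHeader(NamedTuple):
    machine: int                   # target CPU, e.g. 0x14C (x86) or 0x8664 (x64)
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int   # debugging (COFF) symbols, usually 0 in images
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


def parse_nt_headers(data: bytes, e_lfanew: int) -> FileHeader:
    """Check the PE signature at e_lfanew and parse the IMAGE_FILE_HEADER that follows it."""
    if data[e_lfanew:e_lfanew + 4] != PE_SIGNATURE:
        raise ValueError("missing PE signature")
    return FileHeader(*struct.unpack_from(FILE_HEADER_FORMAT, data, e_lfanew + 4))


# Synthetic example: signature at 0x108, followed by an x86 file header with 4 sections
blob = bytearray(0x108 + 4 + struct.calcsize(FILE_HEADER_FORMAT))
blob[0x108:0x10C] = PE_SIGNATURE
struct.pack_into(FILE_HEADER_FORMAT, blob, 0x10C, 0x14C, 4, 0, 0, 0, 0xE0, 0x0102)

hdr = parse_nt_headers(bytes(blob), 0x108)
print(hex(hdr.machine), hdr.number_of_sections, hex(hdr.size_of_optional_header))
# 0x14c 4 0xe0
```

SizeOfOptionalHeader matters in practice: the optional header starts right after these 20 bytes, and its length tells a parser where the Section Table begins.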


Image NT Header – Hex Representation (CFF Explorer)


Image File Header – Hex Representation (+ CFF Explorer)


The Characteristics field indicates, among other things, whether the file is an executable image
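Characteristics is a bit field, so checking whether the file is executable means testing a single bit. This minimal Python sketch decodes only a subset of the IMAGE_FILE_* flags defined in the PE/COFF specification; the sample value 0x0102 is an illustrative choice.

```python
# A subset of the IMAGE_FILE_* flags defined in the PE/COFF specification
FILE_CHARACTERISTICS = {
    0x0001: "IMAGE_FILE_RELOCS_STRIPPED",
    0x0002: "IMAGE_FILE_EXECUTABLE_IMAGE",
    0x0020: "IMAGE_FILE_LARGE_ADDRESS_AWARE",
    0x0100: "IMAGE_FILE_32BIT_MACHINE",
    0x0200: "IMAGE_FILE_DEBUG_STRIPPED",
    0x2000: "IMAGE_FILE_DLL",
}


def decode_characteristics(value: int) -> list[str]:
    """Return the names of the known flags that are set in the Characteristics field."""
    return [name for bit, name in FILE_CHARACTERISTICS.items() if value & bit]


flags = decode_characteristics(0x0102)
print(flags)   # ['IMAGE_FILE_EXECUTABLE_IMAGE', 'IMAGE_FILE_32BIT_MACHINE']
print("executable" if "IMAGE_FILE_EXECUTABLE_IMAGE" in flags else "not executable")
```

A DLL sets IMAGE_FILE_DLL in addition to IMAGE_FILE_EXECUTABLE_IMAGE, so the same check also distinguishes libraries from programs.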


Image Optional Header – Hex Representation (+ CFF Explorer)


• AddressOfEntryPoint: (relative) address of the first instruction that is executed

• BaseOfCode: (relative) starting address of the code section

• BaseOfData: (relative) starting address of the data section
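These fields sit at fixed offsets from the start of the optional header. The Python sketch below assumes the PE32 and PE32+ layouts from the PE/COFF specification: AddressOfEntryPoint at offset 16, BaseOfCode at 20, and then either BaseOfData plus a 4-byte ImageBase (PE32) or an 8-byte ImageBase with no BaseOfData (PE32+). The dictionary keys and the synthetic header bytes are illustrative.

```python
import struct

PE32_MAGIC = 0x10B        # 32-bit optional header
PE32_PLUS_MAGIC = 0x20B   # 64-bit optional header (it has no BaseOfData field)


def parse_optional_header_start(opt: bytes) -> dict[str, int]:
    """Read the first address fields of an IMAGE_OPTIONAL_HEADER.

    Offsets are relative to the start of the optional header.
    """
    (magic,) = struct.unpack_from("<H", opt, 0)
    entry_point, base_of_code = struct.unpack_from("<II", opt, 16)
    fields = {"Magic": magic, "AddressOfEntryPoint": entry_point, "BaseOfCode": base_of_code}
    if magic == PE32_MAGIC:
        base_of_data, image_base = struct.unpack_from("<II", opt, 24)
        fields["BaseOfData"] = base_of_data
        fields["ImageBase"] = image_base
    elif magic == PE32_PLUS_MAGIC:
        fields["ImageBase"] = struct.unpack_from("<Q", opt, 24)[0]   # 8 bytes in PE32+
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    return fields


# Synthetic PE32 optional header: only the fields read above are filled in
opt = bytearray(32)
struct.pack_into("<H", opt, 0, PE32_MAGIC)
struct.pack_into("<IIII", opt, 16, 0x12A0, 0x1000, 0x3000, 0x400000)
print({name: hex(value) for name, value in parse_optional_header_start(bytes(opt)).items()})
# {'Magic': '0x10b', 'AddressOfEntryPoint': '0x12a0', 'BaseOfCode': '0x1000', 'BaseOfData': '0x3000', 'ImageBase': '0x400000'}
```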


Image NT Header – Data Directory


Optional fields that contain additional information employed by the file; each entry stores the RVA and the size of a specific table (e.g., imports, exports, resources)

RVA stands for Relative Virtual Address
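Each Data Directory entry is just a pair of dwords (RVA, size), and the position of an entry in the array determines its meaning. The sketch below assumes the PE/COFF layout in which NumberOfRvaAndSizes sits at offset 92 (PE32) or 108 (PE32+) of the optional header, immediately followed by the array; the short directory names and the synthetic header are illustrative.

```python
import struct

# Names of the 16 standard DataDirectory entries, in index order
DIRECTORY_NAMES = [
    "Export", "Import", "Resource", "Exception", "Security", "BaseReloc",
    "Debug", "Architecture", "GlobalPtr", "TLS", "LoadConfig", "BoundImport",
    "IAT", "DelayImport", "CLRRuntime", "Reserved",
]


def parse_data_directory(opt: bytes, is_pe32_plus: bool) -> dict[str, tuple[int, int]]:
    """Return {name: (RVA, size)} for the non-empty entries of the DataDirectory array."""
    count_offset = 108 if is_pe32_plus else 92    # offset of NumberOfRvaAndSizes
    (count,) = struct.unpack_from("<I", opt, count_offset)
    entries = {}
    for index in range(min(count, len(DIRECTORY_NAMES))):
        # Each entry is 8 bytes: a 4-byte RVA followed by a 4-byte size
        rva, size = struct.unpack_from("<II", opt, count_offset + 4 + 8 * index)
        if rva or size:                           # unused entries are all zeros
            entries[DIRECTORY_NAMES[index]] = (rva, size)
    return entries


# Synthetic 224-byte PE32 optional header with only two directories filled in
opt_header = bytearray(224)
struct.pack_into("<I", opt_header, 92, 16)
struct.pack_into("<II", opt_header, 96 + 8 * 1, 0x2000, 0x3C)    # Import
struct.pack_into("<II", opt_header, 96 + 8 * 12, 0x2100, 0x10)   # IAT
print(parse_data_directory(bytes(opt_header), is_pe32_plus=False))
# {'Import': (8192, 60), 'IAT': (8448, 16)}
```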


Addressing

• During execution, the various Sections of the PE file are copied into memory

• Each instruction and each piece of data will have its own physical address

• However, the executable has its own process space, and it uses its own internal addresses, called virtual addresses

• Moreover, the headers do not contain full virtual addresses, but only relative virtual addresses (RVAs)

• RVAs are essentially offsets that locate an element (e.g., a Section) at a certain distance from the base address

– RVA_ADDR = VIRTUAL_ADDRESS - BASE_ADDRESS

• A base address is a sort of «base reference» from which all virtual addresses are calculated (and then subsequently mapped to physical addresses)
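The formula above works in both directions. In this small Python sketch the values are hypothetical: a base address of 0x400000 (a common default for 32-bit executables) and an entry point RVA of 0x12A0. It also shows why RVAs are convenient: if the image is loaded at a different base, the RVA stays the same and only the base changes.

```python
def va_to_rva(virtual_address: int, base_address: int) -> int:
    """RVA_ADDR = VIRTUAL_ADDRESS - BASE_ADDRESS"""
    if virtual_address < base_address:
        raise ValueError("address lies below the base address")
    return virtual_address - base_address


def rva_to_va(rva: int, base_address: int) -> int:
    """Inverse operation: VIRTUAL_ADDRESS = BASE_ADDRESS + RVA_ADDR"""
    return base_address + rva


BASE = 0x400000       # example base address
ENTRY_RVA = 0x12A0    # example AddressOfEntryPoint

entry_va = rva_to_va(ENTRY_RVA, BASE)
print(hex(entry_va))                        # 0x4012a0
print(hex(va_to_rva(entry_va, BASE)))       # 0x12a0
print(hex(rva_to_va(ENTRY_RVA, 0x500000)))  # 0x5012a0 (same RVA, different base)
```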


Section Table

• Comes right after the headers (immediately after the Image Optional Header)

• Each entry of the table is organized as follows:
– 40 bytes long for each entry

– Each entry contains information about the name, the size, and the position of the Section

• Most important fields of each Section Table entry:
– Name: 8-byte string with the Section name

– Virtual size: size of the Section once loaded in memory

– Raw size: size of the Section in the file

– Virtual address: the address (an RVA) at which the Section is mapped in memory

• The virtual size is often different from the raw size
– Uninitialized data (all zeros) does not need to be stored in the file, so a Section can be larger in memory than on disk

– At the same time, Sections must be aligned when loaded in memory (typically to multiples of 4 KB)
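Each entry can be decoded with a fixed 40-byte layout. This is a minimal Python sketch assuming the IMAGE_SECTION_HEADER layout from the PE/COFF specification; in a real file the table starts at e_lfanew + 4 + 20 + SizeOfOptionalHeader. The sample values are made up: .text has a raw size larger than its virtual size (file data is padded to the file alignment), while .bss has no bytes in the file at all.

```python
import struct
from typing import NamedTuple

SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"                        # one IMAGE_SECTION_HEADER entry
SECTION_HEADER_SIZE = struct.calcsize(SECTION_HEADER_FORMAT)   # 40 bytes


class SectionHeader(NamedTuple):
    name: str
    virtual_size: int      # size once loaded in memory
    virtual_address: int   # RVA at which the section is mapped
    raw_size: int          # SizeOfRawData: size in the file
    raw_pointer: int       # PointerToRawData: file offset of the section data
    characteristics: int


def parse_section_table(data: bytes, table_offset: int, count: int) -> list[SectionHeader]:
    """Parse `count` consecutive 40-byte entries starting at `table_offset`."""
    sections = []
    for i in range(count):
        fields = struct.unpack_from(SECTION_HEADER_FORMAT, data, table_offset + i * SECTION_HEADER_SIZE)
        name, vsize, vaddr, rsize, rptr = fields[:5]
        # fields[5:9] are relocation/line-number pointers and counts, zero in images
        name_text = name.rstrip(b"\x00").decode("ascii", errors="replace")  # NUL-padded
        sections.append(SectionHeader(name_text, vsize, vaddr, rsize, rptr, fields[9]))
    return sections


# Synthetic table with two entries (made-up values)
table = struct.pack(SECTION_HEADER_FORMAT, b".text", 0x1A2C, 0x1000, 0x1C00, 0x400, 0, 0, 0, 0, 0x60000020)
table += struct.pack(SECTION_HEADER_FORMAT, b".bss", 0x800, 0x3000, 0, 0, 0, 0, 0, 0, 0xC0000080)
for section in parse_section_table(table, 0, 2):
    print(section.name, hex(section.virtual_size), hex(section.raw_size))
# .text 0x1a2c 0x1c00
# .bss 0x800 0x0
```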


Section Table – Hex Representation (+CFF Explorer)


PE Loading in Memory*


*Image source: https://furysecurity.tistory.com/28. Note: the image refers to a different executable
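When a Section is loaded, its position in the file (PointerToRawData) and its position in memory (VirtualAddress) generally differ. A tool that reads a header value expressed as an RVA, such as AddressOfEntryPoint, must therefore translate it back into a file offset using the Section Table. The Python sketch below assumes the section headers have already been parsed; the sample values are made up.

```python
from typing import NamedTuple


class Section(NamedTuple):
    name: str
    virtual_address: int   # RVA at which the section starts in memory
    virtual_size: int
    raw_pointer: int       # file offset at which the section data starts
    raw_size: int


def rva_to_file_offset(rva: int, sections: list[Section]) -> int:
    """Translate an RVA into the file offset of the byte that is loaded at that RVA."""
    for s in sections:
        end = s.virtual_address + max(s.virtual_size, s.raw_size)
        if s.virtual_address <= rva < end:
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                # e.g. uninitialized data: it exists in memory but not in the file
                raise ValueError(f"RVA {rva:#x} in {s.name} has no bytes in the file")
            return s.raw_pointer + delta
    raise ValueError(f"RVA {rva:#x} is not inside any section")


# Made-up section table
sections = [
    Section(".text", virtual_address=0x1000, virtual_size=0x1A2C, raw_pointer=0x400, raw_size=0x1C00),
    Section(".data", virtual_address=0x3000, virtual_size=0x200, raw_pointer=0x2000, raw_size=0x200),
]
print(hex(rva_to_file_offset(0x12A0, sections)))   # 0x6a0
print(hex(rva_to_file_offset(0x3010, sections)))   # 0x2010
```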


Main Sections

• .text: contains the executable code (assembly instructions)

• .data: contains the initialized, non-executable data (read and write)

• .rdata: contains the initialized, non-executable, read-only data (this Section cannot be written)

• .bss: contains uninitialized data

• .rsrc: Section that holds resources (e.g., images, audio), organized as a data tree
– Easily modifiable

• .reloc: Section that contains the relocation information (see next slides)
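The read/write/execute properties listed above are not deduced from the Section name, which is only a convention, but from the Characteristics field of each Section Table entry. This Python sketch decodes a subset of the IMAGE_SCN_* flags; the flag values come from the PE/COFF specification, while the sample characteristic values are typical ones chosen for illustration.

```python
# A subset of the IMAGE_SCN_* flags (short names used here for readability)
SECTION_FLAGS = {
    0x00000020: "CODE",                 # IMAGE_SCN_CNT_CODE
    0x00000040: "INITIALIZED_DATA",     # IMAGE_SCN_CNT_INITIALIZED_DATA
    0x00000080: "UNINITIALIZED_DATA",   # IMAGE_SCN_CNT_UNINITIALIZED_DATA
    0x20000000: "EXECUTE",              # IMAGE_SCN_MEM_EXECUTE
    0x40000000: "READ",                 # IMAGE_SCN_MEM_READ
    0x80000000: "WRITE",                # IMAGE_SCN_MEM_WRITE
}


def describe_section(characteristics: int) -> str:
    """Join the names of the known flags set in a section's Characteristics field."""
    return "|".join(name for bit, name in SECTION_FLAGS.items() if characteristics & bit)


for name, chars in [(".text", 0x60000020), (".rdata", 0x40000040),
                    (".data", 0xC0000040), (".bss", 0xC0000080)]:
    print(name, describe_section(chars))
# .text CODE|EXECUTE|READ
# .rdata INITIALIZED_DATA|READ
# .data INITIALIZED_DATA|READ|WRITE
# .bss UNINITIALIZED_DATA|READ|WRITE
```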


Static and Dynamic Linking

• Static linking is the process of including in the executable all the library functions that a module can call
– Static libraries have the extension .lib in Windows

– This process is typically inefficient, as it immediately loads all functions into memory

• Dynamic linking, on the contrary, loads only the required functions at runtime
– It is performed at runtime, using Dynamic-Link Libraries (DLLs)

• To this end, the addresses of the library functions are stored in a table called Import Address Table (IAT), which is filled in when the executable is loaded
– The addresses are calculated starting from the function RVA

– In this way, a function can be referenced immediately

– IAT is compulsory in PE files

• DLLs (and, in some cases, .exe files) also contain an Export Address Table (EAT)
– EAT describes which functions can be imported by an executable, along with their addresses (this information may also appear in the .rdata section)


Import and Export Address Tables (IAT and EAT)


• The IAT of the executable refers to the EATs of the DLLs

• EATs contain the addresses (RVAs) of the exported functions employed by the code; the actual addresses are calculated at runtime

• EAT may not appear in .exe files
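The relationship between the IAT of an executable and the EATs of its DLLs can be mimicked with a toy model. In the Python sketch below, which is a hypothetical simplification (real loaders parse IMAGE_IMPORT_DESCRIPTOR and IMAGE_EXPORT_DIRECTORY structures and also support imports by ordinal), each loaded DLL is described by its base address and an EAT mapping function names to RVAs; the IAT entry of each imported function receives the DLL base plus the exported RVA. The DLL name, function names, and addresses are made up.

```python
# Hypothetical, simplified model of import resolution: plain dicts stand in for
# the real import and export structures of the PE format.
ImportTable = dict[str, list[str]]                # DLL name -> imported function names
ExportTable = dict[str, int]                      # function name -> RVA inside the DLL
LoadedDlls = dict[str, tuple[int, ExportTable]]   # lowercased DLL name -> (base, EAT)


def resolve_imports(imports: ImportTable, loaded: LoadedDlls) -> dict[tuple[str, str], int]:
    """Build a simulated IAT: (dll, function) -> absolute address (DLL base + exported RVA)."""
    iat = {}
    for dll, functions in imports.items():
        base, eat = loaded[dll.lower()]   # Windows DLL names are case-insensitive
        for function in functions:
            if function not in eat:
                raise LookupError(f"{dll} does not export {function}")
            iat[(dll, function)] = base + eat[function]
    return iat


# Made-up example: "A.dll" loaded at 0x40000000 and exporting three functions
loaded = {"a.dll": (0x40000000, {"Open": 0x1010, "Close": 0x1040, "Unused": 0x1100})}
iat = resolve_imports({"A.dll": ["Open", "Close"]}, loaded)
for (dll, function), address in iat.items():
    print(dll, function, hex(address))
# A.dll Open 0x40001010
# A.dll Close 0x40001040
```

Because the EAT stores RVAs, the same DLL loaded at a different base would simply yield different absolute addresses in the IAT.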


Relocation and Alignment

• Relocation is the process of changing the base virtual address of a library/executable when it cannot be loaded at its established base address (e.g., because that area of the process space is already occupied)

• It typically occurs in programs that employ libraries, or when multiple executables share the same process space

• Relocation occurs mostly during the linking phase, but it can also occur during execution, using the information in the .reloc section

• The concept of alignment is related to how Sections are placed in memory

• The position of each Section should be a multiple of 4096 bytes (4 KB)
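Placing Sections at multiples of 4096 bytes amounts to rounding each start address up to the next multiple of the alignment. The Python sketch below lays out three Sections with made-up virtual sizes, starting at RVA 0x1000 (the first page usually holds the headers); the helper name align_up is an illustrative choice.

```python
SECTION_ALIGNMENT = 0x1000   # 4096 bytes (4 KB)


def align_up(value: int, alignment: int = SECTION_ALIGNMENT) -> int:
    """Round `value` up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# Place sections one after the other in memory, each on a 4 KB boundary
layout = []
rva = 0x1000
for name, virtual_size in [(".text", 0x1A2C), (".rdata", 0x300), (".data", 0x1001)]:
    layout.append((name, rva))
    rva = align_up(rva + virtual_size)

for name, start in layout:
    print(name, hex(start))
# .text 0x1000
# .rdata 0x3000
# .data 0x4000
```

Even the small .rdata (0x300 bytes) occupies a whole 4 KB page, which is one reason why the virtual layout of a PE file is usually larger than the file itself.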


Relocation and Alignment

Figure: test.exe (process image). DLL A is loaded at base address 0x40000000. DLL B also has base address 0x40000000, but it is relocated to 0x40400000 because its base address is already occupied by DLL A.
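With the addresses of the example (DLL B prefers 0x40000000 but ends up at 0x40400000), every absolute address stored inside DLL B must be shifted by delta = 0x40400000 - 0x40000000 = 0x400000. The .reloc section tells the loader where those addresses are: it is a sequence of blocks, each starting with a 4-byte page RVA and a 4-byte block size, followed by 2-byte entries whose top 4 bits give the relocation type and whose low 12 bits give the offset inside the page. The Python sketch below assumes the image is already mapped into a bytearray indexed by RVA and handles only the 32-bit HIGHLOW type (64-bit images use DIR64 entries instead); the image contents are made up.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3    # patch a 32-bit absolute address


def apply_relocations(image: bytearray, reloc: bytes, preferred_base: int, actual_base: int) -> int:
    """Patch 32-bit absolute addresses in a mapped image; return the number of fixups."""
    delta = actual_base - preferred_base
    patched = 0
    pos = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break   # malformed or terminating block
        for (entry,) in struct.iter_unpack("<H", reloc[pos + 8:pos + block_size]):
            kind, offset = entry >> 12, entry & 0x0FFF
            if kind == IMAGE_REL_BASED_ABSOLUTE:
                continue
            if kind != IMAGE_REL_BASED_HIGHLOW:
                raise NotImplementedError(f"relocation type {kind} not handled in this sketch")
            target = page_rva + offset
            (value,) = struct.unpack_from("<I", image, target)
            struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
            patched += 1
        pos += block_size
    return patched


# Made-up DLL B image: at RVA 0x1004 it stores the absolute address 0x40002000
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1004, 0x40002000)
# One block for page 0x1000: a HIGHLOW entry at offset 0x004, plus an ABSOLUTE padding entry
reloc = struct.pack("<IIHH", 0x1000, 12, (IMAGE_REL_BASED_HIGHLOW << 12) | 0x004, 0)

count = apply_relocations(image, reloc, preferred_base=0x40000000, actual_base=0x40400000)
print(count, hex(struct.unpack_from("<I", image, 0x1004)[0]))   # 1 0x40402000
```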


Packing

• A technique to wrap and compress the information employed by the executable so that it is not statically visible

• From the perspective of the PE file structure, packing creates multiple anomalies in the structure of the Sections

• Packing is especially used in malware, as it conceals information about code and sections

• However, packing can also be detected, as each packer has its own characteristics (e.g., by using programs like PEiD)
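Tools such as PEiD identify packers by matching byte signatures, typically against the code at the entry point, and such signatures are commonly written in hex with "??" wildcards. The Python sketch below only illustrates that general idea; the signature and the code bytes are made up for illustration and are not taken from a real signature database, and the helper names are hypothetical.

```python
from typing import Optional

Signature = list[Optional[int]]   # None stands for a "??" wildcard byte


def parse_signature(pattern: str) -> Signature:
    """Convert a pattern such as '60 E8 ?? ?? ?? ?? 5D' into byte values and wildcards."""
    return [None if token == "??" else int(token, 16) for token in pattern.split()]


def matches_at(data: bytes, offset: int, signature: Signature) -> bool:
    """True if `signature` matches `data` starting at `offset` (wildcards match any byte)."""
    if offset < 0 or offset + len(signature) > len(data):
        return False
    return all(expected is None or data[offset + i] == expected
               for i, expected in enumerate(signature))


# Hypothetical signature and made-up bytes read at the entry point's file offset
signature = parse_signature("60 E8 ?? ?? ?? ?? 5D")
code_at_entry = bytes([0x60, 0xE8, 0x00, 0x00, 0x00, 0x00, 0x5D, 0x81, 0xED])
print(matches_at(code_at_entry, 0, signature))   # True
print(matches_at(code_at_entry, 1, signature))   # False
```

Anchoring the match at the entry point keeps false positives low, since the same byte pattern could appear by chance elsewhere in a large file.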


References and Tools

• M. Sikorski, A. Honig. Practical Malware Analysis, Chapter 1

• E.Eilam. Reversing: Secrets of Reverse Engineering, Wiley, Chapter 3 (Executable Formats)

• Hex Edit (https://www.softpedia.com/get/Programming/File-Editors/Hex-Edit.shtml)

• PEiD (https://www.softpedia.com/get/Programming/Packers-Crypters-Protectors/PEiD-updated.shtml)

• CFF Explorer VIII (https://ntcore.com/?page_id=388)

• PEView (http://wjradburn.com/software/)
