Intel Processor Trace - What Is Recorded?
DESCRIPTION
Intel Processor Trace (Intel PT) is a processor extension for Intel 64 and IA-32. The extension captures how a program executed at the machine-instruction level. Dynamic events such as branches, calls, and interrupts are recorded, which lets a trace analyzer reconstruct the earlier execution exactly. These slides summarize the data this extension generates.
TRANSCRIPT
Processor Trace
WHAT IS RECORDED?
Pipat Methavanitpong
+PipatMethavanitpong
@fulcronz27
Foreword
This work was done solely by me, without support from Intel
Information in this document is derived from
  Intel 64 and IA-32 Architectures Software Developer's Manual, System Programming Guide – Chapter 36
  Some is from my own understanding
Mistakes or wrong information may appear
I am willing to update and correct errata
  Please contact me via Google Hangouts
I am not responsible for any damage caused by using this document
Objective
Give a summary of the data generated by Intel PT
Partially cover the relationships between data types
Do not cover its mechanism or how it is controlled
PT Overview
Machine instruction-level tracing
Uses dedicated hardware to trace
  The conventional approach uses software to trace software
  Bird's-eye view observation
Can fully reconstruct execution at analysis time
  Records events that cannot be inferred from the binary alone
Usage
  Low-level debugging
  Fine-tuning performance
  State recovery
Background
At the lowest level, programs are chunks of machine instructions
Processor executes machine instructions sequentially
Processor does not execute in sequence when
  Executing a redirecting machine instruction
  Handling an interrupt or an exception (asynchronous event)
  …
Execution context may be changed
  Changing execution mode
  Page switching
  …
Pros and Cons
Pros
  Finest grain in software tracing
    Machine instruction level
Cons
  Design overhead
    Additional hardware
  Hand-picked dynamic events
    May miss some categories
  Hard to change
    Hardware implementation
*My own opinion
Packet Types
1. Packet Stream Boundary – Periodic heartbeat, sync point for analyzer
2. Taken Not-Taken – Conditional branch decision
3. Target IP – Target address within program binary
4. Flow Update Packets – Target address outside program binary (async events)
5. Paging Information Packet – Modification to CR3 task page base address
6. Time-Stamp Counter – Wall clock data
7. MODE – Execution mode
8. Core Bus Ratio – Bus clock ratio
9. Overflow – Internal buffer overflow
10. PAD – Alignment purpose
Packet Summary
Processor Trace Packets
  Redirection
    Inside traced program: TNT, TIP
    Outside traced program: FUP
  Environment
    Execution: PIP, MODE, CBR
    Trace: PSB, OVF
  Misc
    Alignment: PAD
    Time: TSC
*does not imply packet combination
Taken Not-Taken (TNT)
A group of binary decisions
2 types of event
  Conditional branch – Taken (1) / Not taken (0)
  Unmodified return address – Taken (1)
2 sizes
  Short TNT – 8-bit packet contains up to 6 decision bits
  Long TNT – 64-bit packet contains up to 47 decision bits
No need to fill all the bits
  A partial TNT is emitted when another packet has to be generated in the middle
Decision: Taken (1) / Not Taken (0)
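To make the short TNT format concrete, here is a minimal decoding sketch. It rests on my understanding of the layout in the manual, which these slides do not spell out: bit 0 of the byte is 0, the highest set bit among the remaining seven is a stop marker, and the bits between the marker and bit 0 are the decisions, oldest first. The function name `decode_short_tnt` is made up for illustration, and long TNT is not handled.

```python
def decode_short_tnt(byte):
    """Return branch decisions (oldest first) from one short TNT byte.

    Assumed layout (please check against the manual): bit 0 is 0, the
    highest set bit of bits 7..1 is a stop marker, and every bit between
    the marker and bit 0 is one decision (1 = taken, 0 = not taken).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    if byte & 1:
        raise ValueError("bit 0 must be 0 in a short TNT packet")
    payload = byte >> 1                # drop the header bit
    count = payload.bit_length() - 1   # number of bits below the stop marker
    if count < 1:
        raise ValueError("no decision bits present")
    # Walk from the bit just below the stop marker down to bit 0.
    return [bool((payload >> i) & 1) for i in range(count - 1, -1, -1)]


# Full packet: stop marker followed by 6 decisions (T, N, T, N, T, N).
print(decode_short_tnt(0b11010100))
# Partial packet: stop marker followed by only 2 decisions (T, N).
print(decode_short_tnt(0b00001100))
```

Because the stop marker moves down when fewer bits are filled, the same routine covers partial TNT packets without a separate length field.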
Target IP (TIP)
A destination address within the traced program
Used for
  Indirect jump / call – uses an address from a register or memory
  Modified return address – the return address on the call stack was modified
Has a different packet signature from FUP
Has 2 extra variants
  TIP.PGE – Packet Generation Enable
  TIP.PGD – Packet Generation Disable
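The way TNT and TIP together allow reconstruction can be sketched with a toy model. This is not the real packet format or a real instruction set: `PROGRAM`, the instruction kinds (`jmp`, `jcc`, `ind`, `end`), and `replay` are all hypothetical names invented for the illustration. The point is only that direct jumps need no trace data, conditional branches consume one TNT bit each, and indirect jumps consume one TIP address each.

```python
from collections import deque

# Hypothetical toy program: address -> (kind, operands...).
PROGRAM = {
    0: ("jcc", 3, 1),  # conditional branch: go to 3 if taken, else to 1
    1: ("jmp", 4),     # direct jump: target is known from the binary
    3: ("ind",),       # indirect jump: target is only known at run time
    4: ("jcc", 0, 5),  # conditional branch: go to 0 if taken, else to 5
    5: ("end",),       # end of the traced region
    7: ("jmp", 4),
}


def replay(program, start, tnt_bits, tip_targets):
    """Rebuild the executed address sequence from the binary plus the trace."""
    tnt = deque(tnt_bits)     # one bit per conditional branch, oldest first
    tip = deque(tip_targets)  # one address per indirect jump, oldest first
    path, ip = [], start
    while True:
        path.append(ip)
        kind, *ops = program[ip]
        if kind == "end":
            return path
        if kind == "jmp":
            ip = ops[0]                                # no trace data needed
        elif kind == "jcc":
            ip = ops[0] if tnt.popleft() else ops[1]   # consume one TNT bit
        elif kind == "ind":
            ip = tip.popleft()                         # consume one TIP address
        else:
            raise ValueError(f"unknown instruction kind: {kind}")


print(replay(PROGRAM, 0, [True, True, False, False], [7]))
# prints [0, 3, 7, 4, 0, 1, 4, 5]
```

Four decision bits and one target address are enough to recover an eight-step path here, which is why the trace can stay small compared with logging every executed address.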
Flow Update Packet (FUP)
A destination address outside the traced program
Generated when an asynchronous event happens
  External interrupts
  Exceptions and faults
  X instructions
  #SMI
  WRMSR that clears TraceEn (one of the flags that control tracing operation)
Generated in combination with other packets (not covered here)
Has a different packet signature from TIP
Paging Information Packet (PIP)
Keeps track of page information
  Current linear address range
CR3 register contains the task’s page base address
Generated when CR3 is modified
  Has exceptional cases
MODE packet
Records processor modes that affect
  Execution behavior
  Analysis operation
2 modes are recorded
  Execution modes
    16- / 32- / 64-bit
  TSX transaction operations
    Begin / commit / abort
    Either HLE or RTM
Core Bus Ratio (CBR) Packet
Tells the current core:bus ratio
Cannot tell at which IP a CBR change starts to take effect
Generated when
  CBR changes
  As a part of PSB+
Packet Stream Boundary (PSB)
Generated about every 4K bytes of trace output
  Like heartbeats for the trace operation
Analyzer searches for this packet first to start decoding
PSB itself does not contain any information
  Just a pure binary signature
Generated in combination with other packets
  The whole pack is called PSB+
  Tells the current execution environment
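Searching for the PSB signature is the first thing an analyzer does, so a small sketch may help. It assumes the signature is the byte pair 02 82 repeated eight times (16 bytes); that value comes from my reading of the manual, not from these slides, so please verify it. `find_sync_points` is a made-up helper name, and the sample buffer is fabricated purely to exercise the search.

```python
# Assumed PSB signature: the byte pair 02 82 repeated 8 times (16 bytes).
PSB = bytes([0x02, 0x82]) * 8


def find_sync_points(trace):
    """Return the offset of every PSB signature found in a raw trace buffer."""
    offsets = []
    pos = trace.find(PSB)
    while pos != -1:
        offsets.append(pos)
        # Continue after this signature so one PSB is never counted twice.
        pos = trace.find(PSB, pos + len(PSB))
    return offsets


# Made-up buffer: 3 junk bytes, a PSB, 2 junk bytes, another PSB, 1 junk byte.
sample = b"\x00\x00\x00" + PSB + b"\x99\x00" + PSB + b"\x00"
print(find_sync_points(sample))
# prints [3, 21]
```

Decoding can start at any returned offset, which is what makes the trace usable even when its beginning was lost or overwritten.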
Overflow (OVF) Packet
Generated when PT overflows its internal buffer
Analyzer skips to the next FUP or TIP.PGE
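The skip rule can be sketched on an already-parsed packet list. The `(name, payload)` tuple form and the function `drop_after_overflow` are hypothetical, chosen only for this illustration; a real analyzer works on the binary stream and has more cases to handle.

```python
def drop_after_overflow(packets):
    """Drop packets that follow an OVF until the next FUP or TIP.PGE.

    `packets` is a list of (name, payload) tuples, a hypothetical parsed
    form. The OVF itself is kept so the gap in the trace stays visible.
    """
    kept, skipping = [], False
    for name, payload in packets:
        if name == "OVF":
            skipping = True
            kept.append((name, payload))
            continue
        if skipping:
            if name in ("FUP", "TIP.PGE"):
                skipping = False   # known-good address: resume decoding here
            else:
                continue           # packet may be incomplete context: discard
        kept.append((name, payload))
    return kept


stream = [
    ("TNT", [True]),
    ("OVF", None),
    ("TNT", [False]),
    ("TIP", 0x10),
    ("FUP", 0x40),
    ("TNT", [True]),
]
print([name for name, _ in drop_after_overflow(stream)])
# prints ['TNT', 'OVF', 'FUP', 'TNT']
```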
PAD
Simply padding
No information contained
Improves packet alignment
  Or other implementation reasons
Time-Stamp Counter (TSC)
Gives wall-clock time
Included in PSB+
Precedes CBR packet
THE END