m10 debugging scenarios - read.pudn.comread.pudn.com/downloads181/ebook/846936/user mode...heap...


Contents

Overview

Lesson 1: Crash

Lesson 1: Review

Lesson 2: Heap Corruption

Demonstration: HeapCorrupt.exe

Lesson 2: Review

Lesson 3: Stack Corruption

Demonstration: StackCorruption.exe

Demonstration: StackOverflow.exe

Lesson 3: Review

Lesson 4: Hangs

Demonstration: Debug Hang with ADPlus/DebugDiag

Lesson 4: Review

Lesson 5: Memory Leak in Private Bytes

Demonstration: LeakDiag

Demonstration: Tracking a Leak in DebugDiag

Lesson 5: Review

Lesson 6: Special Debug Scenarios

Lesson 6: Review

Lab: Common Debug Scenarios

Debugging Scenarios


Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, places or events is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. © 2002 Microsoft Corporation. All rights reserved. Microsoft, MS-DOS, Windows, Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.


Overview



Lesson 1: Crash


The purpose of this lesson is to provide a brief overview of the behavior of crashes in applications. The tools required to identify the causes will be covered as well.

What You Will Learn

After completing this lesson, the student will be able to:

• Explain what a crash is
• Describe the behavior of different types of crashes
• List the tools for troubleshooting crashes

Recommended Reading

• "Inside Microsoft Windows Internals, Covering Windows 2000, Windows XP, Windows Server 2003" by Mark E. Russinovich and David A. Solomon, ISBN 0-73561-917-4

• "Modern Operating Systems" by Andrew S. Tanenbaum, ISBN 0-13588-187-0


Types of Crashes


When an application crashes, an unhandled exception has occurred: a first chance exception went unhandled and became a second chance exception.

There are several categories of exceptions. One of the most common is the access violation, which can have many causes, among them stack corruption and heap corruption.

What Is an Access Violation

An access violation (AV) happens when code tries to read, write, or execute memory at an address that is invalid or that it is not allowed to access. It is often caused by a bad pointer passed to a function (for example, a null pointer). Depending on how the application was written, the AV may be handled at the first chance stage or go on to become a second chance access violation.

Types of Access Violations

A first chance AV happens when an invalid reference to memory is made. The system raises a first chance access violation and gives the application the chance to handle it. If the application contains exception handling code, that code processes the error, and the behavior of the application is then determined by what the handler decides to do. Typical behaviors are graceful termination of the process, displaying an error message, or writing an event to the event log.

A second chance AV happens when the application has no exception handling code for the error. The exception is passed to the system, which invokes the debugger registered in the registry. The default debugger is usually Dr. Watson (for more information, refer to the module titled “Debuggers and Debugger Setup”). The behavior of this kind of error is discussed in the next slide.

For more information about Exception handling, please refer to the module titled “Understanding Structured Exception Handling”.


There are several other types of exceptions. Some are structured exceptions similar to an access violation.

There are also custom exception types for C++ and .NET.


Tools for Troubleshooting Crashes


By default, Dr. Watson is the debugger that is called on a second chance access violation. If this is the case, the behavior and steps are as follows.

When an application access violates, the system calls Dr. Watson. Dr. Watson creates a dump file for the process called user.dmp and places it in the \windows directory. It also appends information about the error to the drwtsn32.log file. This information includes the name of the application that faulted, the modules loaded in it, and a list of stacks for all the threads executing in the application. Dr. Watson also writes an event to the event log giving the name of the application and the exception that occurred. To troubleshoot the problem, look at the drwtsn32.log file or the user.dmp file to find out what the offending component was. For more information, see the module titled “Post-Mortem Debugging”.

If Visual Studio is installed on the machine, it becomes the default debugger. When an application access violates, Visual Studio displays the Just-In-Time Debugger dialog.

The third option is to use other debuggers to generate log files (Process Mode dump, CDB, WinDBG, DebugDiag…). In most cases, you want to set the debugger up to generate a full dump file when an access violation happens. You can then analyze the dump file to determine what the offending component was.


Steps for Troubleshooting Crashes


Demonstration: Monitoring with DebugDiag


Lesson 1: Review


1. What are the symptoms of a second chance AV?

2. What tools are available for troubleshooting AVs?


Lesson 2: Heap Corruption


What You Will Learn

• How to identify heap corruption
• Types and causes of heap corruption
• Tools for solving heap corruption
    • Debug Heap
    • PageHeap
• Heap debugging commands

Recommended Reading

• "Inside Microsoft Windows Internals, Covering Windows 2000, Windows XP, Windows Server 2003" by Mark E. Russinovich and David A. Solomon, ISBN 0-73561-917-4

• How to use PageHeap on Windows XP and Windows 2000 – KB 286470


Identifying Heap Corruption


How to Identify Heap Corruption

Typically, when an application fails because of heap corruption, DrWtsn32 or the installed default debugger indicates that an access violation has occurred. An error message similar to the following is displayed:

Application Error: The instruction at "0x77f82e8d" referenced memory at "0x3030303030". The memory could not be "written". Click on OK to terminate the application.

Most often this access violation will occur when an attempt is made to allocate memory. Here is an example of a typical call-stack involving heap corruption.

    First chance exception c0000005 (Access Violation) occurred
    Thread stopped.
    > k
    00b0fc00 6720764a NTDLL!RtlAllocateHeapSlowly+0x73b
    00b0fcb0 77f028b3 NTDLL!RtlAllocateHeap+0xa7d
    00b0fcfc 5f490aee KERNEL32!LocalAlloc+0x71

Another symptom of heap corruption is invalid data. This may manifest itself at various points in the application and occur randomly. However, if the data being corrupted is at a constant location, the problem may be consistently reproducible.

When the program is launched under a debugger such as WinDBG or CDB, heap checking is enabled by default, because the OS turns on additional debugging features in the heap manager. Essentially, the heap becomes a debug heap, and a heap check occurs whenever a free or alloc is performed. By the time the check fails, however, the corruption has already occurred, and further debugging is needed to isolate its cause. A message similar to the following will be displayed by the debugger:


    IE: heap allocated at 0x001413e8
    HEAP: Invalid Address specified to RtlFreeHeap( 140000, 1413e8 )
    eax=001413e0 ...
    int 3

Also, when running with these debugging features enabled, the OS may raise a debug breakpoint (int 3) as soon as it detects corruption, before an access violation occurs.

Types and Causes of Heap Corruption

Heap corruption may occur as a result of several types of programmatic errors involving heap processing. An application may allocate a block of heap memory and write beyond the end of the block. Another scenario may involve an attempt to free a block of heap that has already been freed. It is also possible to corrupt the heap by writing through a pointer whose value is invalid.

• Writing beyond the end of a block of heap: This may occur if an application allocates a block of heap memory and writes to memory addresses beyond the end of the heap block. In the example below, the string is 12 characters long, so strlen returns 12 and 12 bytes are allocated. However, strcpy writes 13 bytes into pszStr, because it appends a terminating null character that is not included in the value returned by strlen. Writing beyond the end of a block of heap is probably the most commonly observed form of heap corruption. The example is easily fixed by allocating one additional byte.

Example:

    pszStr = malloc( strlen( "StringofText" ));  /* 12 bytes: no room for the terminating null */
    strcpy( pszStr, "StringofText" );            /* writes 13 bytes */

The root problem is that the structures the heap manager uses to keep track of the heap are interleaved with the application's data. Code that writes past the end of an allocation is therefore typically writing over heap structures. When the heap manager later attempts to use those structures, it fails or causes an access violation.

• Writing to a bad address (freed or invalid): There are two common causes of this type of heap corruption:

    • Freeing an address and not setting the pointer to NULL. Later code then reuses the value stored in the pointer, which is now an invalid address. Sometimes the memory has been reallocated, so the code ends up reading invalid data or writing over valid data.

    • Reading an address that is invalid and then attempting to use it. This can also lead to overwriting good data that has a different purpose. Basically, the bad code gets an address that it should not have, and because that address is in use elsewhere, a write to it overwrites valid data.

• Attempt to free a block of heap twice: It is unlikely that a given routine will free the same block twice in a row. However, another thread may cause a second free of the same block. This is illustrated below. The "pStruct->Member = 0" statement necessarily executes after at least one of the free operations. In this example there are two bugs present: thread 2 should not free a heap block that was most likely allocated by the other thread, and thread 1 should not attempt to access the structure after it has been freed.

Example:

    Thread1                                   Thread2

    pStruct = malloc( sizeof( *pStruct ));
    free( pStruct );
                                              free( pStruct );
    pStruct->Member = 0;
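Two of the bug classes above have conventional defensive fixes, sketched here in C. The helper names dup_string and free_and_null are hypothetical and do not come from the course material. The sketch assumes that a single owner is responsible for freeing each block, so on its own it does not solve the cross-thread ownership bug shown in the Thread1/Thread2 example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string into a correctly sized heap block.
   strlen() does not count the terminating null, so one extra byte is
   allocated for it. Returns NULL if the allocation fails. */
char *dup_string(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *copy = malloc(len + 1);     /* +1 for the terminating null */
    if (copy != NULL) {
        memcpy(copy, src, len + 1);   /* copies the null as well */
    }
    return copy;
}

/* Hypothetical helper: free a block and clear the caller's pointer.
   A repeated call then becomes free(NULL), which is defined to do
   nothing, and a stale dereference goes through a null pointer rather
   than through the address of recycled memory. */
void free_and_null(char **pp)
{
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;
}
```

A caller would write `char *s = dup_string("StringofText");` and later `free_and_null(&s);`. Clearing the pointer only protects the one variable that is passed in; any other copies of the address held elsewhere remain dangling.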


Heap Debugging Options


There are four tools that can help resolve heap corruption:

• Debug Heap

• PageHeap

• Break on Memory Access

• Add calls to HeapValidate

The Debug Heap and PageHeap features are covered in more depth in the sections that follow.

The trick to fixing heap corruption is locating what corrupted the data. When the access violation or other crash occurs because of bad data, the instructions that invalidated the data have usually long since finished executing. The techniques discussed here all aim to make the offending write detectable. Debug Heap and PageHeap are the most common, but first we will take a look at the two less common techniques.

Break on Memory Access

This technique works in cases where the corruption is consistent and easily reproducible. The idea is to track when the data at an address is changed. First determine which address is being changed, then attach WinDBG or CDB to the process before the problem occurs and set an access breakpoint on the problem address:

ba w4 <addr>

This command sets an access breakpoint that breaks each time a write occurs anywhere in the 4 bytes starting at the specified address. Once the breakpoint is set, the application is run to reproduce the problem. Each time the breakpoint is hit, dump the call stack to see whether the write is expected and whether it invalidated the data stored at the address.


This approach can alter the reproduction if the problem is timing related, but timing-related problems are notoriously difficult to reproduce consistently in the first place.

Add calls to HeapValidate

This is another technique that can be used to get closer to the point at which the corruption occurred. It involves adding calls to the HeapValidate API throughout the application (especially in areas that do not allocate and free heap memory). HeapValidate walks the heap and ensures that all the structures in the heap are valid.

This is in no way an easy method, and it requires the code to be altered and recompiled. Also, HeapValidate does not validate the data inside the allocations, only the heap's own structures that surround the allocations.
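A checkpoint of this kind might look like the following sketch. The helper name heap_checkpoint and its reporting format are hypothetical; only HeapValidate and GetProcessHeap are real Win32 calls. The sketch assumes the allocations of interest live in the default process heap. An application that uses private heaps created with HeapCreate would pass those handles instead. The non-Windows branch is only a stub so that the sketch compiles elsewhere, and it checks nothing.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical checkpoint helper. Returns 1 if the default process heap
   passes validation and 0 otherwise, reporting where the check ran. */
int heap_checkpoint(const char *where)
{
    assert(where != NULL);
#ifdef _WIN32
    /* Passing NULL as the block pointer asks HeapValidate to check
       every block in the heap rather than a single allocation. */
    int ok = HeapValidate(GetProcessHeap(), 0, NULL) ? 1 : 0;
#else
    /* Stub for non-Windows builds: performs no validation at all. */
    int ok = 1;
#endif
    if (!ok) {
        fprintf(stderr, "heap corruption detected at checkpoint: %s\n", where);
    }
    return ok;
}
```

Calls such as `heap_checkpoint("after ParseRequest")` can then be scattered between suspect operations (ParseRequest is a made-up name). The first checkpoint that reports a failure narrows the corrupting code down to whatever ran since the previous successful checkpoint.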


Debug Heap


When an application launches under a debugger such as Visual Studio, WinDBG, or CDB each heap that is created will automatically be a debug heap. The debug heap does some basic validation and tests the integrity of the heap. The default debug heap has three features enabled:

• Enable heap tail checking

• Enable heap free checking

• Enable heap parameter checking

This can be verified in the debugger by using the !heap command:


    0:000> !heap
    NtGlobalFlag enables following debugging aids for new heaps:
        tail checking
        free checking
        validate parameters
    Index   Address  Name      Debugging options enabled
      1:   000a0000                tail checking free checking validate parameters
      2:   001a0000                tail checking free checking validate parameters
      3:   001b0000                tail checking free checking validate parameters
    0:000> !heap -p
        Active GlobalFlag bits:
            htc - Enable heap tail checking
            hfc - Enable heap free checking
            hpc - Enable heap parameter checking
        active heaps:
        - a0000
            HEAP_GROWABLE HEAP_TAIL_CHECKING_ENABLED HEAP_FREE_CHECKING_ENABLED
        - 1a0000
            HEAP_GROWABLE HEAP_TAIL_CHECKING_ENABLED HEAP_FREE_CHECKING_ENABLED HEAP_CLASS_1
        - 1b0000
            HEAP_TAIL_CHECKING_ENABLED HEAP_FREE_CHECKING_ENABLED HEAP_CLASS_8

This same functionality can be enabled without the debugger by using the GFlags utility and the following command line:

gflags -i <image name> +htc +hpc +hfc

Or through the GFlags UI:

1. Launch GFlags

2. Select “Image File” Tab

3. Type the image name (ex. - notepad.exe)

4. Hit Tab

5. Select

a. Enable heap tail checking

b. Enable heap free checking

c. Enable heap parameter checking

6. Click Apply

Both of these methods write the following registry value:


    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\notepad.exe]
    "GlobalFlag"=dword:00000070

There are a number of other heap-related global flags that enable additional functionality for the debug heap:

Default Debug Heap

#define FLG_HEAP_ENABLE_TAIL_CHECK 0x00000010 // +htc

#define FLG_HEAP_ENABLE_FREE_CHECK 0x00000020 // +hfc

#define FLG_HEAP_VALIDATE_PARAMETERS 0x00000040 // +hpc

Additional Debug Heap Options

#define FLG_HEAP_VALIDATE_ALL 0x00000080 // +hvc

#define FLG_HEAP_ENABLE_TAGGING 0x00000800 // +htg

#define FLG_USER_STACK_TRACE_DB 0x00001000 // +ust

#define FLG_HEAP_ENABLE_TAG_BY_DLL 0x00008000 // +htd

PageHeap

#define FLG_HEAP_PAGE_ALLOCS 0x02000000 // +hpa

Note that PageHeap (+hpa) can be specified together with the other debug heap options, but when it is, the other debug heap options are ignored in favor of PageHeap and have no effect.
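The GlobalFlag value is a bitwise OR of the flags listed above, which is why the three default debug heap options appear in the registry as 0x70 (0x10 | 0x20 | 0x40). The following sketch decodes a GlobalFlag value back into the GFlags abbreviations. The flag values are copied from the list above; the function name describe_global_flag and the lookup table are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag values copied from the list above. */
#define FLG_HEAP_ENABLE_TAIL_CHECK   0x00000010UL /* +htc */
#define FLG_HEAP_ENABLE_FREE_CHECK   0x00000020UL /* +hfc */
#define FLG_HEAP_VALIDATE_PARAMETERS 0x00000040UL /* +hpc */
#define FLG_HEAP_VALIDATE_ALL        0x00000080UL /* +hvc */
#define FLG_HEAP_ENABLE_TAGGING      0x00000800UL /* +htg */
#define FLG_USER_STACK_TRACE_DB      0x00001000UL /* +ust */
#define FLG_HEAP_ENABLE_TAG_BY_DLL   0x00008000UL /* +htd */
#define FLG_HEAP_PAGE_ALLOCS         0x02000000UL /* +hpa */

struct flag_name {
    unsigned long bit;
    const char   *abbrev;
};

static const struct flag_name heap_flags[] = {
    { FLG_HEAP_ENABLE_TAIL_CHECK,   "htc" },
    { FLG_HEAP_ENABLE_FREE_CHECK,   "hfc" },
    { FLG_HEAP_VALIDATE_PARAMETERS, "hpc" },
    { FLG_HEAP_VALIDATE_ALL,        "hvc" },
    { FLG_HEAP_ENABLE_TAGGING,      "htg" },
    { FLG_USER_STACK_TRACE_DB,      "ust" },
    { FLG_HEAP_ENABLE_TAG_BY_DLL,   "htd" },
    { FLG_HEAP_PAGE_ALLOCS,         "hpa" },
};

/* Hypothetical helper: writes the abbreviations of the heap-related
   flags set in `value` into buf, separated by single spaces, and
   returns how many were written. Stops early if buf is too small. */
int describe_global_flag(unsigned long value, char *buf, size_t buflen)
{
    size_t used = 0;
    int count = 0;

    assert(buf != NULL);
    if (buflen == 0) {
        return 0;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof heap_flags / sizeof heap_flags[0]; i++) {
        if ((value & heap_flags[i].bit) == 0) {
            continue;
        }
        size_t len = strlen(heap_flags[i].abbrev);
        size_t need = len + (count > 0 ? 1 : 0);   /* separator + name */
        if (used + need + 1 > buflen) {
            break;                                 /* no room left */
        }
        if (count > 0) {
            buf[used++] = ' ';
        }
        memcpy(buf + used, heap_flags[i].abbrev, len);
        used += len;
        buf[used] = '\0';
        count++;
    }
    return count;
}
```

Decoding the registry value 0x70 shown earlier yields the three default debug heap abbreviations, and 0x02000000 yields hpa alone.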

Figure: GFlags UI


Reference for the debug heap settings:

Enable heap free checking
    Abbreviation: hfc
    Hexadecimal value: 0x20
    Symbolic name: FLG_HEAP_ENABLE_FREE_CHECK
    Destination: Systemwide registry entry, kernel mode, image file registry entry
    Validates the heap when it is freed.
    See also: Enable heap tail checking, Enable heap parameter checking.

Enable heap parameter checking
    Abbreviation: hpc
    Hexadecimal value: 0x40
    Symbolic name: FLG_HEAP_VALIDATE_PARAMETERS
    Destination: Systemwide registry entry, kernel mode, image file registry entry
    Verifies some aspects of the heap whenever a heap API is called.
    See also: Enable heap validation on call.

Enable heap tagging
    Abbreviation: htg
    Hexadecimal value: 0x800
    Symbolic name: FLG_HEAP_ENABLE_TAGGING
    Destination: Systemwide registry entry, kernel mode, image file registry entry
    Assigns unique tags to heap allocations. You can display the tag by using the !heap debugger extension with the -t parameter. For information about the !heap extension, see the Microsoft Windows 2000 Debugging Tools Kit, which is available from the Microsoft Windows Driver Development Kits (DDK) Web site.
    See also: Enable heap tagging by DLL.

Enable heap tagging by DLL
    Abbreviation: htd
    Hexadecimal value: 0x8000
    Symbolic name: FLG_HEAP_ENABLE_TAG_BY_DLL
    Destination: Systemwide registry entry, kernel mode, image file registry entry
    Assigns a unique tag to heap allocations created by the same DLL. You can display the tag by using the !heap debugger extension with the -t parameter. For information about the !heap extension, see the Microsoft Windows 2000 Debugging Tools Kit, which is available from the Microsoft Windows Driver Development Kits (DDK) Web site.
    See also: Enable heap tagging.


Enable heap tail checking
    Abbreviation: htc
    Hexadecimal value: 0x10
    Symbolic name: FLG_HEAP_ENABLE_TAIL_CHECK
    Destination: Systemwide registry entry, kernel mode, image file registry entry
    Checks for buffer overruns when the heap is freed. This flag adds a short pattern to the end of each allocation. The Windows heap manager checks the pattern when the block is freed and, if the pattern was modified, breaks into the debugger.
    See also: Enable heap free checking, Enable heap parameter checking.

Enable heap validation on call
    Abbreviation: hvc
    Hexadecimal value: 0x80
    Symbolic name: FLG_HEAP_VALIDATE_ALL
    Destination: Systemwide registry entry, kernel mode, image file registry entry
    Validates the entire heap each time a heap API is called. To avoid the high overhead resulting from this flag, use the HeapValidate() API at critical junctures, such as when the heap is destroyed. However, this flag is useful for detecting random corruption in a pool.
    See also: Enable heap parameter checking.

Enable page heap
    Abbreviation: hpa
    Hexadecimal value: 0x02000000
    Symbolic name: FLG_HEAP_PAGE_ALLOCS
    Destination: Systemwide registry entry, kernel mode, image file registry entry
    Turns on page heap debugging, which verifies dynamic heap memory operations, including allocations and frees, and causes a debugger break when it detects a heap error. This option enables full page heap debugging when set for image files and standard page heap debugging when set in the system registry or kernel mode.

    • Full page heap debugging (for /i) places an inaccessible page at the end of an allocation.

    • Standard page heap debugging (for /r or /k) examines allocations as they are freed.

    Setting this flag for an image file is the same as typing gflags /p /enable <image name> /full at the command line.

From - http://technet2.microsoft.com/WindowsServer/en/library/6a183942-57b1-45e0-8b4c-c546aa1b8c471033.mspx?mfr=true


PageHeap


The Debug Heap provides additional validation and information about allocations. This can help get closer to the source of the problem but may not locate the root cause of the heap corruption. The next available option is PageHeap, which uses a different tactic. First, additional tagging information is added to each allocation so that its size and allocation information are known. Second, when full PageHeap is enabled, an inaccessible 4 KB guard page is placed immediately after each allocation, so that an access violation occurs as soon as code writes past the end of the allocation.

With the addition of the header and guard page, the memory footprint of the application increases greatly and the application runs slower. If an application is very memory intensive, PageHeap may cause it to run out of memory, and it may be necessary to reduce the amount of memory that is allocated on the page heap. We will look at some of the options later, but first we will take a deeper look at how PageHeap works.

When a process creates a heap with PageHeap enabled, two heaps are created. The first is a normal NT heap and the second is the page heap. This allows PageHeap to make some allocations in the normal heap if needed and others in the page heap. Here is how that looks in the debugger:


    0:000> !heap
    Index   Address  Name      Debugging options enabled
      1:   001a0000
      2:   003b0000
      3:   003c0000
    0:000> !heap -p
        Active GlobalFlag bits:
            hpa - Place heap allocations at ends of pages
        StackTraceDataBase @ 01020000 of size 01000000 with 00000006 traces
        PageHeap enabled with options:
            ENABLE_PAGE_HEAP
            COLLECT_STACK_TRACES
        active heaps:
        + a0000
            ENABLE_PAGE_HEAP COLLECT_STACK_TRACES
          NormalHeap - 1a0000
              HEAP_GROWABLE
        + 2b0000
            ENABLE_PAGE_HEAP COLLECT_STACK_TRACES
          NormalHeap - 3b0000
              HEAP_GROWABLE HEAP_CLASS_1

One thing to notice is that the regular heaps do not have any debugging features enabled.

There are two flavors of PageHeap: normal and full. In normal PageHeap, each allocation is tagged with a header, and a fill pattern is added to the end of the allocation. Normal PageHeap validates the allocation and the fill pattern, and faults if the pattern at the end of the allocation is not intact. The layout of each normal PageHeap allocation, including the header that is added, is:


    +-----+---------------+--+
    |     |               |  |             Light page heap allocated block
    +-----+---------------+--+
       ^         ^          ^
       |         |          8 suffix bytes filled with 0xA0
       |         user allocation (filled with E0 if zeroing not requested)
       block header (starts with 0xABCDAAAA and ends with 0xDCBAAAAA).
       A `dt DPH_BLOCK_INFORMATION' on header address followed by
       a `dds' on the StackTrace field gives the stacktrace of allocation.

    +-----+---------------+--+
    |     |               |  |             Light page heap freed block
    +-----+---------------+--+
       ^         ^          ^
       |         |          8 suffix bytes filled with 0xA0
       |         user allocation (filled with F0 bytes)
       block header (starts with 0xABCDAAA9 and ends with 0xDCBAAAA9).
       A `dt DPH_BLOCK_INFORMATION' on header address followed by
       a `dds' on the StackTrace field gives the stacktrace of allocation.
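The light page heap layout can be imitated with a small wrapper around malloc, which shows how a suffix pattern turns a silent overrun into something detectable. This is a toy model only: the stamp and fill values are taken from the diagram above, but struct toy_header and the functions toy_alloc, toy_check, and toy_free are hypothetical, and the real DPH_BLOCK_INFORMATION header carries more fields (such as the stack trace) that are not modeled here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define START_STAMP  0xABCDAAAAu  /* header start stamp from the diagram */
#define END_STAMP    0xDCBAAAAAu  /* header end stamp from the diagram */
#define USER_FILL    0xE0         /* fill for a fresh user allocation */
#define SUFFIX_FILL  0xA0         /* fill for the suffix bytes */
#define SUFFIX_BYTES 8

/* Toy stand-in for the block header; not the real Windows structure. */
struct toy_header {
    uint32_t start_stamp;
    size_t   user_size;
    uint32_t end_stamp;
};

/* Allocate header + user bytes + suffix and fill each region. The
   returned pointer is less strictly aligned than malloc's, which is
   acceptable for a byte-buffer demonstration. */
void *toy_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(struct toy_header) + size + SUFFIX_BYTES);
    if (raw == NULL) {
        return NULL;
    }
    struct toy_header hdr = { START_STAMP, size, END_STAMP };
    memcpy(raw, &hdr, sizeof hdr);
    unsigned char *user = raw + sizeof hdr;
    memset(user, USER_FILL, size);
    memset(user + size, SUFFIX_FILL, SUFFIX_BYTES);
    return user;
}

/* Return 1 if the header stamps and the suffix pattern are intact. */
int toy_check(const void *user)
{
    const unsigned char *u = user;
    struct toy_header hdr;

    assert(user != NULL);
    memcpy(&hdr, u - sizeof hdr, sizeof hdr);
    if (hdr.start_stamp != START_STAMP || hdr.end_stamp != END_STAMP) {
        return 0;
    }
    for (size_t i = 0; i < SUFFIX_BYTES; i++) {
        if (u[hdr.user_size + i] != SUFFIX_FILL) {
            return 0;   /* something wrote past the end of the block */
        }
    }
    return 1;
}

/* Validate the block, release it, and report the validation result. */
int toy_free(void *user)
{
    int ok = toy_check(user);
    free((unsigned char *)user - sizeof(struct toy_header));
    return ok;
}
```

Copying the 13 bytes of "StringofText" (including its terminating null) into a block obtained from `toy_alloc(12)` overwrites the first suffix byte, so the next toy_check or toy_free reports the damage. As with normal PageHeap, detection happens at the next validation, not at the moment of the bad write.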

When full pageheap is enabled, allocations are still tagged with a header, and the fill pattern is used to even out the allocation. A no-access guard page is then placed after the allocation. The layout of each allocation, including the header that is added, is:

    +-----+---------+--+------
    |     |         |  |  ... N/A page       Full page heap
    +-----+---------+--+------               allocated block
       ^       ^      ^
       |       |      0-7 suffix bytes filled with 0xD0
       |       user allocation (filled with C0 if zeroing not requested)
       block header (starts with 0xABCDBBBB and ends with 0xDCBABBBB).
       A `dt DPH_BLOCK_INFORMATION' on header address followed by
       a `dds' on the StackTrace field gives the stacktrace of allocation.

    +-----+---------+--+------
    |     |         |  |  ... N/A page       Full page heap
    +-----+---------+--+------               freed block
       ^       ^      ^
       |       |      0-7 suffix bytes filled with 0xD0
       |       user allocation (filled with F0 bytes)
       block header (starts with 0xABCDBBBA and ends with 0xDCBABBBA).
       A `dt DPH_BLOCK_INFORMATION' on header address followed by
       a `dds' on the StackTrace field gives the stacktrace of allocation.
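The guard-page idea itself can be sketched outside the Windows heap. The example below is a POSIX analogue that uses mmap and mprotect rather than the Windows virtual memory APIs, so it illustrates the technique and is not how PageHeap is implemented. The names guarded_block, guarded_alloc, and guarded_free are hypothetical, the block header is not modeled, and the 8-byte rounding is an assumption chosen to mirror the "0-7 suffix bytes" in the diagram.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

struct guarded_block {
    void  *base;      /* start of the whole mapping */
    size_t map_len;   /* data pages plus one guard page */
    void  *user;      /* pointer handed to the caller */
    void  *guard;     /* first byte of the inaccessible page */
};

/* Map enough pages for `size` bytes plus one extra page, make the
   extra page inaccessible, and place the user block so that it ends
   (after rounding up to 8 bytes) exactly where the guard page begins.
   Returns 0 on success and -1 on failure. */
int guarded_alloc(size_t size, struct guarded_block *out)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || size == 0 || size > SIZE_MAX / 2 || out == NULL) {
        return -1;
    }
    size_t page = (size_t)ps;
    size_t rounded = (size + 7u) & ~(size_t)7u;   /* adds 0-7 suffix bytes */
    size_t data_pages = (rounded + page - 1) / page;
    size_t map_len = (data_pages + 1) * page;

    void *mapping = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mapping == MAP_FAILED) {
        return -1;
    }
    unsigned char *base = mapping;
    unsigned char *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(mapping, map_len);
        return -1;
    }
    out->base = base;
    out->map_len = map_len;
    out->guard = guard;
    out->user = guard - rounded;
    return 0;
}

/* Release the data pages and the guard page together. */
void guarded_free(struct guarded_block *blk)
{
    munmap(blk->base, blk->map_len);
}
```

With this layout, a write that runs past the rounded end of the block touches the guard page and faults at the offending instruction, which is the behavior the full pageheap diagram describes. A small overrun that stays within the suffix bytes does not fault, which is why a fill pattern is still kept there and checked later.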


For additional information please see:

The Page Heap Block http://msdn2.microsoft.com/en-us/library/ms220938.aspx

Enable PageHeap

To enable normal PageHeap, use the following command line:

gflags /p /enable <image name>

To enable full PageHeap use the following command line:

gflags /p /enable <image name> /full

The gflags UI and +hpa command line option can be used to enable full pageheap only.

When full pageheap is enabled the following registry key is present:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\notepad.exe]
"GlobalFlag"="0x02000000"
"VerifierFlags"=dword:00000001
"PageHeapFlags"="0x3"
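The GlobalFlag value is a bit mask, so it can be decoded to see which heap debugging options an image has enabled. The sketch below is illustrative only. It covers a handful of heap-related GFlags bits (the abbreviations are the ones GFlags itself uses), and the function name is hypothetical. Any bits outside this small table are returned as an unknown remainder rather than guessed at.

```python
# A few heap-related GlobalFlag bits and their GFlags abbreviations.
HEAP_FLAGS = {
    0x00000010: "htc (heap tail checking)",
    0x00000020: "hfc (heap free checking)",
    0x00000040: "hpc (heap parameter checking)",
    0x00000080: "hvc (heap validation on call)",
    0x00001000: "ust (user mode stack trace database)",
    0x02000000: "hpa (page heap)",
}


def decode_global_flag(value):
    """Return (names of known flags that are set, remaining unknown bits)."""
    if isinstance(value, str):
        value = int(value, 16)  # accepts "0x02000000" as stored in the registry
    names = [name for bit, name in sorted(HEAP_FLAGS.items()) if value & bit]
    known_mask = 0
    for bit in HEAP_FLAGS:
        known_mask |= bit
    return names, value & ~known_mask
```

For the registry value shown above, `decode_global_flag("0x02000000")` reports only the page heap flag.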

There are several options that can be used to limit the impact of pageheap by altering which allocations go into the regular NT heap versus the pageheap. These options are documented in the pageheap command-line help:

gflags /p /?

Heap Debugging Commands

The following are heap-related commands that can be used to analyze the heap:

Dump out the GFlags setting used in the process: !gflag

Dump Page Heap: !heap -p

Dump Heap Handle: !heap -i <heap address>

Dump Heap Address: !heap -x <address>

Find Page Heap Containing Address: !heap -p -a <address>

Dump information about Page Heap: !heap -p -?

Demonstration: HeapCorrupt.exe

Lesson 2: Review

1. If using a debugger with heap tail checking enabled, when does a heap check occur?

2. What are 2 impacts on an application using PageHeap?

Lesson 3: Stack Corruption

The purpose of this lesson is to provide a brief overview of Stack Corruption. The tools required to identify the causes of stack corruption will be utilized as well.

What You Will Learn

After completing this lesson, the student will be able to:

• Identify stack corruption
• Describe the causes of stack corruption
• List the options for resolving stack corruption

Identifying Stack Corruption

Application code may cause stack corruption through a variety of programmatic errors. Once the stack is corrupted, program execution typically cannot continue. When analyzing a dump with stack corruption, unwinding the stack directly may prove difficult if not impossible. An apparent access violation may have occurred, and the offending program code may no longer be referenced or recognizable within the stack data structures. If an application access violation results, a message similar to the following may be displayed:

Application exception occurred:
    App: exe\application.dbg (pid=99)
    When: 9/9/99 @ 0:4:14.500
    Exception number: c0000005 (access violation)

Demonstration: StackCorruption.exe

Demonstration: StackOverflow.exe

Resolving Stack Corruption

When analyzing a dump that shows stack corruption, the stack can be examined by looking at the Thread Environment Block (TEB) and then dumping the raw stack. To dump the TEB, use !teb; to dump the raw stack, use “dds <address>”. A good starting address is the “Stack Limit” in the TEB. Stack Limit marks the deepest point the stack has reached (the lowest committed address, because the stack grows toward lower addresses), so it may not be where execution was just before the fault, but it is a place to start the analysis if the last valid frame on the stack cannot be found.
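Reading a raw stack dump largely means looking for values that fall inside a loaded module, since those are candidate return addresses. The following Python sketch illustrates that filtering step on data copied out of the debugger. The module ranges, stack contents, and function name are all made up for illustration. A hit is only a candidate, because stale data left on the stack can also point into a module.

```python
from bisect import bisect_right


def candidate_return_addresses(stack_words, modules):
    """Pick stack values that land inside a loaded module.

    stack_words: list of (stack_address, value) pairs from a raw stack dump.
    modules: list of (start, end, name) tuples, with end exclusive.
    """
    ordered = sorted(modules)
    starts = [module[0] for module in ordered]
    hits = []
    for address, value in stack_words:
        # Find the last module whose start is <= value, then check its end.
        index = bisect_right(starts, value) - 1
        if index >= 0 and value < ordered[index][1]:
            hits.append((address, value, ordered[index][2]))
    return hits


# Hypothetical module list and raw stack contents.
MODULES = [(0x00400000, 0x00410000, "app"), (0x77F50000, 0x77FF7000, "ntdll")]
STACK = [
    (0x0012FF00, 0x41414141),  # overwritten data, not in any module
    (0x0012FF04, 0x00401234),  # inside app
    (0x0012FF08, 0x0012FF40),  # a stack address (saved frame pointer)
    (0x0012FF0C, 0x77F51000),  # inside ntdll
]
```

Runs of repeated non-module values (such as 0x41414141 here) directly above the last plausible return address often mark where an overrun began.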

Furthermore, it may be useful to utilize a checked build of the application in question and send the logging output to the debugger. This may demonstrate the operations taking place prior to the failure.

Some other tools that will help catch stack overruns are:

• Visual C++ -GS switch – This switch is most robust in Visual Studio 2005 and is used to compile ALL OS components.

• Safely handle strings or consider using safer libraries such as strsafe.h. http://www.microsoft.com/whdc/driver/tips/SafeString.mspx

In order to solve a stack corruption problem, custom code will often need to be created. It may be possible to use a tool such as NuMega Bounds Checker or some other means to capture a logic error in a program.

Additional code review may be required and detailed debug output prior to the failure may be required in order to trace the code in question and isolate the problem.

A final option if the corruption address is reproducible is to set an access breakpoint on that address as we did for heap corruption. The command would be:

ba w4 <address>

Lesson 3: Review

1. What are two ways that a stack corruption may manifest itself?

2. List some of the options for resolving stack corruption.

Lesson 4: Hangs

The purpose of this lesson is to give a general idea about a common debug problem known as a Hang. It will give an overview about the different types of hangs and what to look for in a dump file to identify the root cause.

What You Will Learn

After completing this lesson, the student will be able to:

• Explain what a process hang is
• Describe different causes of a hang
• Explain a spinning thread application hang
• Describe the causes of a spinning thread application hang
• List the tools and steps to troubleshoot a hang

Recommended Reading

• "Inside Microsoft Windows Internals, Covering Windows 2000, Windows XP, Windows Server 2003" by Mark E. Russinovich and David A. Solomon, ISBN 0-73561-917-4
• “Debugging Applications” by John Robbins, ISBN 0-7356-0886-5
• "Modern Operating Systems" by Andrew S. Tanenbaum, ISBN 0135881870

Hangs

What Is a Hang

A hang is the situation where the application stops responding to user input. This could be a user interface application that should respond to console or mouse operations, or a service with no user interface that should respond to client requests.

Types of hangs

Several conditions can cause an application to hang.

• Deadlock happens when two entities are waiting on each other. In applications, this happens when one thread is waiting on a resource owned by a second thread while the second thread is in turn waiting on a resource owned by the first. Neither thread will do anything before getting its resource, and neither will ever get it, so the application hangs. This happens often with critical sections.

• Contention and serialization is another reason for application hangs. These are usually found in multithreaded applications running certain services. Each thread usually handles a request, and all threads should be executing at the same time. However, all requests might end up needing the same resource, and only one of them can have it at a given time. Consequently, all the requests to the resource are serialized and the application appears to be hung. This is usually detected by letting the application run for some time without allowing it to receive new requests. After some time, the application recovers by itself (unlike a deadlock).

• Critical sections are most common in multithreaded applications and are heavily used to ensure data integrity. A thread enters a critical section (block of code) to ensure that it is the only thread executing that piece of code. If another thread wants to enter the same critical section, it waits for that critical section. When the first thread releases the critical section, the next thread in line waiting for it gets the lock.

• Spinning thread – This occurs when code executing on one or more threads is stuck in a loop, or is otherwise constantly executing without making any progress.

Multiple problems can come from bad management of critical sections. A thread might exit without calling LeaveCriticalSection to free the critical section, so every thread waiting on it never gets it and the application hangs. Another scenario is a thread that calls EnterCriticalSection twice and LeaveCriticalSection once, or vice versa. A further scenario is an exception that terminates a thread before it leaves the critical section. All of these can cause a critical section to be orphaned: it is never released, and the other threads hang behind it.
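The usual defense against an orphaned lock is to tie the release to scope exit, so that every path out of the protected region, including an exception, releases it. The sketch below shows the idea in Python, using `threading.Lock` as a stand-in for a critical section; the function names are hypothetical. In C or C++ the same effect is typically obtained with `__try`/`__finally` or a scope guard object around EnterCriticalSection and LeaveCriticalSection.

```python
import threading

lock = threading.Lock()  # stands in for a critical section


def risky_update(fail):
    # "with" releases the lock on every exit path, including exceptions,
    # so a failure inside the protected region cannot orphan the lock.
    with lock:
        if fail:
            raise RuntimeError("simulated failure inside the protected region")
        return "updated"


def run_worker(fail):
    try:
        risky_update(fail)
    except RuntimeError:
        pass  # the worker ends here, but the lock was already released


worker = threading.Thread(target=run_worker, args=(True,))
worker.start()
worker.join()

# The lock is still usable after the worker failed while holding it.
acquired_after_failure = lock.acquire(timeout=1)
if acquired_after_failure:
    lock.release()
```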

Tools for Troubleshooting Hangs

There are multiple tools available for troubleshooting hang scenarios. The choice of tool is determined by user preference and the environment in which the user is working.

If you are in the process of developing the application, Visual Studio might be the best tool. Developers might be familiar with the Visual Studio interface more than other debuggers and will be able to switch between code and command views seamlessly. Also, no other tools will need to be installed on the development box.

On the other hand, you might not be able to do live debugging at the time the hang was happening. This could be because you do not have access to the machine, or you are not available at the time when the problem happened. In this situation, one solution could be to have someone generate several user mode dumps for the process at the time the problem happens, and then these could be analyzed at a later time to determine what the cause of the hang was.

To take the dumps you can use ADPlus:

adplus -hang -pn <process name> -o <output directory>

Or from the DebugDiag Process tab right click on the process and select “Create Full Userdump”.

Multiple dumps are necessary to determine whether the process is completely hung or just running very slowly. An example is heap contention, where every thread is trying to access the heap at the same time. This is easily confirmed with multiple dumps by seeing that the thread that owns the critical section protecting the heap keeps changing. Another way to determine whether a process is completely hung is the !runaway command. Run this command in each dump; if the thread times do not change, no execution has taken place.
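Comparing !runaway output by hand gets tedious with many threads, so the comparison can be scripted. The sketch below assumes the output was pasted into text and follows the usual `index:id   N days H:MM:SS.mmm` layout of the user mode time table. The sample text is invented and the function names are hypothetical. Threads are matched on the OS thread ID rather than the debugger's thread index, since the index can differ between dumps.

```python
import re

# Matches lines such as "   0:a4c       0 days 0:00:12.343".
LINE = re.compile(
    r"^\s*\d+:([0-9a-fA-F]+)\s+(\d+) days (\d+):(\d\d):(\d\d)\.(\d{3})\s*$",
    re.MULTILINE,
)


def parse_runaway(text):
    """Map OS thread ID (hex string) to accumulated time in milliseconds."""
    times = {}
    for tid, days, hours, minutes, seconds, millis in LINE.findall(text):
        total = ((int(days) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)
        times[tid.lower()] = total * 1000 + int(millis)
    return times


def cpu_deltas(first, second):
    """Time consumed between two dumps, for threads present in both."""
    return {tid: second[tid] - first[tid] for tid in first if tid in second}


def looks_fully_hung(first, second):
    deltas = cpu_deltas(first, second)
    return bool(deltas) and all(delta == 0 for delta in deltas.values())


def busiest_thread(first, second):
    deltas = cpu_deltas(first, second)
    return max(deltas, key=deltas.get) if deltas else None


# Invented sample output from two dumps taken some time apart.
DUMP_1 = """ User Mode Time
  Thread       Time
   0:a4c       0 days 0:00:12.343
   1:b10       0 days 0:00:00.015
"""
DUMP_2 = """ User Mode Time
  Thread       Time
   0:a4c       0 days 0:00:42.343
   1:b10       0 days 0:00:00.015
"""
```

Here thread a4c accumulated 30 seconds of CPU time between the dumps, which points to a busy or spinning thread rather than a complete hang.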

If the process is completely deadlocked, critical sections may be the cause. One easy way to analyze this is the critlist command from the SieExtPub extension: !sieextpub.critlist. This command lists all of the critical sections that are being held or waited on and reports whether there is a deadlock.

Troubleshooting involves dumping all threads in the process, analyzing each of them, and then searching for deadlocks or serialization. The debug engineer needs a good understanding of how the application works and what the behavior of every thread should be.
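Once the owner of each critical section and the section each thread is waiting on have been noted from the dump, checking for a deadlock amounts to looking for a cycle in the resulting wait chain. The following Python sketch illustrates that check on hand-collected data. It is not how any debugger extension is implemented, and the thread and section names are placeholders.

```python
def find_deadlock(owners, waiting_on):
    """Return the list of threads forming a wait cycle, or None.

    owners: maps a critical section name to the thread that owns it.
    waiting_on: maps a thread to the critical section it is blocked on.
    """
    for start in waiting_on:
        path = []
        seen = set()
        thread = start
        # Follow: thread -> section it waits on -> owner of that section.
        while thread in waiting_on and thread not in seen:
            seen.add(thread)
            path.append(thread)
            thread = owners.get(waiting_on[thread])
        if thread in seen:
            return path[path.index(thread):]
    return None


# Deadlock: t1 holds csA and wants csB, t2 holds csB and wants csA.
DEADLOCKED = find_deadlock(
    owners={"csA": "t1", "csB": "t2"},
    waiting_on={"t1": "csB", "t2": "csA", "t3": "csA"},
)

# Contention only: t1 holds csA and is running, the others queue behind it.
CONTENDED = find_deadlock(
    owners={"csA": "t1"},
    waiting_on={"t2": "csA", "t3": "csA"},
)
```

A chain that ends at a running owner indicates contention or serialization; a chain that ends at a section with no live owner suggests an orphaned critical section.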

Spinning Thread

Characteristics of a Spinning Thread Application Hang

A variety of symptoms may result if a thread within an application is spinning. The application may stop responding. The Windows NT system may appear to hang if the spinning thread is in a tight loop. CPU utilization for the application may approach 100%.

A system with more than one processor will not be affected as significantly as a single-processor system. However, an application with multiple threads could hang an SMP system. A program can set "THREAD_PRIORITY_TIME_CRITICAL", a high user mode thread priority.

Causes of Spinning Thread Application Hang

A thread may continuously spin within an application if proper checking is not performed and a tight looping condition results. If a multithreaded application is improperly designed, contention problems may result. For example, critical sections can be utilized with timeouts. If a timeout occurs, it must be handled.

If a thread spins tightly at a high priority, other work may obtain only a small percentage of the CPU time, so the system appears unresponsive to the user. The user can still log off or shut down the system, thus ending the spinning process. However, ending the process directly with a Windows NT tool such as PVIEWER may be quite difficult due to the lack of available CPU time.
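A timed wait only helps if the timeout path does something other than immediately retry forever. The sketch below shows one way to handle it in Python, with `threading.Lock` standing in for a lock acquired with a timeout: the wait itself blocks (so no CPU is burned while waiting), the number of attempts is bounded, and failure is reported to the caller. The function name and the retry policy are assumptions for illustration.

```python
import threading


def acquire_with_retries(lock, timeout_s, max_attempts):
    """Try to take the lock a bounded number of times.

    Returns the attempt number that succeeded, or None if every attempt
    timed out. The caller must handle None instead of looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        # acquire() blocks for up to timeout_s; it does not busy-wait.
        if lock.acquire(timeout=timeout_s):
            return attempt
    return None


free_lock = threading.Lock()
first_try = acquire_with_retries(free_lock, timeout_s=0.01, max_attempts=3)
if first_try is not None:
    free_lock.release()

held_lock = threading.Lock()
held_lock.acquire()  # simulate another owner that never releases
gave_up = acquire_with_retries(held_lock, timeout_s=0.01, max_attempts=3)
```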

Tools for Isolating a Spinning Thread

Performance Monitor

Perfmon can be used to observe the CPU utilization of the system and its applications, which can indicate that a spinning thread is active.

Start Perfmon on the target computer or run it remotely against the computer. Add the following counters:

• Processor - %Processor Time
• Process - %Processor Time (Instance == Process)
• Thread - %Processor Time (Instance == All Process Threads)
• Thread - ID Thread (Instance == All Process Threads)

Logging the data to a file for the Processor and Process objects is helpful for later examination. Start the application or scenario under which high CPU utilization is observed. The %Processor Time and %User Time will be high if application processing is responsible for the CPU utilization, and a particular process can be shown to be consuming a significant portion of the %User Time.

Figure 4: Spinning Thread

Debugging

In order to isolate the thread in question and determine the cause of the spinning thread, debugging must be performed. The debugger must be attached to the process in question.

Steps to troubleshoot:

1. Collect a Perfmon log for the objects mentioned above.

2. Get a dump file of the spinning process.

3. Find which thread in the process is taking all the CPU cycles.

4. Analyze what that thread is doing in the dump file.

Caution: High CPU may also result from multiple threads each taking a good percentage of CPU time.

Demonstration: Debug Hang with ADPlus/DebugDiag

Lesson 4: Review

1. What are 3 symptoms of a spinning thread?

2. What are the 2 causes of spinning threads that were mentioned?

3. What 2 tools can help isolate a spinning thread?

Lesson 5: Memory Leak in Private Bytes

The purpose of this lesson is to provide a brief overview of the problem of private byte memory leaking in applications. The tools and debugging methods used to identify the causes of Application memory leaks will be utilized as well.

What You Will Learn

• Explain private byte memory leaks
• Causes of private byte memory leaks
• Tools for isolating private byte memory leaks

Recommended Reading

• How to Use Display Heap (DH.EXE) Resource Kit Utility [winnt] ID: Q168609
• Umdhtools.exe: How to Use Umdh.exe to Find Memory Leaks [winnt] ID: Q268343

Memory Leak in Private Bytes

Private Byte Memory Leaks

An application private byte leak is caused by the application allocating private memory and failing to free it. Other resources can leak as well, such as user and GDI objects, handles, and threads. Leaks are a problem because system resources are limited, and when the system runs low on resources, the impact can extend beyond the leaking process to other processes or the entire system.

Symptoms

There are many symptoms of a private byte leak. The system may slow down and applications may fail to operate properly. Many random symptoms may occur as a result of a low memory situation brought about by the leak. Even when a lot of memory is left on the system, memory management functions may slow down due to delays in heap manager memory processing.

Potential Symptoms:

• A popup message stating "System out of virtual memory" may be displayed.

• If a Windows NT system begins to exhaust available virtual memory resources, applications may exhibit random problems. Graphics requiring resources may not be displayed properly, delays may occur and the program may terminate. Results are dependent upon many factors including the error handling capabilities of the application.

• System performance may degrade and excessive swapping may occur.

Causes of Private Byte Memory Leaks

• A private byte memory leak is caused by an application or process that allocates memory for use but does not later free the memory when the application has finished using it. The result is that available memory is continually consumed and may be exhausted over a period of time, often causing the system to stop functioning properly.

Tools for Isolating Private Byte Memory Leaks

In the next several slides we will look at LeakDiag and DebugDiag’s LeakTrack.

Performance Monitor

Perfmon can be used to indicate that a leak is occurring.

Start Perfmon on the target computer or run it remotely against the computer. Add the following counters:

Memory - Pool NonPaged Bytes

Memory - PoolPaged Bytes

Memory - Cache Bytes

PageFile - Usage %

Process - Private Bytes

Logging the data to a file for the Memory and Process objects will be helpful for later examination. The application in question or scenario under which a leak may be observed must be initiated. Over time, leaks can be observed and the symptoms will include upward trends in the resource in question. For example, a private byte memory leak in an application will result in increasing private byte memory utilization credited to the process in question. Memory resources including Pool Paged Bytes will increase as well.

The process is often slow and may take a long period of time to detect. Starting your test will cause the counters to jump. It may take a while for the memory pools to reach a steady state. If a paged pool leak is occurring, both Memory - PoolPaged Bytes and PageFile - Usage % will increase over time.
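Whether a counter is trending upward after the initial jump can be estimated with a simple straight-line fit over the logged samples. The sketch below assumes the Private Bytes samples have already been exported to a list of numbers at a fixed sampling interval. The `warmup` parameter (how many startup samples to skip), the slope threshold, and the function names are all assumptions for illustration.

```python
def slope_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    count = len(values)
    if count < 2:
        raise ValueError("need at least two samples")
    mean_x = (count - 1) / 2
    mean_y = sum(values) / count
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(count))
    return numerator / denominator


def looks_like_leak(samples, warmup=0, min_slope=0.0):
    """True when the samples after the warm-up period keep climbing."""
    steady = samples[warmup:]
    return slope_per_sample(steady) > min_slope


# Invented samples (in MB): a jump at startup, then steady growth.
LEAKING = [10, 50, 52, 54, 56, 58]
# Same startup jump, then flat.
STABLE = [10, 50, 50, 50, 50, 50]
```

Skipping the warm-up samples matters: without `warmup=1`, the startup jump alone would make the stable series look like it is growing.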

Figure 5: Memory Leak

UMDH

User Mode Dump Heap (UMDH) is a tool that tracks memory allocations that have not been freed. It works in conjunction with GFLAGS settings that allow it to save the call stacks of outstanding allocations to a file. Once you have saved the output of multiple UMDH runs, you can use UMDH to compare the files.

A typical usage scenario is to run UMDH shortly after your application has started, which allows any caching to complete first. Then run it again at intervals until memory has been consumed (leaked). With these files you can use UMDH to create a compare file, which contains only the stacks that have allocated memory and not freed it since the first file was created. You might compare files 1 and 2, then files 2 and 3, and so on. This shows which stacks are leaking memory.
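The comparison idea can be illustrated independently of UMDH's own file format. In the Python sketch below, each snapshot is simply a dictionary that maps a stack identifier to its outstanding allocation count and byte total. That representation, the sample data, and the function names are assumptions for illustration, not what UMDH writes or how it compares files.

```python
def compare_snapshots(older, newer):
    """List stacks whose outstanding bytes grew, largest growth first.

    Each snapshot maps a stack identifier to (allocation_count, total_bytes).
    """
    growth = []
    for stack, (count, size) in newer.items():
        old_count, old_size = older.get(stack, (0, 0))
        if size > old_size:
            growth.append((size - old_size, count - old_count, stack))
    growth.sort(reverse=True)
    return growth


def consistently_growing(snapshots):
    """Stacks that grew between every consecutive pair of snapshots."""
    suspects = None
    for older, newer in zip(snapshots, snapshots[1:]):
        grew = {stack for _, _, stack in compare_snapshots(older, newer)}
        suspects = grew if suspects is None else suspects & grew
    return suspects or set()


# Invented snapshots: stack B keeps growing, stack C is a one-time allocation.
SNAP_1 = {"A": (10, 1000), "B": (5, 500)}
SNAP_2 = {"A": (10, 1000), "B": (25, 2500), "C": (1, 64)}
SNAP_3 = {"A": (10, 1000), "B": (40, 4000), "C": (1, 64)}
```

Intersecting the results across several pairs filters out one-time allocations such as caches, leaving the stacks that grow in every interval.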

Note: For more detailed information on using UMDH, see Q268343, “Umdhtools.exe: How to Use Umdh.exe to Find Memory Leaks”.

LeakDiag

LeakDiag is a powerful, easy to use set of utilities designed to help developers, support professionals, and IT personnel in diagnosing memory leaks, fragmentation, and other memory related issues in applications or services running on Windows 2000, Windows XP and Windows Server 2003.

These tools can be used to pinpoint memory leaks down to the line of code. Using Microsoft's Detours® technology, LeakDiag intercepts calls to specified memory allocators and tracks the various call stacks. It reports on the memory that has been allocated but not yet freed. This information allows a person troubleshooting a memory leak problem to see exactly what components made the allocations. With proper debug symbols, even the line of code that requested the allocations can be seen.

Administrator rights are required to install and use LeakDiag on a computer. To effectively use the dbghelp StackWalk API option, correct symbols are required. Symbols are also required for log file generation if the ResolveSymbolFlags is set to 1.

System Requirements

The Microsoft Memory Diagnostics Toolkit can be installed on the following platforms:

Windows 2000 Professional, Server, Advanced Server and Datacenter

Windows XP

Windows Server 2003

LeakDiag requires about 2 MB of hard drive space for installation. Additional space is needed for log files that are generated. If using the fragmentation tracking options, a large amount of disk space may be needed to hold the log files. The log file location can be different than the installation folder.

The LeakDiag application window displays a list of the currently running applications that you can monitor for leaks. This includes Services and MTS/COM+ applications. For operating systems prior to Windows XP the list will only show applications running in the current Terminal Server session.

First choose an application from the list of running processes. If the application was started after LeakDiag was launched, you can refresh the list by choosing "Refresh" from the View menu. After selecting an application you will be able to select an allocator to track from the "Memory allocators" list. Select the desired allocator to track and click the Start button. LeakDiag will then inject into the process and begin tracking memory allocations made through the selected allocator.

Which Allocator to Use?

The Virtual Allocator is used to track:

NtAllocateVirtualMemory

NtFreeVirtualMemory

These functions are called through the external APIs VirtualAlloc(Ex) and VirtualFree(Ex). If you suspect a leak in virtual memory, or want to confirm excessive virtual memory usage by a heap manager, track this allocator. When dealing with an application that uses the NT Heap Manager (HeapAlloc...) or the C Runtime allocators (malloc, new, etc.), it is common to see the virtual memory for the application be as much as twice the private bytes, with the two growing at roughly the same rate. If, however, you notice that private bytes level off while virtual memory continues to grow, there may be a leak in virtual memory or some sort of heap fragmentation.

The NT Heap Allocator is used to track the following calls in NTDLL.DLL:

RtlCreateHeap

RtlDestroyHeap

RtlAllocateHeap

RtlFreeHeap

RtlReAllocateHeap

LocalFree

LocalAlloc

LocalReAlloc

GlobalAlloc

GlobalReAlloc

GlobalFree

The COM Allocator is used to track the following calls from OLE32.DLL and OLEAUT32.DLL:

CoGetMalloc

CoTaskMemAlloc

CoTaskMemFree

CoTaskMemRealloc

CRetailMalloc_Alloc

CRetailMalloc_Free

CRetailMalloc_Realloc

SysAllocStringLen

SysAllocStringByteLen

SysAllocString

SysFreeString

SysReAllocString

SysReAllocStringLen

The C Runtime Allocator tracks the following calls from MSVCRT.DLL:

malloc

calloc

realloc

free

new

new[]

delete

delete[]

The TLS Allocator tracks the following calls in KERNEL32.DLL:

TlsAlloc

TlsFree

The LeakDiag Log File

Header

The LEAKS element displays the version of LeakDiag that generated the log. This matches the version of inject.dll.

<LEAKS ver="1.25.28.2002">

The Stack

The STACK element contains the data for each stack that has allocated memory that has not yet been freed. In other words, a STACK appears in the log file only if it has allocated some memory and that memory is still in use. This does not mean that every STACK is a leak, only that it has allocated memory that is not freed yet.

The STACK structure contains a statistics section followed by a call stack section. The attributes of the STACK element are numallocs, size, and totalsize. The numallocs attribute is the number of allocations that this stack currently has outstanding. In our example stack, that means there are 232296 allocations that have not been freed yet. The size attribute is the average size of each allocation, derived by dividing totalsize by numallocs. Specific allocation sizes are listed in the STACKSTATS section. The totalsize attribute is the total number of bytes this stack has outstanding. In our example that amount is 27.5MB. So, in the STACK below, there were 232296 allocations totaling 27.5MB that have not yet been freed.

The STACKSTATS section contains a breakdown of the sizes and heaps that the allocations were made from. The SIZESTAT element shows an explicit size and the number of times that size was allocated by this stack. The HEAPSTAT section shows an explicit heap handle and the number of allocations from that heap by this stack. So, in the example above, this stack had 3 distinct allocation sizes:

• 2216 allocations of 112 bytes

• 1 allocation of 3584 bytes

• 1 allocation of 2064 bytes

232296 of those allocations came from the heap with the handle 860000.

The FRAME elements make up the call stack back trace. This is the list of return addresses that made the allocations. The FRAME element has num, dll, function, offset, filename, line, and addr attributes. The num attribute is simply the FRAME number, and it is 0-based. The dll attribute is the module name, in this case MemTester.exe. The function attribute is the name of the function, if that was available at the time the log was generated. This name is derived from symbols, either in the module itself or in a symbol file. See the Using Symbols topic for more information. The offset attribute is the number of bytes into the function from its start address. This is also derived from symbolic information and is only as accurate as that information. The filename attribute is the source file for the function, if that information is available. This also comes from symbolic information. The line attribute is the line number in the source code. The addr attribute is the explicit return address of the function.

The STACKID element is a unique identifier for each stack. It can be used to compare multiple logs from the same instance of an application to observe trends.
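The attributes described above lend themselves to simple scripting. The following is a minimal Python sketch, not part of LeakDiag, that reads a log shaped like the description and ranks stacks by outstanding bytes. The sample XML is hand-written for illustration: the exact nesting of the elements, the function names LeakyFunction and LoadBuffer, and all the numbers are assumptions, so check a real log before relying on it.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real LeakDiag output. Element and attribute names
# follow the description in the text; nesting and values are assumptions.
SAMPLE_LOG = """
<LEAKS ver="1.25.28.2002">
  <STACK numallocs="4" size="256" totalsize="1024">
    <FRAME num="0" dll="MemTester.exe" function="LeakyFunction" offset="0x1c" filename="memtester.cpp" line="42" addr="0x401234"/>
    <FRAME num="1" dll="MemTester.exe" function="main" offset="0x2f" filename="memtester.cpp" line="90" addr="0x401300"/>
  </STACK>
  <STACK numallocs="2" size="4096" totalsize="8192">
    <FRAME num="0" dll="MemTester.exe" function="LoadBuffer" offset="0x10" filename="" line="" addr="0x402000"/>
  </STACK>
</LEAKS>
"""


def summarize_stacks(xml_text):
    """Return one summary dict per STACK, largest outstanding total first."""
    root = ET.fromstring(xml_text)
    summaries = []
    for stack in root.iter("STACK"):
        numallocs = int(stack.get("numallocs"))
        totalsize = int(stack.get("totalsize"))
        # FRAME numbers are 0-based; frame 0 is the innermost return address.
        frames = sorted(stack.iter("FRAME"), key=lambda f: int(f.get("num")))
        summaries.append({
            "numallocs": numallocs,
            "totalsize": totalsize,
            # The size attribute is described as totalsize / numallocs.
            "average_size": totalsize / numallocs,
            "top_frame": frames[0].get("function") if frames else None,
        })
    summaries.sort(key=lambda s: s["totalsize"], reverse=True)
    return summaries


if __name__ == "__main__":
    for summary in summarize_stacks(SAMPLE_LOG):
        print(summary)
```

Sorting by totalsize puts the stacks holding the most memory first, which is usually where to start looking, but remember that a large outstanding total is not by itself proof of a leak.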

The Footer

The FOOTER section contains basic information about the process and machine. You can find the following information in the FOOTER:

• Filename and path of the application being monitored

• Process ID of the application being monitored

• OS version of the machine

• Memory information logged by LeakDiag


LDParser

The Allocation Size Details window shows the allocations grouped by size along with the total number of allocations of that size.

The Stacks window shows the different call stacks that were tracked as allocating, but not freeing memory. Clicking on each line will change the Frame Details window. This window can be sorted on each column by clicking on the column name.

The Frame Details window will show you the frames for the currently selected stack. The details include the function name, filename, line number and address. Of course, you only get this information if you had a good symbol path set in LeakDiag.


LDGrapher

1. Use LeakDiag to take multiple log files of the leaking process. This utility requires at least two such log files. You can configure LeakDiag to take a snapshot of the memory at regular intervals, for example every 10 minutes or every hour. Running the leaking application for a long time and taking 5 or 6 snapshots usually gives a better picture of the memory leak.

2. Once the files are created, click on "File->Open Files" and choose those log files. This will display the leaking stacks with the Y axis showing Num Of Allocations.

3. Choose the View menu options to switch between Num Of Allocations and Total Size.

4. Move the mouse over the lines to see the values in the status bar.

5. Click on the line or stack name to see the call stack. This should pop up a new window with the callstack. The window title shows the growth and the values of that stack.
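The idea behind graphing several snapshots can be sketched in a few lines of Python: a stack whose outstanding allocation count keeps growing between logs is a leak suspect, while one that stays flat probably is not. This is only an illustration of the comparison, not how LDGrapher is implemented; the stack identifiers and counts below are hypothetical.

```python
def rank_growth(snapshots):
    """Rank stacks by growth in outstanding allocations.

    snapshots: list of dicts, oldest first, mapping a stack identifier to
    the number of outstanding allocations seen in that log.
    Returns (stack_id, growth) pairs, fastest-growing first.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to see a trend")
    first, last = snapshots[0], snapshots[-1]
    # A stack missing from the first snapshot is treated as starting at 0.
    growth = {sid: last[sid] - first.get(sid, 0) for sid in last}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts taken from three successive logs.
SNAPSHOTS = [
    {"stack-A": 100, "stack-B": 50},
    {"stack-A": 900, "stack-B": 52},
    {"stack-A": 1700, "stack-B": 49, "stack-C": 10},
]

if __name__ == "__main__":
    for stack_id, delta in rank_growth(SNAPSHOTS):
        print(stack_id, delta)
```

Here stack-A grows steadily and would be the first stack to inspect, while stack-B hovers around the same value and looks like normal working memory.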


Demonstration: LeakDiag


Demonstration: Tracking a Leak in DebugDiag


Memory Fragmentation

Memory fragmentation is much like disk fragmentation. We get a new PC (or process address space), start loading programs and generally using and releasing parts of the disk (our virtual address space), and the next thing we know, we don’t have room for yet another 1.5 GB application.

It’s exactly the same with memory: we get virtual address space fragmentation. This is usually caused by a mixture of allocation patterns, i.e. mixing long-lived small allocations with lots of short-lived large allocations, and pretty soon our virtual address space is horrendously fragmented.

A more likely scenario is that we hammer on a particular heap manager with the above allocation pattern, and it is the heap that causes the VA fragmentation. The heap keeps expanding and gathering more and more memory until the heap manager cannot allocate another large block, even though lots of memory is still free.

Virtual Address Space Fragmentation

We encounter this if we make excessive use of the VirtualAlloc APIs, and again mix large and small, short-term and long-term allocations. We will slowly move through our whole 2 GB VA space, turning it into Swiss cheese.

Heap Internal Fragmentation

If we look back at how the heap manager actually does its front end allocation, it typically uses fixed size buckets to satisfy all allocation requests (up to a maximum size).

So if we ask for one byte, we actually get it from the 8-byte bucket. If we ask for 9 bytes, we get an allocation from the 16-byte bucket. You can see how this wastes memory: if we requested nothing but 1-byte allocations, then even without taking into account any of the overhead of the heap manager, we would run out of memory when the data we had actually asked for amounted to only 1/8th of the total 2 GB virtual address space.
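The bucket arithmetic can be checked with a short Python sketch. It assumes buckets at multiples of 8 bytes, as in the text; the real granularity and maximum bucket size depend on the heap manager and platform, and the function names are ours.

```python
BUCKET_GRANULARITY = 8  # bytes; taken from the text, real heaps may differ


def bucket_size(request_bytes):
    """Round a request up to the next multiple of the bucket granularity."""
    if request_bytes <= 0:
        raise ValueError("request must be at least 1 byte")
    # Ceiling division without floating point.
    buckets = -(-request_bytes // BUCKET_GRANULARITY)
    return buckets * BUCKET_GRANULARITY


def wasted_fraction(request_bytes):
    """Fraction of the bucket that the caller did not ask for."""
    bucket = bucket_size(request_bytes)
    return (bucket - request_bytes) / bucket


if __name__ == "__main__":
    for request in (1, 8, 9, 16, 17):
        print(request, bucket_size(request), wasted_fraction(request))
```

A 1-byte request wastes 7 of the 8 bytes in its bucket, which is where the 1/8th figure comes from, while a request that is an exact multiple of the granularity wastes nothing.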


Heap External Fragmentation

This is the same as VA fragmentation, except that it’s caused by the heap growing across many different segments, and finally no longer being able to find a large enough free chunk to satisfy the current request.

The stack doesn’t experience fragmentation, because the stack isn’t a random access mechanism like the heap or virtual memory; it’s a LIFO (Last In First Out) data structure.

Custom Heap Managers

Sometimes the default Windows heap manager does not handle allocations in the most efficient way for your application’s allocation pattern. This is where a custom heap manager can help out. There are third-party heap managers, and Microsoft itself ships other heap managers, for example the .NET heap, the C runtime small-block heap (no longer used by default), the Rockall heap developed by Microsoft Research, and the Low Fragmentation Heap (LFH) introduced in Windows XP/2003.

Tools to Analyze the Virtual Address Space

VaView is a tool from Microsoft Support that produces a picture of the virtual address space based on a dump file. This is great for looking at how the address space is allocated and fragmented.

Also working from a dump file, the !vmmap command in mdacextpub analyzes the memory usage and displays statistics.


Demo: Memory Fragmentation

Memory fragmentation is pretty much exactly the same as disk fragmentation. It’s a little more subtle in some ways, as we deal with virtual memory, rather than physical clusters on the surface of the disk. Anyway, here is how the fragmentation game plays out.

We start off with our memory nice and empty. Lots of nice empty 4K pages, all unallocated. (OK, so this isn’t quite how it works out, as we have to VirtualAlloc them all first, but we are using a little artistic license here.)

Then we start to use memory, allocating large blocks, and small blocks.

Then we start to release some of that memory.

Then we need some more memory.

Then we release some of that.

And so on and so on, until…

We have fragmented all our memory, and we cannot find enough space for the next allocation.
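The sequence above can be played out in a toy model. The Python sketch below is an illustration only: it treats the address space as 16 pages with a first-fit allocator, which is a deliberate simplification and not how the Windows heap or VirtualAlloc actually places blocks. The class and function names are hypothetical.

```python
FREE = None


class PageSpace:
    """A tiny address space of fixed-size pages with first-fit allocation."""

    def __init__(self, page_count):
        self.pages = [FREE] * page_count

    def allocate(self, tag, page_count):
        """Claim the first run of page_count free pages; return its start or None."""
        run_start, run_length = 0, 0
        for index, owner in enumerate(self.pages):
            if owner is FREE:
                if run_length == 0:
                    run_start = index
                run_length += 1
                if run_length == page_count:
                    for i in range(run_start, run_start + page_count):
                        self.pages[i] = tag
                    return run_start
            else:
                run_length = 0
        return None

    def release(self, tag):
        """Free every page owned by tag."""
        self.pages = [FREE if owner == tag else owner for owner in self.pages]

    def free_pages(self):
        return sum(1 for owner in self.pages if owner is FREE)

    def largest_free_run(self):
        best = current = 0
        for owner in self.pages:
            current = current + 1 if owner is FREE else 0
            best = max(best, current)
        return best


def run_demo():
    space = PageSpace(16)
    # Interleave short-lived 3-page blocks with long-lived 1-page blocks.
    for n in range(4):
        space.allocate(f"large-{n}", 3)
        space.allocate(f"small-{n}", 1)
    # The large blocks go away; the small ones stay and pin the holes apart.
    for n in range(4):
        space.release(f"large-{n}")
    # 12 of 16 pages are free, yet no hole is bigger than 3 pages.
    return space.free_pages(), space.largest_free_run(), space.allocate("big", 4)


if __name__ == "__main__":
    print(run_demo())
```

In this model three quarters of the pages are free at the end, but a 4-page request still fails because the surviving small allocations split the free space into 3-page holes.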


Fragmentation (Continued)

How do we know when we are suffering from fragmentation?

The early sign is when the ratio between Private Bytes and Virtual Bytes (VA Eff = P Bytes / V Bytes) starts to move away from its ideal of 1.

However, if this ratio is fairly static, and the value of Virtual Bytes is not approaching the 2 GB limit, this may not be a problem.
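As a quick worked example of the ratio, the Python sketch below computes VA Eff from two counter values and shows how much of the 2 GB user address space remains. The function names and sample numbers are ours, and the text gives no fixed threshold for what counts as bad, so none is assumed here.

```python
TWO_GB = 2 * 1024 ** 3  # user-mode address space limit used in the text


def va_efficiency(private_bytes, virtual_bytes):
    """VA Eff = Private Bytes / Virtual Bytes; 1.0 is the ideal."""
    if virtual_bytes <= 0:
        raise ValueError("virtual_bytes must be positive")
    return private_bytes / virtual_bytes


def headroom_fraction(virtual_bytes, limit_bytes=TWO_GB):
    """Fraction of the address space limit not yet covered by Virtual Bytes."""
    return 1 - virtual_bytes / limit_bytes


if __name__ == "__main__":
    # Hypothetical counter values: 512 MB private, 1 GB virtual.
    private, virtual = 512 * 1024 ** 2, 1024 ** 3
    print(va_efficiency(private, virtual), headroom_fraction(virtual))
```

Sampling both values over time matters more than any single reading: a ratio that keeps falling while the headroom shrinks is the pattern to worry about.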

Initially the consequence of fragmentation is degradation in performance over time. This can manifest itself as a server process slowing down over a period of days or weeks, when presented with the same load.

The best way to detect this is to take a series of dumps during a particularly long running function, and look for threads waiting on locks in the heap functions.

If the fragmentation continues, it can result in severe performance degradation, or can ultimately lead to memory exhaustion, leading to out of memory error conditions.

One of the surest ways to fragment memory is to mix long-lived small allocations with short-lived large allocations.

How do we solve this? Change your allocation pattern, or preferably, split the short-term and long-term allocations across different heaps.

Alternatively, if the root cause is a memory leak, then fix the leak, or reduce it to a sustainable level.


Lesson 5: Review

1. What are the symptoms of a private byte memory leak?

2. What tools are available in order to isolate the source of a memory leak?


Lesson 6: Special Debug Scenarios


Debugging Process Startup

The process described in this section is most useful for debugging service startup. If a service crashes on startup, you might not have enough time to attach the debugger to it between the time that it tries to start and the time that it crashes. Furthermore, you might not be able to start the service from the debugger by pointing to the executable, since the Service Control Manager needs to start the service. The ideal way to debug such a service would be to have the OS automatically start a debugger and start the service from the debugger every time the process hosting the service is going to start.

This is actually possible by including a “Debugger” value under the “Image File Execution Options” key in the registry. This can be done manually, or by using the gflags.exe utility included in the debuggers package. On process startup, the OS looks at the “Image File Execution Options” key to see if there are any entries for the process that it is about to start. If there is a Debugger value present, the OS starts the debugger first, and the debugger then starts the service process automatically. This happens every time the process is started.

As an example, let’s say that the IISAdmin service crashes on startup. This is the service responsible for managing the IIS services such as the World Wide Web service. This service and the other services run under the process called inetinfo.exe. The only way to start the process is to start the IIS Admin service, and there is no way of doing this directly from the debugger. To have this process started under the debugger, set the following registry key:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\inetinfo.exe]

"Debugger"="C:\\Debuggers\\windbg.exe -g"

This assumes that you have the Microsoft Debuggers package installed into the c:\debuggers directory and that you want to use Windbg as your debugger.
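If you need the same setting for several images, the registry text can be generated rather than typed. The Python sketch below only builds the text of a .reg fragment in the format shown above; it does not touch the registry, and the helper name debugger_reg_fragment is ours. The debugger path is the one assumed in the text and will differ on your machine.

```python
IFEO_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\Image File Execution Options"
)


def debugger_reg_fragment(image_name, debugger_command):
    """Build .reg text that sets the Debugger value for one image name."""
    # .reg string values escape backslashes and double quotes.
    escaped = debugger_command.replace("\\", "\\\\").replace('"', '\\"')
    return f'[{IFEO_KEY}\\{image_name}]\n"Debugger"="{escaped}"\n'


if __name__ == "__main__":
    # Path assumes the debuggers package lives in C:\Debuggers.
    print(debugger_reg_fragment("inetinfo.exe", r"C:\Debuggers\windbg.exe -g"))
```

To import the result with regedit, the fragment would need to be saved in a file that starts with the usual .reg header line. Remember to delete the Debugger value when you are done, otherwise every later start of that image will also go through the debugger.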


Another way to do it is to use gflags.exe and set the “Debugger” option for the inetinfo.exe process. This will automatically set the above key in the registry.

The “-g” is included so that the debugger does not break into the process as soon as it starts. If the “-g” is omitted, the debugger pauses the process until you choose to “go” the process. This is a little tricky and is time sensitive. Even if the service normally starts fine, leaving it paused in the debugger for an extended period of time causes the SCM to time out and kill both the service and the debugger. However, once you “go” the debugger and the service sends a message to the SCM telling it that it started OK, you can pause the service for as long as you want without any problems.

When debugging the startup of a Windows service, it is important to adjust ServicesPipeTimeout. When the Service Control Manager (SCM) launches a service, if that service does not start within the window specified in ServicesPipeTimeout, the SCM will attempt to kill the process. The value is a DWORD, in milliseconds, under the following key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control

For additional information, see KB 824344, How to debug a Windows Service.

Be careful about one thing when dealing with services. Services usually run in the background and do not require user interaction, so by default they are not set to interact with the desktop. When you do the above and start the service, the debugger will start but will not be visible to the user. If an exception happens, the debugger will break in by default, hanging the service, but the user will not be able to see anything. To resolve this, you will have to set the service to “interact with the desktop” before the service is started.

To do this, open the Services MMC snap-in and find the service that you need to work with. Right-click the service name and choose Properties. Under the “Log On” tab of the dialog box, check the “Allow service to interact with desktop” checkbox, then click OK to close the dialog.

The next time you try to start the service, the debugger will start and then start the service. You can break in at any time and perform a normal live debug.


Debugging via Terminal Services

In many situations, you need to perform a live debug on a machine without having physical access to it. You can connect to it over the network through Terminal Services, but the console is out of your reach. This can be because the box is in a physically secure location, or in a remote location and you don’t feel like flying there. If you need to debug a user application that you start yourself, such as Notepad, you can start the debugger and attach to that process without running into any problems. However, let’s say that you want to attach to a process hosting a service that you are trying to debug. You start the debugger and, through the menu, try to attach to the process running the service. To your surprise, you get the following error message:

Could not attach to process “process ID”, Win32 error 5. Access is Denied.

On Windows 2000 and Windows XP, you get this message if the process you are attaching to is running in a different Winstation than the one you are logged on to. The error message is a little misleading and might send you checking for privileges in your security settings. This will not be an issue in Windows Server 2003, since debugging across Winstations will be allowed.


Setting up Debugging via Terminal Server

To work around this problem, we need a way to get the debugger started under the default Winstation so that it is able to attach to processes started under it. To do this, follow these steps:

1. Use the “AT” Task Scheduler to launch “cdb.exe” with a remote pipe and attach to the process sometime in the near future (for example, 1 minute from now). You can use a command similar to the following, where <PID> is the ID of the process to attach to:

cdb.exe -server tcp:port=9999 -p <PID> -g

2. Connect to the remote pipe with CDB or Windbg from the TS session or a remote machine.

The Task Scheduler runs as the Local System account and therefore will be running in the default Winstation.
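The two steps can be scripted so that the scheduled time and the matching client command are generated together. The Python sketch below only builds the command strings; it does not run them. The process ID 1234, the host name, and the helper names are hypothetical, and the exact form of the AT command line should be checked with "at /?" on the target system before use.

```python
from datetime import datetime, timedelta


def server_command(pid, port=9999):
    """cdb command that attaches to pid and listens for remote clients."""
    return f"cdb.exe -server tcp:port={port} -p {pid} -g"


def at_command(now, pid, port=9999, delay_minutes=1):
    """AT command that schedules the debugger server a little in the future."""
    run_at = now + timedelta(minutes=delay_minutes)
    return f"at {run_at:%H:%M} {server_command(pid, port)}"


def client_command(server_host, port=9999, debugger="windbg.exe"):
    """Command to connect to the debugger server from the TS session or another machine."""
    return f"{debugger} -remote tcp:server={server_host},port={port}"


if __name__ == "__main__":
    # 1234 and "targetbox" are placeholders for the real PID and machine name.
    print(at_command(datetime.now(), 1234))
    print(client_command("targetbox"))
```

The port passed to the client must match the one the server was started with, and the client should only be started after the scheduled time has passed.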

Another way to work around the Winstation limitation is to start the debugger from a TELNET window. In order to do that, first you have to start the Telnet service on the target machine since it is not started by default.

Then, you can telnet to that machine and use CDB to attach to any process running the service that you need to debug. You can choose to debug using CDB from your telnet window, or if you prefer the UI version, remote out the CDB debugger using the .server command, and connect to it using WinDBG from any other machine.


Lesson 6: Review

1. What tool is used to configure an application to start under a debugger?

2. When debugging a service startup, what other tasks must be completed in order to debug?

3. What are two techniques that can be used to debug a process via Terminal Services in Windows 2000?


Lab: Common Debug Scenarios
