The e820 trap of Linux kernel hibernation
Aug, 2015, COSCUP 2015, Taipei

Joey Lee, SUSE Labs Taipei

Agenda

Fundamental: Hibernation (suspend to disk); e820, EFI memmap

e820 shift: Platform vs. Shutdown; memory size changing

EFI memmap shift: setup_data and nosave regions; EFI runtime services broken after S4

Challenges

Q&A

Fundamental

[Figure: Memory (physical), spanning pfn = 0 to pfn = max_pfn]

[Figure: Memory (runtime), spanning 0 to max_pfn]

Hibernation (suspend to disk)

Create snapshot image of runtime memory.

Store snapshot image to swap partition or file.

Restore snapshot image to memory.
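The three steps can be pictured with a toy model in which memory is a dict from page frame number (pfn) to page contents. This is only an illustrative sketch: the function names, the dict representation, and the sample contents are assumptions, not kernel interfaces. The point to notice is that the image is keyed by pfn, so restore writes every page back to the same pfn it came from.

```python
def create_snapshot(memory, nosave_pfns):
    """Copy every page except those marked nosave; keys are pfns."""
    return {pfn: data for pfn, data in memory.items() if pfn not in nosave_pfns}


def restore_snapshot(memory, image):
    """Write each saved page back to the pfn it was taken from."""
    for pfn, data in image.items():
        memory[pfn] = data
    return memory


# Runtime memory before hibernation (hypothetical contents).
runtime = {0: "kernel", 1: "app", 2: "firmware"}
image = create_snapshot(runtime, nosave_pfns={2})

# Memory as seen by the kernel that boots to perform the resume.
fresh_boot = {0: "boot kernel", 1: "", 2: "firmware (new boot)"}
restored = restore_snapshot(fresh_boot, image)
```

Because restore trusts the pfns recorded in the image, it silently assumes that the same page frames are usable on the resume boot.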

Hibernation (restore)

[Figure: the snapshot image is restored into memory, 0 to max_pfn. Label: Memory restored]

[Figure: Memory (physical), spanning pfn = 0 to pfn = max_pfn]

[Figure: Memory (BIOS memory map), 0 to max_pfn, at Boot]
e820

Wikipedia: e820 is shorthand to refer to the facility by which the BIOS of x86-based computer systems reports the memory map to the operating system or boot loader.

It is accessed via the int 15h call, by setting the AX register to value E820 in hexadecimal. It reports which memory address ranges are usable and which are reserved for use by the BIOS.

e820 entry type

Type   | Kernel Define | String in dmesg                     | Description
Type 1 | E820_RAM      | usable, System RAM                  | Usable (normal) RAM
Type 2 | E820_RESERVED | reserved, reserved                  | Reserved - unusable
Type 3 | E820_ACPI     | ACPI data, ACPI Tables              | ACPI reclaimable memory
Type 4 | E820_NVS*     | ACPI NVS, ACPI Non-volatile Storage | ACPI NVS memory, ACPI Non-Volatile-Sleeping Memory (NVS)
Type 5 | E820_UNUSABLE | Unusable, Unusable memory           | Area containing bad memory

* drivers/acpi/nvs.c::suspend_nvs_*() handles ACPI NVS for S4

[Figure: Memory (BIOS memory map) at Boot, 0 to max_pfn]

[Figure: Memory (runtime) at Boot, 0 to max_pfn. Usable regions are interleaved with ACPI NVS, reserved, and ACPI data regions. Labels: Boot, OS]

EFI memory map

EFI spec v2.5

EFI_BOOT_SERVICES.GetMemoryMap(): Returns the current memory map.

6.2 Memory Allocation Services

Table 25. Memory Type Usage before ExitBootServices()

Table 26. Memory Type Usage after ExitBootServices()

e820 entry type vs. EFI memory region type

E820 Type | E820 entry type | EFI memory region type
Type 1    | E820_RAM        | EFI_LOADER_CODE (type 1), EFI_LOADER_DATA (type 2), EFI_BOOT_SERVICES_CODE (type 3), EFI_BOOT_SERVICES_DATA (type 4), EFI_CONVENTIONAL_MEMORY (type 7)
Type 2    | E820_RESERVED   | EFI_RESERVED_TYPE (type 0), EFI_RUNTIME_SERVICES_CODE (type 5), EFI_RUNTIME_SERVICES_DATA (type 6), EFI_MEMORY_MAPPED_IO (type 11), EFI_MEMORY_MAPPED_IO_PORT_SPACE (type 12), EFI_PAL_CODE (type 13)
Type 3    | E820_ACPI       | EFI_ACPI_RECLAIM_MEMORY (type 9)
Type 4    | E820_NVS        | EFI_ACPI_MEMORY_NVS (type 10)
Type 5    | E820_UNUSABLE   | EFI_UNUSABLE_MEMORY (type 8)
New*      | E820_PMEM       | EFI_PERSISTENT_MEMORY (type 14)

* v4.2-rc4
arch/x86/boot/compressed/eboot.c::setup_e820()
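The mapping in the table can be written down as a lookup keyed by the EFI memory type number. This is a sketch of the table only, not a copy of setup_e820(); the dict and function names are chosen for illustration, and the e820 types are kept as strings because the slide gives no numeric value for E820_PMEM.

```python
# EFI memory region type number -> e820 entry type, as listed in the table.
EFI_TO_E820 = {
    0: "E820_RESERVED",   # EFI_RESERVED_TYPE
    1: "E820_RAM",        # EFI_LOADER_CODE
    2: "E820_RAM",        # EFI_LOADER_DATA
    3: "E820_RAM",        # EFI_BOOT_SERVICES_CODE
    4: "E820_RAM",        # EFI_BOOT_SERVICES_DATA
    5: "E820_RESERVED",   # EFI_RUNTIME_SERVICES_CODE
    6: "E820_RESERVED",   # EFI_RUNTIME_SERVICES_DATA
    7: "E820_RAM",        # EFI_CONVENTIONAL_MEMORY
    8: "E820_UNUSABLE",   # EFI_UNUSABLE_MEMORY
    9: "E820_ACPI",       # EFI_ACPI_RECLAIM_MEMORY
    10: "E820_NVS",       # EFI_ACPI_MEMORY_NVS
    11: "E820_RESERVED",  # EFI_MEMORY_MAPPED_IO
    12: "E820_RESERVED",  # EFI_MEMORY_MAPPED_IO_PORT_SPACE
    13: "E820_RESERVED",  # EFI_PAL_CODE
    14: "E820_PMEM",      # EFI_PERSISTENT_MEMORY
}


def e820_type_for(efi_type):
    """Return the e820 entry type that an EFI memory region is folded into."""
    return EFI_TO_E820[efi_type]
```

Note that boot services code and data become ordinary usable RAM in the e820 view, while runtime services code and data become reserved.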

e820 shift

e820 shift (1)

[Figure: e820 maps of Boot 1 and Boot 2]

e820 shift (2)

Boot:
[ 0.000000] BIOS-e820: [mem 0x0000000068f45000-0x0000000069d4ffff] usable

Resume Boot:
[ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved
[ 0.000000] PM: Registered nosave memory: [mem 0x69d4f000-0x69e12fff]
[ 17.410733] PM: Image loading progress: 0%
[ 17.929495] BUG: unable to handle kernel paging request at ffff880069d4f000
[ 17.933469] IP: [] load_image_lzo+0x810/0xe40

The page fault address (physical 0x69d4f000) is in a usable memory entry at Boot, but in a reserved memory entry at Resume Boot.
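The comparison can be reproduced from the two BIOS-e820 lines with a small parser. This is a minimal sketch: the helper names are made up for illustration, and PAGE_OFFSET is an assumption about the x86_64 direct-map base used by kernels of that era (it is what turns ffff880069d4f000 into physical 0x69d4f000).

```python
import re

# Assumed direct-map base of x86_64 kernels of that period.
PAGE_OFFSET = 0xffff880000000000

E820_LINE = re.compile(r"BIOS-e820: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)$")


def parse_e820(dmesg_text):
    """Return (start, end, kind) tuples for every BIOS-e820 line."""
    entries = []
    for line in dmesg_text.splitlines():
        match = E820_LINE.search(line.strip())
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3)))
    return entries


def classify(entries, phys_addr):
    """Return the kind of the entry containing phys_addr, or None."""
    for start, end, kind in entries:
        if start <= phys_addr <= end:
            return kind
    return None


boot = "[ 0.000000] BIOS-e820: [mem 0x0000000068f45000-0x0000000069d4ffff] usable"
resume = "[ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved"

fault_phys = 0xffff880069d4f000 - PAGE_OFFSET
at_boot = classify(parse_e820(boot), fault_phys)
at_resume = classify(parse_e820(resume), fault_phys)
```

With the two lines from the slide, the same physical page is classified differently on the two boots, which is exactly the page that load_image_lzo() faults on.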

e820 shift (3)

[Figure: memory maps at Boot and at Resume Boot, each from 0 to max_pfn, with usable regions interleaved with ACPI NVS, reserved, and ACPI data regions. The region boundaries shift between the two boots, so an address that was usable at Boot lies in a reserved region at Resume Boot.]

Checking e820 shift:

Lee, Chun-Yi [PATCH] PM / hibernate: avoid unsafe pages in e820 reserved regions
Commit 84c91b7ae in v3.17-rc1
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=84c91b7ae07c62cf6dee7fde3277f4be21331f85

Reverted by commit f82daee49 in v4.0
Waiting for Yinghai Lu [PATCH] x86: Kill E820_RESERVED_KERN

Lee, Chun-Yi [PATCH] Hibernate: save e820 table to snapshot header for comparison
https://lkml.org/lkml/2014/8/11/166
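The idea of saving the e820 table for comparison can be sketched as follows. This is not the implementation from the patch: the header layout, the function names, and the first entry of the resume map (ending at 0x69d4efff) are hypothetical, chosen only to show the check.

```python
def e820_mismatches(saved, current):
    """Return the entries that appear in only one of the two maps."""
    return sorted(set(saved) ^ set(current))


def can_resume(header, current_map):
    """Accept the image only if the e820 map is unchanged since the snapshot."""
    return not e820_mismatches(header["e820"], current_map)


# Map recorded when the snapshot was created (entry taken from the slide).
header = {"e820": [(0x68f45000, 0x69d4ffff, "usable")]}

# Map seen on the resume boot; the first entry is a made-up example.
resume_map = [
    (0x68f45000, 0x69d4efff, "usable"),
    (0x69d4f000, 0x69e12fff, "reserved"),
]

diff = e820_mismatches(header["e820"], resume_map)
```

Rejecting the image up front turns a page fault in the middle of image loading into an ordinary "image mismatch" and a normal boot.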

Platform vs. Shutdown (1)

Different modes of hibernation:
cat /sys/power/disk
[platform] shutdown reboot suspend

Platform mode depends on \_S4 support by the BIOS:
[ 1.080004] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S4_] (20130725/hwxface-571)

ACPI spec 6.0:

Table 7-234 BIOS-Supplied Control Methods for System-Level Functions
\_S4: Package that defines system \_S4 state mode.

16.3.2 BIOS Initialization of Memory (since ACPI v1.0):
Note: The memory information returned from the system address map reporting interfaces should be the same before and after an S4 sleep.
OSPM will invoke E820 interfaces on IA-PC-based legacy systems or the GetMemoryMap() interface on UEFI-enabled systems.
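The bracketed entry in /sys/power/disk is the mode currently selected. A small helper, with a name chosen here for illustration, can pull it out of the text shown on the slide:

```python
def parse_disk_modes(text):
    """Split /sys/power/disk output into (current mode, all listed modes)."""
    tokens = text.split()
    current = next((t.strip("[]") for t in tokens if t.startswith("[")), None)
    return current, [t.strip("[]") for t in tokens]


current, available = parse_disk_modes("[platform] shutdown reboot suspend")

# On a real system the text would come from the sysfs file instead:
#   with open("/sys/power/disk") as f:
#       current, available = parse_disk_modes(f.read())
```

A system whose firmware lacks \_S4, as in the ACPI exception above, would not be expected to have platform selected.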

Platform vs. Shutdown (2)

Documentation/power/swsusp.txt in kernel:

Q: What is the difference between "platform" and "shutdown"?

A: "platform" is actually right thing to do where supported, but "shutdown" is most reliable (except on ACPI systems).

Linux Kernel bug #77571:
https://bugzilla.kernel.org/show_bug.cgi?id=77571

The same page fault occurs when writing the snapshot image to the page buffer.

The bug reporter used shutdown mode rather than platform. After switching to platform, the reporter could no longer reproduce the issue.

It is better to use platform mode when the BIOS supports \_S4. Users should be aware that shutdown mode carries a risk.

Memory size mismatch (1)

PM: Loading and decompressing image data (495448 pages)...
[ 3.834831] PM: Image mismatch: memory size
[ 3.834851] PM: Read 1981792 kbytes in 0.01 seconds (198179.20 MB/s)
[ 3.836147] PM: Error -1 resuming
[ 3.836162] PM: Failed to load hibernation image, recovering.

Normally: On node 0 totalpages: 4177255
When issue happened: On node 0 totalpages: 4177256

[...] num_physpages != get_num_physpages())
        reason = "memory size";
if (reason) {
        printk(KERN_ERR "PM: Image mismatch: %s\n", reason);
        return -EPERM;
}
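The numbers in the log can be checked with a few lines of arithmetic. This sketch mirrors only the "memory size" comparison from the fragment above; the function names are illustrative and a 4 KiB page size is assumed.

```python
PAGE_SIZE_KB = 4  # assuming 4 KiB pages


def image_kbytes(pages):
    """Size of an image of the given number of pages, in kbytes."""
    return pages * PAGE_SIZE_KB


def check_memory_size(image_num_physpages, current_num_physpages):
    """Return the mismatch reason, or None when the page counts agree."""
    if image_num_physpages != current_num_physpages:
        return "memory size"
    return None


reason = check_memory_size(4177255, 4177256)
difference_kb = (4177256 - 4177255) * PAGE_SIZE_KB
```

The 495448 pages in the log correspond to the reported 1981792 kbytes, and the two totalpages values differ by a single page, which is already enough for the image to be rejected.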

Memory size mismatch (2)

[Figure: memory map of Boot]

Memory size mismatch (3)

[Figure: memory map of Resume Boot]

EFI memmap shift

Misidentification of nosave region (1)

[Figure: a 1-page nosave region registered inside usable memory, next to an EFI_LOADER_DATA region that is not page-aligned]
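The misidentified one-page nosave region can be reproduced with a toy model of how gaps between memory map entries are turned into nosave regions. The loop is loosely modeled on the gap handling in arch/x86/kernel/e820.c::e820_mark_nosave_regions(); treat the details as assumptions. The addresses and sizes are made up, and every entry is treated as saveable RAM.

```python
PAGE_SIZE = 4096


def pfn_up(addr):
    """First page frame that starts at or after addr."""
    return (addr + PAGE_SIZE - 1) // PAGE_SIZE


def pfn_down(addr):
    """Page frame that contains addr."""
    return addr // PAGE_SIZE


def nosave_regions(entries):
    """entries: (start, size) of saveable RAM, sorted by start address.

    Returns [start_pfn, end_pfn) ranges registered as nosave because they
    fall between the page-rounded end of one entry and the start of the next.
    """
    regions = []
    pfn = 0
    for start, size in entries:
        if pfn < pfn_up(start):
            regions.append((pfn, pfn_up(start)))
        pfn = pfn_down(start + size)
    return regions


# A chunk carved out at a page-aligned address: no gap appears.
aligned = [(0x0, 0x5000), (0x5000, 0x1000), (0x6000, 0x3000)]

# The same chunk carved out at 0x5018, i.e. not page-aligned.
unaligned = [(0x0, 0x5018), (0x5018, 0xfe8), (0x6000, 0x3000)]
```

In the unaligned case the first entry ends inside page 5 and the next one starts inside the same page, so rounding in opposite directions leaves page 5 looking like a hole even though every byte of it is usable memory.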

setup_data and E820_RESERVED_KERN

setup_data: a linked list for carrying data with boot_params to later boot stages. Allocated in the EFI stub, reserved via memblock and e820.

Yinghai Lu [PATCH] x86, boot: clean up setup_data handling
https://lkml.org/lkml/2015/2/28/272

setup_data types: SETUP_E820_EXT, SETUP_EFI, SETUP_DTB, SETUP_PCI, SETUP_KASLR

These setup_data chunks are not page-aligned when allocated. That leaves holes between e820 entries, which the kernel then registers as 1-page nosave regions.

Memory mapping of EFI runtime services (1)

trampoline_pgd:
We map EFI runtime services in the aforementioned PGD in the virtual range of 64Gb (arbitrarily set, can be raised if needed)
0xffffffef00000000 - 0xffffffff00000000
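The size and position of that virtual range can be verified directly from the two addresses. The constant names below are chosen for illustration.

```python
GIB = 1 << 30

EFI_VA_START = 0xffffffff00000000  # top of the range
EFI_VA_END = 0xffffffef00000000    # bottom of the range


def to_signed(va):
    """Interpret a 64-bit virtual address as a negative offset from the top."""
    return va - (1 << 64)


window_gib = (EFI_VA_START - EFI_VA_END) // GIB
start_gib = to_signed(EFI_VA_START) // GIB
end_gib = to_signed(EFI_VA_END) // GIB
```

The range is 64 GiB wide and runs from -68 GiB up to -4 GiB, counted from the top of the 64-bit address space.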

Memory mapping of EFI runtime services (2)

[Figure: x86_64 virtual memory map of the runtime service trampoline_pgd, from 0x0000000000000000 to 0xffffffffffffffff. Runtime Code, Runtime Data, Boot Code, and Boot Data regions are mapped in the 64 G range between 0xffffffef00000000 and 0xffffffff00000000, below the top 4 G. The same regions also appear as 1:1 mappings (workaround); addresses labelled there are 0x00000000bb385000 and 0x00000000bb3e5000. Source: arch/x86/platform/efi/efi_64.c::efi_map_region()]

Memory mapping of EFI runtime services (3)

In -4G area:

[Figure: Runtime Code, Runtime Data, Boot Code, and Boot Data regions mapped in the 64 G range between 0xffffffef00000000 and 0xffffffff00000000. Mappings are 2M-aligned. Source: arch/x86/platform/efi/efi_64.c::efi_map_region()]

Should fix runtime services address after S4

Lee, Chun-Yi [PATCH] x86_64/efi: Mapping Boot and Runtime EFI memory regions to different starting virtual address

The VA of EFI runtime services may change across hibernation, but that is fine as long as the PA does not change.

The EFI page tables should be checked in more detail during hibernation recovery.
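Why the runtime services VA can move between boots is easier to see with a model of a top-down allocator that hands out addresses below -4G. The sketch is loosely modeled on arch/x86/platform/efi/efi_64.c::efi_map_region(); the details are assumptions, and the region list is hypothetical (the runtime region reuses the two physical addresses labelled in the figure purely as sample values).

```python
PMD_SIZE = 2 * 1024 * 1024
PMD_MASK = ~(PMD_SIZE - 1) & ((1 << 64) - 1)
EFI_VA_START = 0xffffffff00000000
EFI_VA_END = 0xffffffef00000000


def map_regions(regions):
    """regions: (name, pa, size) in mapping order. Returns {name: va}."""
    efi_va = EFI_VA_START
    mapped = {}
    for name, pa, size in regions:
        efi_va -= size
        pa_offset = pa & (PMD_SIZE - 1)
        if pa_offset == 0:
            efi_va &= PMD_MASK
        else:
            # Keep the same offset within the 2M page as the physical address.
            prev_va = efi_va
            efi_va = (efi_va & PMD_MASK) + pa_offset
            if efi_va > prev_va:
                efi_va -= PMD_SIZE
        if efi_va < EFI_VA_END:
            raise ValueError("ran out of EFI virtual address space")
        mapped[name] = efi_va
    return mapped


runtime = ("runtime", 0xbb385000, 0xbb3e5000 - 0xbb385000)

# Boot regions mapped first; their size differs between the two boots.
boot_a = map_regions([("boot_data", 0x10000000, 0x200000), runtime])
boot_b = map_regions([("boot_data", 0x10000000, 0x400000), runtime])

# Runtime regions mapped first, so boot regions cannot push them around.
fixed_a = map_regions([runtime, ("boot_data", 0x10000000, 0x200000)])
fixed_b = map_regions([runtime, ("boot_data", 0x10000000, 0x400000)])
```

In this model the runtime region keeps its PA in both boots, yet its VA depends on how much boot-time memory was mapped ahead of it. Once the runtime regions no longer share a running start address with the boot regions, their VA stays stable.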

Challenges

Hibernation's Challenge

KASLR (Kernel address space layout randomization): exclusive with hibernation

Intel Rapid Start: a replacement for kernel hibernation; may also conflict with KASLR

NVDIMM: hibernation is no longer needed

Theory

Mathematics

Q&A

SUSE is Hiring
Please search SUSE Careers
and
http://www.104.com.tw/

OPENSUSE ASIA SUMMIT 2015
Taipei, R.O.C. (Taiwan)
Bring you to the free world

Join us on:
www.opensuse.org

15/08/15