Kernel memory debugging techniques


The Linux kernel has many memory-related debugging tools to help kernel developers track down memory issues. The tools covered in this post (kmemleak, KASAN, UBSan and KCOV) are included in the upstream kernel; other tools are still work in progress.

kmemleak

kmemleak is a kernel debug option for detecting kernel memory leaks. It can be found under the Kernel hacking ---> Memory Debugging section. Enable these options to use it:

+CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_KMEMLEAK=y
+CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=2000
+CONFIG_DEBUG_KMEMLEAK_TEST=m

After enabling the above configs, build and boot the new kernel. The following messages will be shown in the kernel log:

[    1.099444] kmemleak: Kernel memory leak detector initialized
[    1.099997] kmemleak: Automatic memory scanning thread started

On my system, debugfs was mounted at /sys/kernel/debug by default. If it is not mounted, mount it with this command:

mount -t debugfs nodev /sys/kernel/debug/

kmemleak is enabled by default. This can be changed by enabling the option below: CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF

With that option set, append kmemleak=on to the kernel command line (on a single line) when you need to enable it.

The default early log buffer size was not big enough for my 64-bit kernel. If the early log exceeds the limit, the kernel disables kmemleak and reports this:

[    0.000000] kmemleak: Kernel memory leak detector disabled
[    0.000291] kmemleak: Early log buffer exceeded (1333), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE

A kernel thread scans memory for unreferenced objects every 600 seconds and prints the message below if a suspected leak is found:

[  646.473426] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

Run cat /sys/kernel/debug/kmemleak to see the leak info with a stack trace. The report will be similar to this:

[Image: kmemleak report]

The backtrace information can be decoded with the script scripts/decode_stacktrace.sh. Save the leak report as memleak.txt, then feed it to the script:

cat memleak.txt |./scripts/decode_stacktrace.sh vmlinux . mm/

The script translates offsets to line numbers using addr2line, so make sure the CONFIG_DEBUG_INFO option is enabled to build the kernel with debug info.

Use echo scan > /sys/kernel/debug/kmemleak to trigger a memory scan and echo clear > /sys/kernel/debug/kmemleak to clear the current leak info.

KASAN

Kernel Address SANitizer (KASAN) is the kernel-space address sanitizer and is enabled with the config option CONFIG_KASAN. There are two compiler instrumentation modes, CONFIG_KASAN_INLINE and CONFIG_KASAN_OUTLINE. Personally I recommend the former because it is 1.1 - 2 times faster, although it produces a fairly large kernel image (39MB with DEBUG_INFO enabled) that may overlap the ramdisk image. If it does, change the ramdisk load address to followkernel to avoid boot issues with the newly built kernel. Here is my KASAN configuration:

+CONFIG_DEBUG_INFO=y
+CONFIG_KASAN=y
+CONFIG_KASAN_INLINE=y
+CONFIG_TEST_KASAN=m

KASAN can be disabled on a per-file or per-directory basis. Take a look at mm/Makefile and mm/kasan/Makefile for examples:

# grep KASAN_SANITIZE mm/kasan/Makefile mm/Makefile
KASAN_SANITIZE := n
KASAN_SANITIZE_slab.o := n

With KASAN enabled, the kernel prints the following message very early during boot, indicating that KASAN was initialized:

[    0.000000] kasan: KernelAddressSanitizer initialized

The Linux kernel comes with an example in lib/test_kasan.c showing what kinds of memory bugs KASAN can detect. Here is part of the report produced by insmod test_kasan.ko:

[ 2237.155888] kasan test: kmalloc_oob_right out-of-bounds to right
[ 2237.162114] ==================================================================
[ 2237.169513] BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0x90/0xc8 [test_kasan]
[ 2237.177567] Write of size 1 at addr ffffffd92cd1077b by task insmod/1472
[ 2237.184379]
[ 2237.185917] CPU: 3 PID: 1472 Comm: insmod Tainted: G    B           4.14.129-v8+ #27
[ 2237.193789] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
[ 2237.200163] Call trace:
[ 2237.202678] [<ffffffa256e943e0>] dump_backtrace+0x0/0x428
[ 2237.208183] [<ffffffa256e9482c>] show_stack+0x24/0x30
[ 2237.213333] [<ffffffa2584baa9c>] dump_stack+0xac/0xe8
[ 2237.218487] [<ffffffa257284ac0>] print_address_description+0x68/0x278
[ 2237.225043] [<ffffffa257284fd0>] kasan_report+0x228/0x308
[ 2237.230548] [<ffffffa257285220>] __asan_report_store1_noabort+0x30/0x40
[ 2237.237322] [<ffffff9001708420>] kmalloc_oob_right+0x90/0xc8 [test_kasan]
[ 2237.244269] [<ffffff90017096b8>] kmalloc_tests_init+0x18/0x960 [test_kasan]
[ 2237.251363] [<ffffffa256e84d2c>] do_one_initcall+0xac/0x268
[ 2237.257043] [<ffffffa25704675c>] do_init_module+0x198/0x4e4
[ 2237.262719] [<ffffffa25704304c>] load_module+0x534c/0x6ff0
[ 2237.268307] [<ffffffa2570450d0>] SyS_finit_module+0x128/0x140
[ 2237.274155] Exception stack(0xffffffd912dd7ec0 to 0xffffffd912dd8000)
[ 2237.280712] 7ec0: 0000000000000003 00000000b780ac58 0000000000000000 0000000000000000
[ 2237.288680] 7ee0: 0000000000000003 0000000000000000 00000000ffeffc94 000000000000017b
[ 2237.296647] 7f00: 0000000000000000 0000000000000000 0000000000000000 00000000ffeffc8c
[ 2237.304613] 7f20: 00000000ffeffbd8 00000000ffeffbc8 00000000b77cc779 0000000000000000
[ 2237.312580] 7f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.320546] 7f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.328512] 7f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.336479] 7fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.344448] 7fc0: 00000000f1275cdc 0000000060010010 0000000000000003 000000000000017b
[ 2237.352415] 7fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.360383] [<ffffffa256e84a00>] el0_svc_naked+0x34/0x38
[ 2237.365785]
[ 2237.367313] Allocated by task 1472:
[ 2237.370882]  kasan_kmalloc+0xd0/0x180
[ 2237.374620]  kmem_cache_alloc_trace+0x13c/0x2d0
[ 2237.379264]  kmalloc_oob_right+0x50/0xc8 [test_kasan]
[ 2237.384435]  kmalloc_tests_init+0x18/0x960 [test_kasan]
[ 2237.389758]  do_one_initcall+0xac/0x268
[ 2237.393672]  do_init_module+0x198/0x4e4
[ 2237.397583]  load_module+0x534c/0x6ff0
[ 2237.401406]  SyS_finit_module+0x128/0x140
[ 2237.405493]  el0_svc_naked+0x34/0x38
[ 2237.409130]
[ 2237.410654] Freed by task 161:
[ 2237.413775]  kasan_slab_free+0x88/0x198
[ 2237.417687]  kfree+0x84/0x2c8
[ 2237.420723]  binder_delete_free_buffer+0x200/0x598
[ 2237.425606]  binder_free_buf_locked+0x3b4/0x5b0
[ 2237.430224]  binder_alloc_free_buf+0x30/0x48
[ 2237.434578]  binder_thread_write+0xd54/0x2f98
[ 2237.439019]  binder_ioctl+0xc98/0x17b4
[ 2237.442846]  compat_SyS_ioctl+0x178/0x15d8
[ 2237.447022]  el0_svc_naked+0x34/0x38
[ 2237.450659]
[ 2237.452187] The buggy address belongs to the object at ffffffd92cd10700
[ 2237.452187]  which belongs to the cache kmalloc-128 of size 128
[ 2237.464918] The buggy address is located 123 bytes inside of
[ 2237.464918]  128-byte region [ffffffd92cd10700, ffffffd92cd10780)
[ 2237.476847] The buggy address belongs to the page:
[ 2237.481729] page:ffffffbf64b34400 count:1 mapcount:0 mapping:          (null) index:0x0
[ 2237.489882] flags: 0x100(slab)
[ 2237.493006] raw: 0000000000000100 0000000000000000 0000000000000000 0000000180100010
[ 2237.500886] raw: 0000000000000000 0000000400000001 ffffffd931403c00 0000000000000000
[ 2237.508755] page dumped because: kasan: bad access detected
[ 2237.514420]
[ 2237.515941] Memory state around the buggy address:
[ 2237.520821]  ffffffd92cd10600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 2237.528168]  ffffffd92cd10680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2237.535515] >ffffffd92cd10700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03
[ 2237.542858]                                                                 ^
[ 2237.550119]  ffffffd92cd10780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2237.557466]  ffffffd92cd10800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 2237.564806] ==================================================================

With DEBUG_INFO enabled, the above report can be translated with decode_stacktrace.sh, which shows the exact file and line number that caused the invalid access:

$ cat report.txt | ./scripts/decode_stacktrace.sh vmlinux . lib/
[ 2237.155888] kasan test: kmalloc_oob_right out-of-bounds to right
[ 2237.162114] ==================================================================
[ 2237.169513] BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right (mod_plt_sec/lib/test_kasan.c:44) test_kasan
[ 2237.177567] Write of size 1 at addr ffffffd92cd1077b by task insmod/1472
[ 2237.184379]
[ 2237.185917] CPU: 3 PID: 1472 Comm: insmod Tainted: G    B           4.14.129-v8+ #27
[ 2237.193789] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
[ 2237.200163] Call trace:
[ 2237.202678] dump_backtrace (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/arch/arm64/kernel/traps.c:146)
[ 2237.208183] show_stack (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/arch/arm64/kernel/traps.c:216)
[ 2237.213333] dump_stack (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/lib/dump_stack.c:55)
[ 2237.218487] print_address_description (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/kasan/report.c:253)
[ 2237.225043] kasan_report (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/kasan/report.c:352 /home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/kasan/report.c:409)
[ 2237.230548] __asan_report_store1_noabort (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/kasan/report.c:432)
[ 2237.237322] kmalloc_oob_right (mod_plt_sec/lib/test_kasan.c:44) test_kasan
[ 2237.244269] kmalloc_tests_init (mod_plt_sec/lib/test_kasan.c:497) test_kasan
[ 2237.251363] do_one_initcall (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/init/main.c:825)
[ 2237.257043] do_init_module (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/kernel/module.c:3463)
[ 2237.262719] load_module (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/kernel/module.c:3790)
[ 2237.268307] SyS_finit_module (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/kernel/module.c:3889 /home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/kernel/module.c:3865)
[ 2237.274155] Exception stack(0xffffffd912dd7ec0 to 0xffffffd912dd8000)
[ 2237.280712] 7ec0: 0000000000000003 00000000b780ac58 0000000000000000 0000000000000000
[ 2237.288680] 7ee0: 0000000000000003 0000000000000000 00000000ffeffc94 000000000000017b
[ 2237.296647] 7f00: 0000000000000000 0000000000000000 0000000000000000 00000000ffeffc8c
[ 2237.304613] 7f20: 00000000ffeffbd8 00000000ffeffbc8 00000000b77cc779 0000000000000000
[ 2237.312580] 7f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.320546] 7f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.328512] 7f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.336479] 7fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.344448] 7fc0: 00000000f1275cdc 0000000060010010 0000000000000003 000000000000017b
[ 2237.352415] 7fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2237.360383] el0_svc_naked (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/arch/arm64/kernel/entry.S:930)
[ 2237.365785]
[ 2237.367313] Allocated by task 1472:
[ 2237.370882] kasan_kmalloc (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/kasan/kasan.c:427 /home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/kasan/kasan.c:448 /home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/kasan/kasan.c:459 /home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/kasan/kasan.c:551)
[ 2237.374620] kmem_cache_alloc_trace (/home/fdbai/lineageos/kernel/android_kernel_brcm_rpi3/mm/slub.c:2752)
[ 2237.379264] kmalloc_oob_right (mod_plt_sec/lib/test_kasan.c:42) test_kasan
[ 2237.384435] kmalloc_tests_init (mod_plt_sec/lib/test_kasan.c:497) test_kasan

UBSan

UBSan is a run-time undefined behavior sanity checker. A previous post, Android native memory debugging, described how to use UBSan in Android native apps. The kernel also supports UBSan, starting from version 4.5, and you can enable it with CONFIG_UBSAN. CONFIG_UBSAN_SANITIZE_ALL is enabled by default to check the entire kernel, which generates an extremely large kernel image: 107MB with KASAN and KMEMLEAK both enabled.

Sanity checking for an individual file or directory is also supported: add UBSAN_SANITIZE := y to the Makefile for a directory, or UBSAN_SANITIZE_main.o := y for a single file.

If CONFIG_UBSAN_SANITIZE_ALL is enabled, the same method (with n instead of y) can be used to exclude a file or directory from checking.

Build the previously used UBSan example as a module:

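#include <linux/init.h>
#include <linux/module.h>

static int __init ubsan_test_init(void)
{
        int k = 0x7fffffff;     /* largest value a 32-bit int can hold */
        k += 1;                 /* signed integer overflow: undefined behavior */
        return k;
}

module_init(ubsan_test_init);
MODULE_LICENSE("GPL");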

When this module is loaded, UBSan detects the undefined behavior (a signed integer overflow) and prints a report showing where the problem is:

[Image: UBSan report]

Kernel COVerage (KCOV)

KCOV is a kernel code coverage tool, usually used for coverage-guided fuzz testing, often in conjunction with KASAN, UBSan, etc. to do randomized testing that finds kernel memory bugs. To enable it, add CONFIG_KCOV=y to your kernel defconfig; CONFIG_KCOV_INSTRUMENT_ALL=y is also enabled by default to instrument the entire kernel. Coverage data can be accessed via the debugfs file kcov if debugfs is mounted.

To exclude a file or directory from instrumentation, add a KCOV_INSTRUMENT directive like these:

kernel/locking/Makefile:KCOV_INSTRUMENT		:= n
mm/Makefile:KCOV_INSTRUMENT_vmstat.o := n

CONFIG_KCOV requires gcc built on revision 231296 (gcc 6.0+) or later. Download the latest version (as of this writing) for arm64 from Linaro’s website.

There is an example in syzkaller showing how kcov is used to collect coverage information in the real world.
