Kernel Debugging With Kgdb
In Kernel memory debugging techniques, we talked about using the script decode_stacktrace.sh to translate addresses to source lines; if you are only interested in one entry in the stack, addr2line should be enough. This works sometimes, but in most cases we need to dig deeper to figure out what is going on in the kernel: we may need to know the variables involved or the values of registers. In this situation we need to do online kernel debugging, and kgdb/kdb is designed for this purpose.
Both kgdb and kdb are kernel debugger front ends that interface to the kernel debug core, and you can switch between them if necessary.
kdb is a shell-like debugger: you can use it to dump memory contents, show backtraces, run lsmod, and so on. kdb is not a source-level debugger. Here is the full list of commands supported by the kernel (v4.19):
Command | Usage | Description |
---|---|---|
bc | <bpnum> | Clear Breakpoint |
be | <bpnum> | Enable Breakpoint |
bd | <bpnum> | Disable Breakpoint |
bl | [<vaddr>] | Display breakpoints |
bp | [<vaddr>] | Set/Display breakpoints |
bt | [<vaddr>] | Stack traceback |
btp | <pid> | Display stack for process <pid> |
bta | [D\|R\|S\|T\|C\|Z\|E\|U\|I\|M\|A] | Backtrace all processes matching state flag |
btc | | Backtrace current process on each cpu |
btt | <vaddr> | Backtrace process given its struct task address |
cpu | <cpunum> | Switch to new cpu |
defcmd | name "usage" "help" | Define a set of commands, down to endefcmd |
dmesg | [lines] | Display kernel log |
dumpcommon | | Common kdb debugging |
dumpall | | First line debugging |
dumpcpu | | Same as dumpall but only tasks on cpus |
ef | <vaddr> | Display exception frame |
env | | Show environment variables |
ftdump | [skip_#lines] [cpu] | Dump ftrace log |
go | [<vaddr>] | Continue Execution |
grephelp | | Display help on \| grep |
help(?) | | Display Help Message |
kgdb | | Enter kgdb mode |
kill | <-signal> <pid> | Send a signal to a process |
lsmod | | List loaded kernel modules |
md | <vaddr> | Display Memory Contents, also mdWcN, e.g. md8c1 |
mdr | <vaddr> <bytes> | Display Raw Memory |
mdp | <paddr> <bytes> | Display Physical Memory |
mds | <vaddr> | Display Memory Symbolically |
mm | <vaddr> <contents> | Modify Memory Contents |
ps | [<flags>\|A] | Display active task list |
pid | <pidnum> | Switch to another task |
per_cpu | <sym> [<bytes>] [<cpu>] | Display per_cpu variables |
rd | | Display Registers |
rm | <reg> <contents> | Modify Registers |
reboot | | Reboot the machine immediately |
set | | Set environment variables |
sr | <key> | Magic SysRq key |
ss | | Single Step |
summary | | Summarize the system |
kgdb serves as a gdb server in the kernel and must be used together with gdb. Unlike kdb, kgdb itself knows nothing about Linux kernel concepts.
Build kernel with appropriate configure options
In order to make full use of kdb/kgdb, the following options are recommended:
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_FRAME_POINTER=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_KDB=y
CONFIG_KDB_DEFAULT_ENABLE=0x1
CONFIG_KDB_KEYBOARD=y
CONFIG_KDB_CONTINUE_CATASTROPHIC=0
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_CONSOLE_POLL=y
CONFIG_GDB_SCRIPTS=y
# CONFIG_STRICT_KERNEL_RWX is not set
You can check your current build with zcat /proc/config.gz (available when CONFIG_IKCONFIG_PROC is enabled).
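As a rough sketch, the check can also be scripted. The Python below assumes the target exposes /proc/config.gz; the helper names missing_options and read_running_config and the shortened option list are illustrative only and not part of any kernel tooling.

```python
import gzip

# Subset of the options recommended above; extend as needed.
REQUIRED = {
    "CONFIG_KGDB": "y",
    "CONFIG_KGDB_SERIAL_CONSOLE": "y",
    "CONFIG_KGDB_KDB": "y",
    "CONFIG_DEBUG_INFO": "y",
    "CONFIG_MAGIC_SYSRQ": "y",
}


def missing_options(config_text, required=REQUIRED):
    """Return {option: current value or None} for options not set as required."""
    current = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" is treated as unset
        name, _, value = line.partition("=")
        current[name] = value
    return {
        name: current.get(name)
        for name, want in required.items()
        if current.get(name) != want
    }


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (hypothetical helper, run on the target)."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

On the target, print(missing_options(read_running_config())) lists the options that still need attention; an empty dict means the listed subset is in place.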
Build gdb client for arm64 (optional)
This part is optional: use the gdb released with your toolchain and only build your own if needed. Download the latest version from the GNU website, and make sure to specify the right target and enable Python support:
mkdir build
cd build
../configure --target=aarch64-linux-gnu --with-python=/usr/bin/python
make
sudo make install
Using kdb
Before using kdb, the first thing to do is to enable it by registering the kgdb I/O driver:
echo ttyS0 >/sys/module/kgdboc/parameters/kgdboc
[ 100.232070] KGDB: Registered I/O driver kgdboc
When you are done, you can disable it by echoing an empty string to kgdboc:
echo "" >/sys/module/kgdboc/parameters/kgdboc
[ 102.221318] KGDB: Unregistered I/O driver kgdboc, debugger disabled
kgdboc stands for kgdb over console.
In this section, we will use the example below for oops analysis:
#include <linux/module.h>

static noinline int hello_oops_init(void)
{
	printk("hello oops\n");

	*(int*)0x150912 = 0x5a5a;	/* write to an invalid address to trigger the oops */

	return 0;
}

static void hello_oops_exit(void)
{
	printk("goodbye oops\n");
}

module_init(hello_oops_init);
module_exit(hello_oops_exit);

MODULE_AUTHOR("oops");
MODULE_DESCRIPTION("oops example");
MODULE_LICENSE("GPL");
When an oops or panic occurs, the kernel enters kdb automatically:
[ 82.581270] oops: loading out-of-tree module taints kernel.
[ 82.619378] hello oops
[ 82.622143] Unable to handle kernel paging request at virtual address dfffff900002a122
[ 82.637165] Mem abort info:
[ 82.647074] ESR = 0x96000004
[ 82.654150] Exception class = DABT (current EL), IL = 32 bits
[ 82.670107] SET = 0, FnV = 0
[ 82.676473] EA = 0, S1PTW = 0
[ 82.681947] Data abort info:
[ 82.685457] ISV = 0, ISS = 0x00000004
[ 82.689841] CM = 0, WnR = 0
[ 82.693166] [dfffff900002a122] address between user and kernel address ranges
[ 82.701019] Internal error: Oops: 96000004 [#1] PREEMPT SMP
Entering kdb (current=0xffffffc02a00bd00, pid 727) on processor 3 Oops: (null)
due to oops @ 0xffffff9001c50044
CPU: 3 PID: 727 Comm: insmod Tainted: G B O 4.19.108-v8+ #28
Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO)
pc : hello_oops_init+0x44/0x8c [oops]
lr : hello_oops_init+0x2c/0x8c [oops]
sp : ffffffc02da0f8c0
x29: ffffffc02da0f8c0 x28: dfffff9000000000
x27: ffffff9001c523f8 x26: ffffff900af91b40
x25: 0000000000000000 x24: ffffffc02a00bd00
x23: ffffff900af91b08 x22: ffffffc02a00bd10
x21: 1ffffff805b41f28 x20: ffffff9001c50000
x19: 0000000000150912 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: ffffff90080b49f4
x13: ffffff90082bb190 x12: ffffff820170284c
x11: 1ffffff20170284b x10: ffffff820170284b
x9 : 0000000000000000 x8 : dfffff9000000000
x7 : ffffff820170284c x6 : ffffff900b814259
x5 : ffffffc02a00bd00 x4 : 0000000000000000
more>
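Two details of this oops are worth working out by hand: the faulting address is not the 0x150912 that the module writes to, and WnR = 0 reports a read although the source performs a store. Both would be consistent with a KASAN-instrumented build, where a shadow byte is read before the store; that this particular build had KASAN enabled is an assumption here. The Python sketch below decodes the ESR fields following the Arm ESR_ELx layout and applies the generic KASAN mapping shadow = (addr >> 3) + offset, taking the offset from the value seen in x28 and x8 (also an assumption).

```python
ESR = 0x96000004                          # from "ESR = 0x96000004"
FAULT_TARGET = 0x150912                   # address written by hello_oops_init (x19)
KASAN_SHADOW_OFFSET = 0xDFFFFF9000000000  # assumption: value seen in x28 and x8
KASAN_SHADOW_SCALE_SHIFT = 3              # generic KASAN: 1 shadow byte per 8 bytes


def decode_esr(esr):
    """Split an ESR_ELx value into the fields printed in the oops."""
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class
        "IL": (esr >> 25) & 0x1,    # 1 = 32-bit instruction
        "ISS": esr & 0x1FFFFFF,     # instruction specific syndrome
        "WnR": (esr >> 6) & 0x1,    # data aborts: 1 = write, 0 = read
        "DFSC": esr & 0x3F,         # data fault status code
    }


def kasan_shadow(addr):
    """Address of the shadow byte that describes addr."""
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET


fields = decode_esr(ESR)
print({name: hex(value) for name, value in fields.items()})
print(hex(kasan_shadow(FAULT_TARGET)))  # 0xdfffff900002a122
```

EC 0x25 is a data abort taken from the current exception level, and DFSC 0x04 is a level 0 translation fault. The computed shadow address matches the address in the "Unable to handle kernel paging request" line.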
Using kgdb
gdb communicates with kgdb over the serial port. If you want to use the serial console and kgdb at the same time, you need a proxy. I tried both kdmx and agent-proxy and found that agent-proxy works better for me, although Doug Anderson says in his ELC19 talk that kdmx is more reliable. I am using Ubuntu 18.04 Desktop, so maybe the environment matters. Set up agent-proxy as follows:
agent-proxy 4440^4441 0 /dev/ttyUSB0,115200
Then use telnet to connect to serial console:
telnet localhost 4440
To quit the telnet session, press Ctrl-] then type quit.
Before using kgdb, you need to switch from kdb to kgdb mode with the kgdb command:
[2]kdb> kgdb
Entering please attach debugger or use $D#44+ or $3#33
NOTE:
If the kgdboc parameter was set on the kernel command line, gdb can connect to kgdb directly without entering the kgdb command first. I noticed this prompt when kgdboc was set on the kernel command line:
[ 8.213562] KGDB: Waiting for connection from remote gdb...
Now we are ready to use gdb to debug the kernel and its modules:
cd /opt/lineageos/kernel/kernel_rpi
aarch64-linux-gnu-gdb samples/hello/oops.o -ex "target remote localhost:4441"
Then list the source code around pc to get the line number:
(gdb) list *(hello_oops_init+0x44)
0xe4 is in hello_oops_init (samples/hello/oops.c:7).
2
3 static noinline int hello_oops_init(void)
4 {
5 printk("hello oops\n");
6
7 *(int*)0x150912 = 0x5a5a;
8
9 return 0;
10 }
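The step from the oops to the gdb command is mechanical: take the symbol+offset from the pc line and wrap it in list *(...). A small Python sketch of that translation follows; the function name gdb_list_command and the regular expression are illustrative, not an existing tool.

```python
import re

# Matches e.g. "hello_oops_init+0x44/0x8c [oops]"; the module part is optional.
SYMBOL_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def gdb_list_command(oops_line):
    """Turn a 'sym+off/size [mod]' oops line into (gdb list command, module or None)."""
    m = SYMBOL_RE.search(oops_line)
    if m is None:
        raise ValueError("no symbol+offset found in: %r" % oops_line)
    return "list *(%s+%s)" % (m.group("sym"), m.group("off")), m.group("mod")


cmd, module = gdb_list_command("pc : hello_oops_init+0x44/0x8c [oops]")
print(cmd)     # list *(hello_oops_init+0x44)
print(module)  # oops
```

When a module name is returned, load that module's object file into gdb rather than vmlinux; when it is None, the symbol belongs to the kernel image.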
To get back to kdb mode, either blindly type $3#33 or send a maintenance packet from gdb:
(gdb) maintenance packet 3
sending: "3"
received: "OK"
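The strings $3#33 and $D#44 are ordinary gdb remote serial protocol packets: a payload between $ and #, followed by a two-digit hex checksum that is the sum of the payload bytes modulo 256. The sketch below shows how the two packets from the kgdb prompt are framed; rsp_packet is an illustrative helper name.

```python
def rsp_packet(payload):
    """Frame a payload as a gdb remote serial protocol packet: $<payload>#<checksum>."""
    checksum = sum(payload.encode("ascii")) % 256
    return "$%s#%02x" % (payload, checksum)


print(rsp_packet("3"))  # $3#33
print(rsp_packet("D"))  # $D#44
```

This is why typing the packet by hand works: the character '3' has the ASCII code 0x33, so the payload and its checksum happen to look alike.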
Enable kgdb on boot
To make this work on a Raspberry Pi 3B, add the line below to config.txt in the boot partition:
dtoverlay=pi3-disable-bt
And add these kernel parameters to cmdline.txt:
kgdboc=serial0,115200 kgdbwait nokaslr
There is a slab-out-of-bounds bug in the dwc_otg driver. With KASAN enabled, the bug is reported while the system boots, and the kernel then enters the debugger:
[ 6.623132] ==================================================================
[ 6.649052] console [ttyAMA0] enabled
[ 6.655946] BUG: KASAN: slab-out-of-bounds in dwc_otg_hcd_is_bandwidth_allocated+0x68/0x70
[ 6.656005] Read of size 8 at addr ffffffc02f6b3a68 by task kworker/1:1/34
[ 6.656044]
[ 6.677007] CPU: 1 PID: 34 Comm: kworker/1:1 Not tainted 4.19.108-v8+ #57
[ 6.677042] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[ 6.677124] Workqueue: events_power_efficient hub_init_func3
[ 6.703594] Call trace:
[ 6.703655] dump_backtrace+0x0/0x3f8
[ 6.703706] show_stack+0x28/0x38
[ 6.703776] dump_stack+0x100/0x168
[ 6.703864] print_address_description+0x58/0x2b0
[ 6.722722] kasan_report+0x174/0x2f8
[ 6.722786] __asan_report_load8_noabort+0x30/0x40
[ 6.722869] dwc_otg_hcd_is_bandwidth_allocated+0x68/0x70
[ 6.722943] dwc_otg_urb_enqueue+0x6c0/0xd78
[ 6.723045] usb_hcd_submit_urb+0x1d8/0x19c8
[ 6.745473] mmc-bcm2835 3f300000.mmc: mmc_debug:0 mmc_debug2:0
[ 6.746319] usb_submit_urb+0x560/0x11e8
[ 6.746386] hub_activate+0xa08/0x1348
[ 6.746452] hub_init_func3+0x28/0x38
[ 6.746541] process_one_work+0x6c0/0x1360
[ 6.755999] mmc-bcm2835 3f300000.mmc: DMA channel allocated
[ 6.766139] worker_thread+0x400/0xe70
[ 6.766208] kthread+0x278/0x350
[ 6.766271] ret_from_fork+0x10/0x18
[ 6.766310]
[ 7.823149] Allocated by task 34:
[ 7.826573] kasan_kmalloc.part.0+0x44/0x108
[ 7.830935] kasan_kmalloc+0xb0/0xc8
[ 7.834611] __kmalloc+0x170/0x358
[ 7.838118] usb_get_configuration+0x1a0c/0x48d8
[ 7.842841] usb_new_device+0x89c/0xf68
[ 7.846779] hub_event+0x14fc/0x2f48
[ 7.850452] process_one_work+0x6c0/0x1360
[ 7.854644] worker_thread+0x400/0xe70
[ 7.858494] kthread+0x278/0x350
[ 7.861821] ret_from_fork+0x10/0x18
[ 7.865445]
[ 7.866999] Freed by task 0:
[ 7.869932] (stack is not available)
[ 7.873556]
[ 7.875133] The buggy address belongs to the object at ffffffc02f6b3a00
[ 7.875133] which belongs to the cache kmalloc-128 of size 128
[ 7.887781] The buggy address is located 104 bytes inside of
[ 7.887781] 128-byte region [ffffffc02f6b3a00, ffffffc02f6b3a80)
[ 7.899604] The buggy address belongs to the page:
[ 7.904492] page:ffffffbf00bdacc0 count:1 mapcount:0 mapping:ffffffc031403c00 index:0x0
[ 7.912579] flags: 0x200(slab)
[ 7.915766] raw: 0000000000000200 dead000000000100 dead000000000200 ffffffc031403c00
[ 7.923634] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[ 7.931440] page dumped because: kasan: bad access detected
[ 7.937061]
[ 7.938607] Memory state around the buggy address:
[ 7.943494] ffffffc02f6b3900: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[ 7.950810] ffffffc02f6b3980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 7.958126] >ffffffc02f6b3a00: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[ 7.965412] ^
[ 7.972119] ffffffc02f6b3a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 7.979434] ffffffc02f6b3b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 7.986716] ==================================================================
[ 7.993995] Disabling lock debugging due to kernel taint
...
[ 8.207453] KGDB: Registered I/O driver kgdboc
[ 8.213562] KGDB: Waiting for connection from remote gdb...
Entering kdb (current=0xffffffc031588000, pid 1) on processor 2 due to Keyboard Entry
[2]kdb>
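The "Memory state around the buggy address" rows can be read by hand. In generic KASAN, as far as I understand the encoding, each shadow byte describes 8 bytes of memory, 00 means all 8 bytes are accessible, and fc marks a kmalloc redzone. The Python sketch below applies that reading to the row marked with > in the report above; the constants are copied from the log.

```python
SHADOW_ROW = "00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc"
ROW_BASE = 0xFFFFFFC02F6B3A00   # address printed at the start of the marked row
BAD_ADDR = 0xFFFFFFC02F6B3A68   # "Read of size 8 at addr ..."
GRANULE = 8                     # generic KASAN: one shadow byte per 8 bytes
KMALLOC_REDZONE = 0xFC          # assumed meaning of "fc" in this report

shadow = [int(b, 16) for b in SHADOW_ROW.split()]

# Only fully accessible granules (00) are counted; this row has no partial ones.
accessible = sum(GRANULE for b in shadow if b == 0x00)
index = (BAD_ADDR - ROW_BASE) // GRANULE

print(accessible)                 # 80
print(index, hex(shadow[index]))  # 13 0xfc
```

Under those assumptions, only the first 80 bytes of the 128-byte slab object are valid, and the 8-byte read at offset 104 lands in the redzone, which is what the slab-out-of-bounds report describes.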
Example: slab-out-of-bounds
The KASAN report says the out-of-bounds access happened at dwc_otg_hcd_is_bandwidth_allocated+0x68, so let's check what is there:
# aarch64-linux-gnu-gdb vmlinux -ex "target remote localhost:4441"
(gdb) list *(dwc_otg_hcd_is_bandwidth_allocated+0x68)
0xffffff900914eba8 is in dwc_otg_hcd_is_bandwidth_allocated (drivers/usb/host/dwc_otg/dwc_otg_hcd.c:4083).
4078 {
4079 int allocated = 0;
4080 dwc_otg_qh_t *qh = (dwc_otg_qh_t *) ep_handle;
4081
4082 if (qh) {
4083 if (!DWC_LIST_EMPTY(&qh->qh_list_entry)) {
4084 allocated = 1;
4085 }
4086 }
4087 return allocated;
The relevant definition is in drivers/usb/host/dwc_common_port/dwc_list.h:
#define DWC_LIST_FIRST(link) ((link)->next)
#define DWC_LIST_END(link) (link)
#define DWC_LIST_EMPTY(link) \
(DWC_LIST_FIRST(link) == DWC_LIST_END(link))
Set a breakpoint to see what we have in qh_list_entry:
(gdb) l dwc_otg_hcd.c:4082
4077 __attribute__((optimize("O0"))) int dwc_otg_hcd_is_bandwidth_allocated(dwc_otg_hcd_t * hcd, void *ep_handle)
4078 {
4079 int allocated = 0;
4080 dwc_otg_qh_t *qh = (dwc_otg_qh_t *) ep_handle;
4081
4082 if (qh) {
4083 if (!DWC_LIST_EMPTY(&qh->qh_list_entry)) {
4084 allocated = 1;
4085 }
4086 }
(gdb) b dwc_otg_hcd.c:4082
Breakpoint 1 at 0xffffff900914eb70: file drivers/usb/host/dwc_otg/dwc_otg_hcd.c, line 4082.
(gdb) c
Continuing.
[Switching to Thread 36]
Thread 40 hit Breakpoint 1, dwc_otg_hcd_is_bandwidth_allocated (hcd=0xffffffc02fa6c380, ep_handle=0xffffffc02f682828)
at drivers/usb/host/dwc_otg/dwc_otg_hcd.c:4082
4082 if (qh) {
(gdb) p &qh->qh_list_entry
$1 = (dwc_list_link_t *) 0xffffffc02f682868
(gdb) p qh->qh_list_entry
$2 = {next = 0x0, prev = 0x0}
For comparison, the KASAN report from this boot contains:
[ 6.577056] Read of size 8 at addr ffffffc02f682868 by task kworker/2:2/84
[ 7.686121] The buggy address belongs to the object at ffffffc02f682800
[ 7.686121] which belongs to the cache kmalloc-128 of size 128
[ 7.698765] The buggy address is located 104 bytes inside of
[ 7.698765] 128-byte region [ffffffc02f682800, ffffffc02f682880)
The faulting address points to qh_list_entry.
From the result above, we know qh_list_entry was not initialized, so KASAN is triggered when its member is read.
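The match between the gdb output and the KASAN report can be checked with plain address arithmetic. The Python sketch below uses only values copied from the session above; the variable names are illustrative.

```python
OBJECT_START = 0xFFFFFFC02F682800  # "belongs to the object at ..."
REGION_END = 0xFFFFFFC02F682880    # end of the 128-byte region
LIST_ENTRY = 0xFFFFFFC02F682868    # gdb: p &qh->qh_list_entry
EP_HANDLE = 0xFFFFFFC02F682828     # gdb: ep_handle argument, cast to qh

bad_offset = LIST_ENTRY - OBJECT_START   # offset of the read inside the object
object_size = REGION_END - OBJECT_START  # size of the slab object
qh_offset = EP_HANDLE - OBJECT_START     # where qh points inside the object
member_offset = LIST_ENTRY - EP_HANDLE   # qh_list_entry relative to qh

print(bad_offset)     # 104
print(object_size)    # 128
print(qh_offset)      # 40
print(member_offset)  # 64
```

The first value reproduces the "located 104 bytes inside of 128-byte region" line, confirming that the address KASAN complains about is exactly &qh->qh_list_entry. The third value shows that ep_handle itself points 40 bytes into the kmalloc-128 object rather than at its start.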
Troubleshooting
Timeout when connecting to kgdb
Q: When using aarch64-linux-gnu-gdb vmlinux -ex "target remote localhost:4441", the connection times out:
Reading symbols from vmlinux...done.
Remote debugging using localhost:4441
Ignoring packet error, continuing...
warning: unrecognized item "timeout" in "qSupported" response
Ignoring packet error, continuing...
Remote replied unexpectedly to 'vMustReplyEmpty': timeout
A: Make sure kernel is in kgdb mode.
Auto-loading has been declined by…
Q: Auto-loading has been declined by…
warning: File "/opt/lineageos/kernel/kernel_rpi/scripts/gdb/vmlinux-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path /opt/lineageos/kernel/kernel_rpi/scripts/gdb/vmlinux-gdb.py
line to your configuration file "/home/fdbai/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/home/fdbai/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
Remote debugging using localhost:4441
0xffffff832b40901c in ?? ()
(gdb) list *(do_oops+0x18)
No symbol "do_oops" in current context.
A: Add the following to $HOME/.gdbinit:
add-auto-load-safe-path /opt/lineageos/kernel/kernel_rpi
gdb reports no symbol when trying to list source code
Q: Source code not shown with list command.
(gdb) list *(do_oops+0x18)
No symbol "do_oops" in current context.
A: Make sure you have loaded the right file; do not load vmlinux if you are debugging a kernel module:
# load module object
aarch64-linux-gnu-gdb samples/hello/oops.o -ex "target remote localhost:4441"
How to exit gdb layout window
Q: After entering TUI mode, how can I get out of the TUI window?
A: Press Ctrl+x a.
You can find more key bindings in 25.2 TUI Key Bindings.
The variable was optimized out
Q: When I try to print some variables, gdb prints:
(gdb) p qh
$1 = <optimized out>
A: To avoid code optimization, you can either use a GCC pragma:
+#pragma GCC optimize ("O0")
or a GCC attribute:
-int dwc_otg_hcd_is_bandwidth_allocated(dwc_otg_hcd_t * hcd, void *ep_handle)
+__attribute__((optimize("O0"))) int dwc_otg_hcd_is_bandwidth_allocated(dwc_otg_hcd_t * hcd, void *ep_handle)
There are other ways to get the value of an optimized-out variable; Undo has a great writeup about how to do this with the help of the debugging data in ELF files.
Another interesting project is Mozilla rr. The aarch64 architecture is not supported yet, but fortunately Keno has made some progress on this; see this issue for the latest status.