Phicomm N1 Box High CPU Usage Due to IRQ-42

6 minute read

While playing Armbian with Phicomm N1 box, I found kernel thread irq/42-0.200908 have very high cpu usage thus cause the load average to around 2, this is a big value for a idle system:

$ top
top - 00:24:08 up  1:11,  1 user,  load average: 2.00, 2.00, 2.00
Tasks: 149 total,   2 running, 147 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  5.6 sy,  0.0 ni, 92.7 id,  0.0 wa,  1.6 hi,  0.1 si,  0.0 st
MiB Mem :   1739.3 total,   1465.9 free,    122.7 used,    150.8 buff/cache
MiB Swap:    869.6 total,    869.6 free,      0.0 used.   1477.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    498 root     -51   0       0      0      0 D  11.6   0.0   6:47.13 irq/42-0.200908
    527 root      20   0       0      0      0 D   6.3   0.0   0:37.69 kworker/u8:3+events_power_efficient
   2533 root      20   0       0      0      0 I   1.3   0.0   0:04.74 kworker/u8:1-events_power_efficient

From another Phicomm box, T1, the load average is much lower:

top - 09:14:40 up 5 days, 20:57,  4 users,  load average: 0.17, 0.13, 0.14

Observing interrupts with watch -n1 -d cat /proc/interrupts I noticed interrupt #42 increased very quickly, it’s about 1500 interrupts per second, I suppose this is not a normal value:

 $ while true;do cat /proc/interrupts |grep meson-gpio-irqchip;sleep 1;done
 42:    1969856          0          0          0  meson-gpio-irqchip  29 Level     0.2009087f:00
 42:    1971361          0          0          0  meson-gpio-irqchip  29 Level     0.2009087f:00
 42:    1972908          0          0          0  meson-gpio-irqchip  29 Level     0.2009087f:00
 42:    1974326          0          0          0  meson-gpio-irqchip  29 Level     0.2009087f:00

Look into file arch/arm64/boot/dts/amlogic/meson-gxl-s905d-p230.dts, we know this interrupt is used for Ethernet:

external_phy: ethernet-phy@0 {
	/* Realtek RTL8211F (0x001cc916) */
	reg = <0>;
	max-speed = <1000>;

	/* External PHY reset is shared with internal PHY Led signal */
	reset-assert-us = <10000>;
	reset-deassert-us = <30000>;
	reset-gpios = <&gpio GPIOZ_14 GPIO_ACTIVE_LOW>;

	interrupt-parent = <&gpio_intc>;
	interrupts = <29 IRQ_TYPE_LEVEL_LOW>;
	eee-broken-1000t;
};

From the output of top command, there are two threads in uninterruptible state (D), let’s see what are they doing:

root@Phicomm-N1:~# echo w > /proc/sysrq-trigger; dmesg -c
[  115.145217] sysrq: Show Blocked State
[  115.149200]   task                        PC stack   pid father
[  115.149262] kworker/u8:1    D    0   114      2 0x00000028
[  115.149289] Workqueue: events_power_efficient phy_state_machine
[  115.149327] Call trace:
[  115.149340]  __switch_to+0xfc/0x150
[  115.149352]  __schedule+0x35c/0x7f0
[  115.149360]  schedule+0x80/0x118
[  115.149370]  schedule_hrtimeout_range_clock+0x90/0x120
[  115.149378]  schedule_hrtimeout_range+0x1c/0x28
[  115.149386]  usleep_range+0x8c/0xb8
[  115.149397]  stmmac_mdio_read+0x194/0x208
[  115.149406]  mdio_mux_read+0x78/0xa8
[  115.149415]  __mdiobus_read+0x50/0x268
[  115.149423]  mdiobus_read+0x58/0xe8
[  115.149459]  genphy_update_link+0x2c/0xc0
[  115.149468]  genphy_read_status+0x2c/0xf8
[  115.149478]  phy_check_link_status+0xe8/0xf8
[  115.149487]  phy_state_machine+0xa8/0x188
[  115.149497]  process_one_work+0x2b8/0x738
[  115.149506]  worker_thread+0x250/0x4a0
[  115.149515]  kthread+0x170/0x178
[  115.149524]  ret_from_fork+0x10/0x18
[  115.149632] irq/42-0.200908 D    0   546      2 0x00000028
[  115.149644] Call trace:
[  115.149654]  __switch_to+0xfc/0x150
[  115.149663]  __schedule+0x35c/0x7f0
[  115.149718]  __schedule+0x34c/0x7f0
[  115.149728]  schedule+0x80/0x118
[  115.149736]  schedule_hrtimeout_range_clock+0x90/0x120
[  115.149744]  schedule_hrtimeout_range+0x1c/0x28
[  115.149752]  usleep_range+0x8c/0xb8
[  115.149762]  stmmac_mdio_read+0x194/0x208
[  115.149771]  mdio_mux_read+0x78/0xa8
[  115.149779]  __mdiobus_read+0x50/0x268
[  115.149789]  phy_read_paged+0x44/0x68
[  115.149799]  rtl8211f_ack_interrupt+0x24/0x68
[  115.149840]  phy_interrupt+0xac/0xc0
[  115.149851]  irq_thread_fn+0x34/0xa8
[  115.149860]  irq_thread+0x1b0/0x280
[  115.149868]  kthread+0x170/0x178
[  115.149877]  ret_from_fork+0x10/0x18

Since the high CPU load was originated from interrupt, so let’s dive into the interrupt handler to see what it did when an interrupt was triggered, that is the gpio(29) is low, phy_trigger_machine and phy_clear_interrupt will be called.

phy_clear_interrupt will call drv->ack_interrupt directly, which points to rtl8211f_ack_interrupt the relevant codes are in drivers/net/phy/realtek.c:

static int rtl8211f_ack_interrupt(struct phy_device *phydev)
{
	int err;

	err = phy_read_paged(phydev, 0xa43, RTL8211F_INSR);

	return (err < 0) ? err : 0;
}

It reads the value of interrupt status register RTL8211F_INSR, if an interrupt occurs, the corresponding bit will be set to 1, and the bit definition can be found in section 8.4.19 INSR (Interrupt Status Register), the whole datasheet can be found here:

INSR (Interrupt Status Register)

Let’s see what kind of interrupts are they, with trace point mdio:mdio_access we can dump the value returned by bus->read, which is issued by phy_read_paged, it returns zero every time the interrupt happens (see line of regnum=29), which means there are no interrupts at all:

# argdist-bpfcc -C 't:mdio:mdio_access():char,unsigned,u16:args->read,args->regnum,args->val'
[07:35:17]
t:mdio:mdio_access():char,unsigned,u16:args->read,args->regnum,args->val
	COUNT      EVENT
	1188       args->read = b'\x01', args->regnum = 0, args->val = 4160
	1189       args->read = b'\x01', args->regnum = 1, args->val = 31149
	1518       args->read = b'\x00', args->regnum = 31, args->val = 2626
	1518       args->read = b'\x01', args->regnum = 29, args->val = 0
	1519       args->read = b'\x01', args->regnum = 31, args->val = 2626
	1519       args->read = b'\x00', args->regnum = 31, args->val = 2627

Now let’s figure out why fake interrupt leads to high CPU load, decode the stack trace reveals the guilty:

cat irq42.txt|./scripts/decode_stacktrace.sh vmlinux . .

[  115.149762] stmmac_mdio_read (/opt/kernel/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c:225 (discriminator 12))
[  115.149771] mdio_mux_read (/opt/kernel/drivers/net/phy/mdio-mux.c:51)
[  115.149779] __mdiobus_read (/opt/kernel/drivers/net/phy/mdio_bus.c:794 (discriminator 3))
[  115.149789] phy_read_paged (/opt/kernel/./include/linux/phy.h:742 /opt/kernel/drivers/net/phy/phy-core.c:818)
[  115.149799] rtl8211f_ack_interrupt (/opt/kernel/drivers/net/phy/realtek.c:92)
[  115.149840] phy_interrupt (/opt/kernel/drivers/net/phy/phy.c:732 /opt/kernel/drivers/net/phy/phy.c:717)

Both threads called stmmac_mdio_read which involves readl_poll_timeout, it’s a busy loop with a sleep of 100us, considering the amount of fake interrupts, it contributes to high CPU load.

It could be specifying wrong interrupt trigger type or a wrong GPIO, I tried to change the trigger type to IRQ_TYPE_LEVEL_HIGH, to no avail.

The quickest way to get this done is to decompile the dtb from android image to see what’s the differences, I downloaded a modified ROM from YYFROM and unpacked it with Linux AMLogic Toolkit, but I got incorrect magic number when decompiling meson1.dtb:

dtc -I dtb -O dts -o meson1.dts meson1.dtb
FATAL ERROR: Blob has incorrect magic number

Open with hex editor there is no D00DFEED in it, and binwalk returns nothing.

As there is no schematic for this deprecated box, it’s hard to confirm which GPIO it was used, searching around I found flippy has already fixed this issue and also made other improvements regarding ethernet performance.

This patch fixes the IRQ issue:

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxl-s905d-phicomm-n1.dts b/arch/arm64/boot/dts/amlogic/meson-gxl-s905d-phicomm-n1
index b5667f1fb..05cbd3969 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxl-s905d-phicomm-n1.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxl-s905d-phicomm-n1.dts
@@ -29,3 +29,7 @@
 &cvbs_vdac_port {
        status = "disabled";
 };
+
+&external_phy {
+       interrupts = <25 IRQ_TYPE_LEVEL_LOW>;
+};

There are some useful command I have been used on working this issue:

trace-cmd record -p function_graph -l rtl8211f_* -l phy_* -l __mdiobus_* -l mdio_* -l stmmac_mdio_* -l mutex_* -l mdio* -l usleep_* -l genphy*

trace-cmd record -p function_graph -l rtl8211f_* -l phy_state_machine -l phy_interrupt -l mutex_* -l usleep*

tplist-bpfcc -v mdio:mdio_access

funclatency-bpfcc rtl8211f_ack_interrupt -u -d 30

klockstat-bpfcc -d 30 -S hld_total,acq_total

argdist-bpfcc -C 'r::rtl8211f_ack_interrupt(struct phy_device *phydev):$ret'

Troubleshooting

Failed to mount boot partition

[   29.974974] systemd[1]: Mounting /boot...
[   30.034358] FAT-fs (sda1): IO charset ascii not found
[   30.037106] systemd[1]: boot.mount: Mount process exited, code=exited, status=32/n/a
[   30.037126] systemd[1]: boot.mount: Failed with result 'exit-code'.
[   30.038740] systemd[1]: Failed to mount /boot.

CONFIG_NLS_ASCII is required or insmod nls_ascii.ko to avoid mount failure.

bpf: Failed to load program: Invalid argument

# klockstat-bpfcc
bpf: Failed to load program: Invalid argument

Traceback (most recent call last):
  File "/usr/sbin/klockstat-bpfcc", line 390, in <module>
    b.attach_kprobe(event="mutex_unlock", fn_name="mutex_unlock_enter")
  File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 654, in attach_kprobe
    fn = self.load_func(fn_name, BPF.KPROBE)
  File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 394, in load_func
    raise Exception("Failed to load BPF program %s: %s" %
Exception: Failed to load BPF program b'mutex_unlock_enter': Invalid argument

In order to use kprobe, these options should be enabled in kernel configuration:

+CONFIG_KPROBES=y
+CONFIG_KPROBE_EVENTS=y
+CONFIG_BPF_EVENTS=y
+CONFIG_FTRACE=y

Cannot boot big kernel larger than 32 MB

I am using Armbian build by 150balbes, the maximum allow kernel is 32 MB, the kernel was booted by U-Boot, kernel and ramdisk load address are set in s905_autoscript.cmd:

setenv kernel_addr_r 0x11000000
setenv ramdisk_addr_r 0x13000000

So if we load big kernel, the kernel image contents higher than 0x13000000 will be overwritten, change ramdisk_addr_r to a bigger value (say: 0x15000000) fix this issue.

Regenerate autoscript with command:

mkimage -A arm -O linux -T script -C none  \
-n "boot script" -d ./s905_autoscript.cmd s905_autoscript

If you want boot kernel from internal emmc, then change the value of initrd_addr in emmc_autoscript.cmd:

setenv initrd_addr 0x15000000

Decompiling dtb failed with ‘FATAL ERROR: Blob has incorrect magic number’

I have no idea about this for now.