Phicomm N1 Box High CPU Usage Due to IRQ-42
While playing Armbian with Phicomm N1 box, I found kernel thread irq/42-0.200908
have very high cpu usage thus cause the load average to around 2, this is a big
value for a idle system:
$ top
top - 00:24:08 up 1:11, 1 user, load average: 2.00, 2.00, 2.00
Tasks: 149 total, 2 running, 147 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 5.6 sy, 0.0 ni, 92.7 id, 0.0 wa, 1.6 hi, 0.1 si, 0.0 st
MiB Mem : 1739.3 total, 1465.9 free, 122.7 used, 150.8 buff/cache
MiB Swap: 869.6 total, 869.6 free, 0.0 used. 1477.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
498 root -51 0 0 0 0 D 11.6 0.0 6:47.13 irq/42-0.200908
527 root 20 0 0 0 0 D 6.3 0.0 0:37.69 kworker/u8:3+events_power_efficient
2533 root 20 0 0 0 0 I 1.3 0.0 0:04.74 kworker/u8:1-events_power_efficient
From another Phicomm box, T1, the load average is much lower:
top - 09:14:40 up 5 days, 20:57, 4 users, load average: 0.17, 0.13, 0.14
Observing interrupts with watch -n1 -d cat /proc/interrupts
I noticed interrupt
#42 increased very quickly, it’s about 1500 interrupts per second, I suppose this
is not a normal value:
$ while true;do cat /proc/interrupts |grep meson-gpio-irqchip;sleep 1;done
42: 1969856 0 0 0 meson-gpio-irqchip 29 Level 0.2009087f:00
42: 1971361 0 0 0 meson-gpio-irqchip 29 Level 0.2009087f:00
42: 1972908 0 0 0 meson-gpio-irqchip 29 Level 0.2009087f:00
42: 1974326 0 0 0 meson-gpio-irqchip 29 Level 0.2009087f:00
Look into file arch/arm64/boot/dts/amlogic/meson-gxl-s905d-p230.dts
, we know this
interrupt is used for Ethernet:
external_phy: ethernet-phy@0 {
/* Realtek RTL8211F (0x001cc916) */
reg = <0>;
max-speed = <1000>;
/* External PHY reset is shared with internal PHY Led signal */
reset-assert-us = <10000>;
reset-deassert-us = <30000>;
reset-gpios = <&gpio GPIOZ_14 GPIO_ACTIVE_LOW>;
interrupt-parent = <&gpio_intc>;
interrupts = <29 IRQ_TYPE_LEVEL_LOW>;
eee-broken-1000t;
};
From the output of top command, there are two threads in uninterruptible state (D), let’s see what are they doing:
root@Phicomm-N1:~# echo w > /proc/sysrq-trigger; dmesg -c
[ 115.145217] sysrq: Show Blocked State
[ 115.149200] task PC stack pid father
[ 115.149262] kworker/u8:1 D 0 114 2 0x00000028
[ 115.149289] Workqueue: events_power_efficient phy_state_machine
[ 115.149327] Call trace:
[ 115.149340] __switch_to+0xfc/0x150
[ 115.149352] __schedule+0x35c/0x7f0
[ 115.149360] schedule+0x80/0x118
[ 115.149370] schedule_hrtimeout_range_clock+0x90/0x120
[ 115.149378] schedule_hrtimeout_range+0x1c/0x28
[ 115.149386] usleep_range+0x8c/0xb8
[ 115.149397] stmmac_mdio_read+0x194/0x208
[ 115.149406] mdio_mux_read+0x78/0xa8
[ 115.149415] __mdiobus_read+0x50/0x268
[ 115.149423] mdiobus_read+0x58/0xe8
[ 115.149459] genphy_update_link+0x2c/0xc0
[ 115.149468] genphy_read_status+0x2c/0xf8
[ 115.149478] phy_check_link_status+0xe8/0xf8
[ 115.149487] phy_state_machine+0xa8/0x188
[ 115.149497] process_one_work+0x2b8/0x738
[ 115.149506] worker_thread+0x250/0x4a0
[ 115.149515] kthread+0x170/0x178
[ 115.149524] ret_from_fork+0x10/0x18
[ 115.149632] irq/42-0.200908 D 0 546 2 0x00000028
[ 115.149644] Call trace:
[ 115.149654] __switch_to+0xfc/0x150
[ 115.149663] __schedule+0x35c/0x7f0
[ 115.149718] __schedule+0x34c/0x7f0
[ 115.149728] schedule+0x80/0x118
[ 115.149736] schedule_hrtimeout_range_clock+0x90/0x120
[ 115.149744] schedule_hrtimeout_range+0x1c/0x28
[ 115.149752] usleep_range+0x8c/0xb8
[ 115.149762] stmmac_mdio_read+0x194/0x208
[ 115.149771] mdio_mux_read+0x78/0xa8
[ 115.149779] __mdiobus_read+0x50/0x268
[ 115.149789] phy_read_paged+0x44/0x68
[ 115.149799] rtl8211f_ack_interrupt+0x24/0x68
[ 115.149840] phy_interrupt+0xac/0xc0
[ 115.149851] irq_thread_fn+0x34/0xa8
[ 115.149860] irq_thread+0x1b0/0x280
[ 115.149868] kthread+0x170/0x178
[ 115.149877] ret_from_fork+0x10/0x18
Since the high CPU load was originated from interrupt, so let’s dive into the
interrupt handler to see what it did when an interrupt was triggered, that is
the gpio(29) is low, phy_trigger_machine
and phy_clear_interrupt will be called.
phy_clear_interrupt
will call drv->ack_interrupt
directly, which points to
rtl8211f_ack_interrupt
the relevant codes are in drivers/net/phy/realtek.c
:
static int rtl8211f_ack_interrupt(struct phy_device *phydev)
{
int err;
err = phy_read_paged(phydev, 0xa43, RTL8211F_INSR);
return (err < 0) ? err : 0;
}
It reads the value of interrupt status register RTL8211F_INSR
, if an interrupt
occurs, the corresponding bit will be set to 1, and the bit definition can be
found in section 8.4.19 INSR (Interrupt Status Register), the whole datasheet can
be found here:
Let’s see what kind of interrupts are they, with trace point mdio:mdio_access
we can dump the value returned by bus->read, which is issued by phy_read_paged
,
it returns zero every time the interrupt happens (see line of regnum=29), which
means there are no interrupts at all:
# argdist-bpfcc -C 't:mdio:mdio_access():char,unsigned,u16:args->read,args->regnum,args->val'
[07:35:17]
t:mdio:mdio_access():char,unsigned,u16:args->read,args->regnum,args->val
COUNT EVENT
1188 args->read = b'\x01', args->regnum = 0, args->val = 4160
1189 args->read = b'\x01', args->regnum = 1, args->val = 31149
1518 args->read = b'\x00', args->regnum = 31, args->val = 2626
1518 args->read = b'\x01', args->regnum = 29, args->val = 0
1519 args->read = b'\x01', args->regnum = 31, args->val = 2626
1519 args->read = b'\x00', args->regnum = 31, args->val = 2627
Now let’s figure out why fake interrupt leads to high CPU load, decode the stack trace reveals the guilty:
cat irq42.txt|./scripts/decode_stacktrace.sh vmlinux . .
[ 115.149762] stmmac_mdio_read (/opt/kernel/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c:225 (discriminator 12))
[ 115.149771] mdio_mux_read (/opt/kernel/drivers/net/phy/mdio-mux.c:51)
[ 115.149779] __mdiobus_read (/opt/kernel/drivers/net/phy/mdio_bus.c:794 (discriminator 3))
[ 115.149789] phy_read_paged (/opt/kernel/./include/linux/phy.h:742 /opt/kernel/drivers/net/phy/phy-core.c:818)
[ 115.149799] rtl8211f_ack_interrupt (/opt/kernel/drivers/net/phy/realtek.c:92)
[ 115.149840] phy_interrupt (/opt/kernel/drivers/net/phy/phy.c:732 /opt/kernel/drivers/net/phy/phy.c:717)
Both threads called stmmac_mdio_read
which involves readl_poll_timeout
, it’s
a busy loop with a sleep of 100us, considering the amount of fake interrupts, it
contributes to high CPU load.
It could be specifying wrong interrupt trigger type or a wrong GPIO, I tried to
change the trigger type to IRQ_TYPE_LEVEL_HIGH
, to no avail.
The quickest way to get this done is to decompile the dtb from android image to
see what’s the differences, I downloaded a modified ROM from YYFROM and
unpacked it with Linux AMLogic Toolkit, but I got incorrect
magic number when decompiling meson1.dtb
:
dtc -I dtb -O dts -o meson1.dts meson1.dtb
FATAL ERROR: Blob has incorrect magic number
Open with hex editor there is no D00DFEED
in it, and binwalk
returns nothing.
As there is no schematic for this deprecated box, it’s hard to confirm which GPIO it was used, searching around I found flippy has already fixed this issue and also made other improvements regarding ethernet performance.
This patch fixes the IRQ issue:
diff --git a/arch/arm64/boot/dts/amlogic/meson-gxl-s905d-phicomm-n1.dts b/arch/arm64/boot/dts/amlogic/meson-gxl-s905d-phicomm-n1
index b5667f1fb..05cbd3969 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxl-s905d-phicomm-n1.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxl-s905d-phicomm-n1.dts
@@ -29,3 +29,7 @@
&cvbs_vdac_port {
status = "disabled";
};
+
+&external_phy {
+ interrupts = <25 IRQ_TYPE_LEVEL_LOW>;
+};
There are some useful command I have been used on working this issue:
trace-cmd record -p function_graph -l rtl8211f_* -l phy_* -l __mdiobus_* -l mdio_* -l stmmac_mdio_* -l mutex_* -l mdio* -l usleep_* -l genphy*
trace-cmd record -p function_graph -l rtl8211f_* -l phy_state_machine -l phy_interrupt -l mutex_* -l usleep*
tplist-bpfcc -v mdio:mdio_access
funclatency-bpfcc rtl8211f_ack_interrupt -u -d 30
klockstat-bpfcc -d 30 -S hld_total,acq_total
argdist-bpfcc -C 'r::rtl8211f_ack_interrupt(struct phy_device *phydev):$ret'
Troubleshooting
Failed to mount boot partition
[ 29.974974] systemd[1]: Mounting /boot...
[ 30.034358] FAT-fs (sda1): IO charset ascii not found
[ 30.037106] systemd[1]: boot.mount: Mount process exited, code=exited, status=32/n/a
[ 30.037126] systemd[1]: boot.mount: Failed with result 'exit-code'.
[ 30.038740] systemd[1]: Failed to mount /boot.
CONFIG_NLS_ASCII
is required or insmod nls_ascii.ko
to avoid mount failure.
bpf: Failed to load program: Invalid argument
# klockstat-bpfcc
bpf: Failed to load program: Invalid argument
Traceback (most recent call last):
File "/usr/sbin/klockstat-bpfcc", line 390, in <module>
b.attach_kprobe(event="mutex_unlock", fn_name="mutex_unlock_enter")
File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 654, in attach_kprobe
fn = self.load_func(fn_name, BPF.KPROBE)
File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 394, in load_func
raise Exception("Failed to load BPF program %s: %s" %
Exception: Failed to load BPF program b'mutex_unlock_enter': Invalid argument
In order to use kprobe, these options should be enabled in kernel configuration:
+CONFIG_KPROBES=y
+CONFIG_KPROBE_EVENTS=y
+CONFIG_BPF_EVENTS=y
+CONFIG_FTRACE=y
Cannot boot big kernel larger than 32 MB
I am using Armbian build by 150balbes, the maximum allow kernel is 32 MB, the kernel was booted by U-Boot, kernel and ramdisk load address are set in s905_autoscript.cmd:
setenv kernel_addr_r 0x11000000
setenv ramdisk_addr_r 0x13000000
So if we load big kernel, the kernel image contents higher than 0x13000000 will
be overwritten, change ramdisk_addr_r
to a bigger value (say: 0x15000000) fix
this issue.
Regenerate autoscript with command:
mkimage -A arm -O linux -T script -C none \
-n "boot script" -d ./s905_autoscript.cmd s905_autoscript
If you want boot kernel from internal emmc, then change the value of initrd_addr
in emmc_autoscript.cmd
:
setenv initrd_addr 0x15000000
Decompiling dtb failed with ‘FATAL ERROR: Blob has incorrect magic number’
I have no idea about this for now.