Limiting resource usage with cgroups on Android
cgroups on Android
cgroups (short for control groups) were created by Google engineers and merged into the mainline kernel in 2008 with version 2.6.24. They are mainly used for resource management in Linux containers, together with Linux kernel namespaces.
To check whether your kernel supports cgroups, look at the output of cat /proc/cgroups:
rpi3:/ # cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 3 5 1
cpu 2 1 1
cpuacct 1 103 1
blkio 0 1 1
memory 0 1 0
devices 0 1 1
freezer 0 1 1
net_cls 0 1 1
From the last column of the above output, we can see that the kernel supports cpuset, cpu, cpuacct, blkio, memory, devices, freezer and net_cls. The memory controller is disabled by default.
There is a kernel message when the memory cgroup is disabled:
[ 0.001711] Disabling memory control group subsystem
Enabling the memory controller adds 8 bytes of accounting overhead per 4K page on a 32-bit system, so it is disabled by default. Enable it by appending cgroup_enable=memory to the kernel command line in cmdline.txt.
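A minimal sketch of adding the flag, assuming the boot partition holding cmdline.txt is mounted at /boot (the exact location depends on your image):
# cmdline.txt must remain a single line, so append the flag to the end of it
sed -i 's/$/ cgroup_enable=memory/' /boot/cmdline.txt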
After a reboot, the memory controller is enabled:
rpi3:/ # cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 4 5 1
cpu 3 2 1
cpuacct 1 2 1
blkio 0 1 1
memory 2 110 1
devices 0 1 1
freezer 0 1 1
net_cls 0 1 1
On my system, which runs LineageOS 15.1, only cpuset, cpu, cpuacct and memory are used, and they are mounted from init.rc. The cpuacct and memory controllers are mounted in the early-init stage:
on early-init
# Mount cgroup mount point for cpu accounting
mount cgroup none /acct cpuacct
mkdir /acct/uid
# root memory control cgroup, used by lmkd
mkdir /dev/memcg 0700 root system
mount cgroup none /dev/memcg memory
# app mem cgroups, used by activity manager, lmkd and zygote
mkdir /dev/memcg/apps/ 0755 system system
# cgroup for system_server and surfaceflinger
mkdir /dev/memcg/system 0550 system system
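With the cpuacct hierarchy mounted at /acct, accumulated CPU time can be read straight out of it; for example, the root group's counter (standard cpuacct v1 interface):
# total CPU time, in nanoseconds, consumed by all tasks in the hierarchy
cat /acct/cpuacct.usage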
The cpu and cpuset controllers are mounted in the init stage:
on init
# Create cgroup mount points for process groups
mkdir /dev/cpuctl
mount cgroup none /dev/cpuctl cpu
chown system system /dev/cpuctl
chown system system /dev/cpuctl/tasks
chmod 0666 /dev/cpuctl/tasks
write /dev/cpuctl/cpu.rt_period_us 1000000
write /dev/cpuctl/cpu.rt_runtime_us 950000
# sets up initial cpusets for ActivityManager
mkdir /dev/cpuset
mount cpuset none /dev/cpuset
# this ensures that the cpusets are present and usable, but the device's
# init.rc must actually set the correct cpus
mkdir /dev/cpuset/foreground
copy /dev/cpuset/cpus /dev/cpuset/foreground/cpus
copy /dev/cpuset/mems /dev/cpuset/foreground/mems
mkdir /dev/cpuset/background
copy /dev/cpuset/cpus /dev/cpuset/background/cpus
copy /dev/cpuset/mems /dev/cpuset/background/mems
# system-background is for system tasks that should only run on
# little cores, not on bigs
# to be used only by init, so don't change system-bg permissions
mkdir /dev/cpuset/system-background
copy /dev/cpuset/cpus /dev/cpuset/system-background/cpus
copy /dev/cpuset/mems /dev/cpuset/system-background/mems
mkdir /dev/cpuset/top-app
copy /dev/cpuset/cpus /dev/cpuset/top-app/cpus
copy /dev/cpuset/mems /dev/cpuset/top-app/mems
# change permissions for all cpusets we'll touch at runtime
chown system system /dev/cpuset
chown system system /dev/cpuset/foreground
chown system system /dev/cpuset/background
chown system system /dev/cpuset/system-background
chown system system /dev/cpuset/top-app
chown system system /dev/cpuset/tasks
chown system system /dev/cpuset/foreground/tasks
chown system system /dev/cpuset/background/tasks
chown system system /dev/cpuset/system-background/tasks
chown system system /dev/cpuset/top-app/tasks
# set system-background to 0775 so SurfaceFlinger can touch it
chmod 0775 /dev/cpuset/system-background
chmod 0664 /dev/cpuset/foreground/tasks
chmod 0664 /dev/cpuset/background/tasks
chmod 0664 /dev/cpuset/system-background/tasks
chmod 0664 /dev/cpuset/top-app/tasks
chmod 0664 /dev/cpuset/tasks
After Android has finished booting, all four cgroup controllers are mounted properly:
rpi3:/ # mount | grep cgroup
none on /acct type cgroup (rw,relatime,cpuacct)
none on /dev/memcg type cgroup (rw,relatime,memory)
none on /dev/cpuctl type cgroup (rw,relatime,cpu)
none on /dev/cpuset type cgroup (rw,relatime,cpuset,noprefix,release_agent=/sbin/cpuset_release_agent)
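To check which of these groups a given process belongs to, read its cgroup file in procfs; each line shows hierarchy-id:controller:path for one mounted hierarchy. For the current shell:
cat /proc/self/cgroup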
Limiting processes to a cpuset
stress-ng is a tool for stress testing Linux systems, with over 220 stress tests covering CPU, memory and more. It is used for the demonstrations in this post. First grab the source code from GitHub, then cross-compile a static ARM binary:
$ git clone --depth=1 https://github.com/ColinIanKing/stress-ng.git
$ cd stress-ng
$ export CC=arm-linux-gnueabihf-gcc
$ STATIC=1 make ARCH=arm
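The static binary can then be pushed over adb; a minimal sketch, assuming executables are allowed under /data/local/tmp on the device:
$ adb push stress-ng /data/local/tmp/
$ adb shell chmod 755 /data/local/tmp/stress-ng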
With stress-ng on the device, start the test and run trace-cmd at the same time to record scheduler events:
# stress-ng -m 1 -c 4
# trace-cmd record -e sched
trace-cmd report shows that the stress processes are distributed across all of the available CPUs:
stress-ng-cpu-1478 [003] 3118.999753: sched_load_cfs_rq: cpu=3 path=/autogroup-2 load=1024 util=983 util_pelt=983 util_walt=0
stress-ng-cpu-1477 [001] 3118.999753: sched_load_cfs_rq: cpu=1 path=/autogroup-2 load=1024 util=913 util_pelt=913 util_walt=0
stress-ng-vm-1481 [000] 3118.999755: sched_load_cfs_rq: cpu=0 path=/autogroup-2 load=1024 util=819 util_pelt=819 util_walt=0
stress-ng-cpu-1479 [002] 3118.999755: sched_load_cfs_rq: cpu=2 path=/autogroup-2 load=2049 util=1215 util_pelt=1215 util_walt=0
stress-ng-cpu-1477 [001] 3118.999757: sched_load_tg: cpu=1 path=/autogroup-2 load=5093
stress-ng-vm-1481 [000] 3118.999759: sched_load_tg: cpu=0 path=/autogroup-2 load=5093
stress-ng-cpu-1478 [003] 3118.999759: sched_load_tg: cpu=3 path=/autogroup-2 load=5093
stress-ng-cpu-1479 [002] 3118.999760: sched_load_tg: cpu=2 path=/autogroup-2 load=5093
Then write the stress-ng PIDs to /dev/cpuset/restricted/cgroup.procs (as sketched below) and set the restricted group's cpus to 2,3:
echo 2,3 > /dev/cpuset/restricted/cpus
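Moving the workers can be done with a small loop; a sketch, assuming the restricted group already exists with its mems configured, and using the worker PIDs seen in the trace above:
for pid in 1477 1478 1479 1481; do
    echo $pid > /dev/cpuset/restricted/cgroup.procs
done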
Now all the stress processes run in the restricted group, and only CPU 2 and CPU 3 are used:
stress-ng-cpu-1478 [003] 3974.519769: sched_load_cfs_rq: cpu=3 path=/autogroup-2 load=2049 util=1024 util_pelt=1024 util_walt=0
stress-ng-cpu-1479 [002] 3974.519770: sched_load_cfs_rq: cpu=2 path=/autogroup-2 load=3073 util=1024 util_pelt=1024 util_walt=0
stress-ng-cpu-1478 [003] 3974.519774: sched_load_tg: cpu=3 path=/autogroup-2 load=5121
stress-ng-cpu-1479 [002] 3974.519774: sched_load_tg: cpu=2 path=/autogroup-2 load=5121
stress-ng-cpu-1479 [002] 3974.519780: sched_load_se: cpu=2 path=/autogroup-2 comm=(null) pid=-1 load=614 util=1024 util_pelt=1024 util_walt=0
stress-ng-cpu-1478 [003] 3974.519780: sched_load_se: cpu=3 path=/autogroup-2 comm=(null) pid=-1 load=409 util=1024 util_pelt=1024 util_walt=0
stress-ng-cpu-1479 [002] 3974.519784: sched_load_cfs_rq: cpu=2 path=/ load=614 util=1024 util_pelt=1024 util_walt=0
stress-ng-cpu-1478 [003] 3974.519784: sched_load_cfs_rq: cpu=3 path=/ load=745 util=1540 util_pelt=1540 util_walt=0
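The new affinity can also be confirmed from procfs; a quick check, using one of the worker PIDs from the trace above as an example:
# should now report 2-3 for tasks in the restricted group
grep Cpus_allowed_list /proc/1478/status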
Limiting memory usage
To demonstrate limiting memory usage with the cgroup memory controller, run a memory test with:
stress-ng -m 1 --vm-bytes 256M -M
Since there is enough memory, stress-ng runs without issue; all the stress processes run without any memory limit.
In this test we use the apps group to show how to set a memory limit in a cgroup, and what happens when a process reaches that limit. Set the limit to 20M and move the test process into the group:
echo 20M > /dev/memcg/apps/memory.limit_in_bytes
echo 1602 > /dev/memcg/apps/tasks
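The memcg accounting files show how close the group is to its limit; the current usage, the configured limit and the failure count can be read back at any time:
cat /dev/memcg/apps/memory.usage_in_bytes
cat /dev/memcg/apps/memory.limit_in_bytes
cat /dev/memcg/apps/memory.failcnt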
After moving the stress process into the apps group, the OOM killer is triggered immediately:
[ 1856.169324] stress-ng-vm invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=1000
[ 1856.180617] stress-ng-vm cpuset=/ mems_allowed=0
[ 1856.185372] CPU: 0 PID: 1602 Comm: stress-ng-vm Not tainted 4.14.135-v8+ #8
[ 1856.192440] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
[ 1856.198801] Call trace:
[ 1856.201297] [<ffffff99d548b608>] dump_backtrace+0x0/0x270
[ 1856.206781] [<ffffff99d548b89c>] show_stack+0x24/0x30
[ 1856.211913] [<ffffff99d5d2ddf0>] dump_stack+0xac/0xe4
[ 1856.217045] [<ffffff99d55eb1fc>] dump_header+0x94/0x1e8
[ 1856.222352] [<ffffff99d55ea2e0>] oom_kill_process+0x2c8/0x5d0
[ 1856.228187] [<ffffff99d55eaf24>] out_of_memory+0x104/0x2d0
[ 1856.233759] [<ffffff99d564f260>] mem_cgroup_out_of_memory+0x50/0x70
[ 1856.240124] [<ffffff99d565539c>] mem_cgroup_oom_synchronize+0x35c/0x3b8
[ 1856.246842] [<ffffff99d55eb118>] pagefault_out_of_memory+0x28/0x78
[ 1856.253119] [<ffffff99d5d48de8>] do_page_fault+0x440/0x450
[ 1856.258689] [<ffffff99d5d48e64>] do_translation_fault+0x6c/0x7c
[ 1856.264699] [<ffffff99d54814a0>] do_mem_abort+0x50/0xb0
[ 1856.270004] Exception stack(0xffffff800b64bec0 to 0xffffff800b64c000)
[ 1856.276545] bec0: 0000000000000000 0000000010000000 00000000ff5a00a5 00000000f1a26000
[ 1856.284494] bee0: 00000000000fdafc 00000000e2d96000 00000000e1a26000 0000000000154550
[ 1856.292444] bf00: 00000000ffcf1840 0000000004000000 00000000e1a26000 0000000000000000
[ 1856.300394] bf20: 00000000000000c0 00000000ffcf1760 0000000000068c1f 0000000000000000
[ 1856.308342] bf40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1856.316291] bf60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1856.324241] bf80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1856.332191] bfa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1856.340141] bfc0: 0000000000063946 00000000200f0030 00000000e1a26000 00000000ffffffff
[ 1856.348091] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1856.356041] [<ffffff99d548366c>] el0_da+0x20/0x24
[ 1856.361022] Task in /apps killed as a result of limit of /apps
[ 1856.367061] memory: usage 20480kB, limit 20480kB, failcnt 8911
[ 1856.373016] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 1856.379826] kmem: usage 572kB, limit 9007199254740988kB, failcnt 0
[ 1856.386167] Memory cgroup stats for /apps: cache:0KB rss:19908KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:19884KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 1856.405779] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1856.414845] [ 1602] 0 1602 69276 5146 18 2 0 1000 stress-ng-vm
[ 1856.424287] Memory cgroup out of memory: Kill process 1602 (stress-ng-vm) score 1955 or sacrifice child
[ 1856.433880] Killed process 1602 (stress-ng-vm) total-vm:277104kB, anon-rss:19892kB, file-rss:684kB, shmem-rss:8kB
[ 1856.462531] oom_reaper: reaped process 1602 (stress-ng-vm), now anon-rss:0kB, file-rss:0kB, shmem-rss:8kB
Freezing a process
The freezer controller is not mounted in init.rc, so mount it manually:
mkdir /dev/freezer
mount -t cgroup none /dev/freezer -o freezer
In this section we use the same test as in the memory section. Before freezing, the stress process takes about 100% of a CPU; after it is put into a freezer group, it no longer consumes CPU time. Freeze it with:
mkdir /dev/freezer/test
echo 1707 > /dev/freezer/test/tasks
echo FROZEN > /dev/freezer/test/freezer.state
To unfreeze the process:
echo THAWED > /dev/freezer/test/freezer.state
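The group's current state can be read back from the same file; it reports FROZEN, FREEZING or THAWED:
cat /dev/freezer/test/freezer.state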