Setting IRQ CPU affinities: Improving IRQ performance on the ODROID-XU4

I recently came across a post on the ODROID subreddit which featured an article offering tweaking tips for the ODROID-XU4. The article was originally written in German and was later translated into English and published in ODROID Magazine. As a long-time owner of an ODROID-XU4, I found that most of the tips were not new to me, since they’ve existed on the ODROID forums for quite some time. However, one tip I was not aware of caught my attention, and not in a good way.

IRQs

IRQs (Interrupt Requests) allow the hardware to access the CPU even when it’s busy doing something else. So our keyboards, mice, and networking, for example, won’t stop working if we’re maxing out our CPU.

Anyone who has used computers for long enough knows the phenomenon where the mouse and keyboard stutter, lag, or become unresponsive for a while when the CPU is busy with an intensive task. This was far more common on early computers and has become rarer as CPUs have become more powerful, operating systems have evolved, and the APIC architecture was introduced.

To get the absolute best performance out of hardware peripherals on a multi-core system we need to make sure we’re addressing IRQs to the most idle core, increasing the chances they’re going to be executed immediately. On systems with Arm big.LITTLE chipsets (such as the ODROID-XU4) we’re more likely to get the best responsiveness for IRQs out of the “big” cores. This makes perfect sense.

IRQs on Linux

To get a list of IRQs and their CPU affinities we can simply peek inside /proc/interrupts. This is how it looks on my ODROID-XU4 running Arch Linux ARM with kernel v4.14.157. Note: The output is quite long, so I’ll use head to trim it.

$ cat /proc/interrupts | head
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
49: 0 0 0 0 0 0 0 0 COMBINER 187 Edge mct_comp_irq
50: 8344372 0 0 0 0 0 0 0 GICv2 152 Edge mct_tick0
51: 0 5765406 0 0 0 0 0 0 GICv2 153 Edge mct_tick1
52: 0 0 4389485 0 0 0 0 0 GICv2 154 Edge mct_tick2
53: 0 0 0 3384898 0 0 0 0 GICv2 155 Edge mct_tick3
54: 0 0 0 0 55211190 0 0 0 GICv2 160 Edge mct_tick4
55: 0 0 0 0 0 48058391 0 0 GICv2 161 Edge mct_tick5
56: 0 0 0 0 0 0 33449904 0 GICv2 162 Edge mct_tick6
57: 0 0 0 0 0 0 0 20020736 GICv2 163 Edge mct_tick7
In the above output we have IRQs #49-57, which look like the system timer interrupts; #50-57 are the per-core ticks, one for each of the 8 cores. Basically, each IRQ has its own ID and, in this case, is bound to a single CPU.

The last statement may be hard to understand from the last example, so let’s take a look at how MMC and SD-Card reader interrupts look:

Note: I know that dw-mci are interrupts for the I/O devices simply from looking at the source code (https://github.com/hardkernel/linux/blob/odroidxu4-4.14.y/drivers/mmc/host/dw_mmc-exynos.c).

$ cat /proc/interrupts | grep dw-mci
83: 0 0 0 0 0 0 0 0 GICv2 107 Edge dw-mci
84: 103693202 0 0 0 0 0 0 0 GICv2 109 Edge dw-mci
IRQ #83 is for eMMC, which I don’t have, and IRQ #84 is for the MicroSD card which is obviously installed.
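
To see at a glance which cores have actually serviced a device, the per-CPU columns can be reduced to a list of CPUs with non-zero counters. The following is only a sketch: irq_cpus is a helper name I made up for illustration, not an existing tool, and it assumes the column layout shown above (a header row of CPU names, then one row per IRQ).

```shell
# irq_cpus NAME [FILE]: for every IRQ row that mentions NAME, print the CPUs
# whose counter is non-zero. FILE defaults to /proc/interrupts.
# (irq_cpus is a hypothetical helper, not part of any package.)
irq_cpus() {
    awk -v name="$1" '
        NR == 1 { ncpu = NF; next }            # header row: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ && index($0, name) {  # numbered IRQ rows only
            irq = $1
            sub(/:$/, "", irq)
            out = ""
            for (i = 2; i <= ncpu + 1; i++)    # columns 2..ncpu+1 hold the counters
                if ($i + 0 > 0) out = out " CPU" (i - 2)
            print "IRQ " irq ":" (out == "" ? " (no interrupts yet)" : out)
        }
    ' "${2:-/proc/interrupts}"
}
```

Calling irq_cpus dw-mci on the board should list IRQ #83 and #84 together with the cores that handled them, which makes it easy to spot everything piling up on CPU0.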

By default, all non-CPU-clock IRQs are bound to all cores but in reality CPU0 will be used most of the time since it’s simply the first one. In my kernel version on the ODROID-XU4, CPU0 is one of the “little” cores. The easiest way to confirm that is by looking at the max CPU frequency of each core, since the “little” ones are running at a slower speed:

$ cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
1500000
1500000
1500000
1500000
2000000
2000000
2000000
2000000
The first four CPUs (cores) run at up to 1.5GHz and the last four at up to 2GHz (the values above are in kHz). This matches the ODROID-XU4’s Samsung Exynos5422 CPU speeds, except that my “big” cores are running 100MHz slower.
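
The same check can be scripted. This is a minimal sketch that labels every core whose maximum frequency equals the highest one found as “big” and the rest as “little”. The classify_cores name and the optional directory argument are my own assumptions for illustration, and the heuristic only makes sense on a big.LITTLE system; on a homogeneous CPU every core would be labelled “big”.

```shell
# classify_cores [CPU_DIR]: label each core by comparing its cpuinfo_max_freq
# (reported in kHz) with the highest value found.
# CPU_DIR defaults to /sys/devices/system/cpu (hypothetical helper).
classify_cores() {
    dir="${1:-/sys/devices/system/cpu}"
    for f in "$dir"/cpu[0-9]*/cpufreq/cpuinfo_max_freq; do
        [ -r "$f" ] || continue
        cpu="${f#"$dir"/cpu}"   # drop the directory prefix and "cpu"
        cpu="${cpu%%/*}"        # keep only the CPU number
        echo "$cpu $(cat "$f")"
    done | sort -n | awk '
        { cpu[NR] = $1; freq[NR] = $2; if ($2 > max) max = $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "cpu%s %d MHz %s\n", cpu[i], freq[i] / 1000,
                       (freq[i] == max ? "big" : "little")
        }'
}
```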

Changing IRQ CPU Affinity

It’s quite easy to change the CPU affinity of IRQs. All IRQs are listed in /proc/irq/, and each one’s affinity is conveniently exposed through two files, smp_affinity and smp_affinity_list: the former contains a hexadecimal bitmask of CPUs and the latter a list of CPU numbers in decimal.
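
The two files describe the same set of CPUs in different notations. As a rough illustration, here is a sketch that expands a hexadecimal smp_affinity value into the CPU numbers it stands for. mask_to_cpus is a name I chose for this example, and it assumes a single hex word without the comma separators that appear on machines with more than 32 CPUs.

```shell
# mask_to_cpus HEX: print the CPU numbers whose bits are set in HEX,
# where bit 0 (the rightmost bit) is CPU0. Hypothetical helper.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out,}$cpu"   # append, comma-separated
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

mask_to_cpus ff   # all eight cores
mask_to_cpus 20   # CPU5 only
```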

So, to change our MicroSD card’s IRQ CPU affinity all we have to do is change the value of /proc/irq/84/smp_affinity_list to whatever CPU number we’d like, for example, 5. Of course, we cannot do that as a normal user, so we’ll have to use sudo. The easiest way to do that is as follows:

$ sudo sh -c "echo 5 > /proc/irq/84/smp_affinity_list"
And we can confirm that it worked:
$ cat /proc/irq/84/smp_affinity_list
5
$ cat /proc/interrupts | grep dw-mci
83: 0 0 0 0 0 0 0 0 GICv2 107 Edge dw-mci
84: 103699631 0 0 0 152288 0 0 0 GICv2 109 Edge dw-mci
Note: This value will not persist across reboots, but this is the general idea.

The “tweaks”

Going back to where we started: the article suggested doing exactly what I wrote above, so why was I unsatisfied with it? Because of how the article treats irqbalance. Putting aside the poor choice of using /etc/rc.local for applying this tweak, the first step was the one that caught my attention the most:

$ systemctl disable irqbalance
There’s a dedicated program whose whole purpose is IRQ balancing, and we’re going ahead and disabling it? Sounds extremely fishy.

My first thought was that maybe irqbalance did not allow limiting its assignments to specific CPUs, in which case disabling it would make sense. However, that was not the case. Looking at the man page of irqbalance, there’s an environment variable:

IRQBALANCE_BANNED_CPUS

which tells the program to avoid assigning IRQs to the given CPUs, and that is exactly what we want.

The value of this environment variable is a hexadecimal mask with one bit per CPU, where the rightmost bit is CPU0. A bit set to 1 bans that CPU and a 0 leaves it available to irqbalance. In our case we want to ban the first four CPUs (the “little” cores) and leave the last four available. That means our mask in binary would be:

00001111
The value must be hexadecimal, which is a fairly easy conversion from binary in this case: 0F.
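
For anyone who would rather not do the conversion by hand, here is a small sketch that builds the mask from a list of CPU numbers by setting one bit per CPU. The cpus_to_mask name is made up for this example.

```shell
# cpus_to_mask CPU...: set bit N for every CPU number N given and print the
# result in hexadecimal, the form IRQBALANCE_BANNED_CPUS expects.
# Hypothetical helper for illustration.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

cpus_to_mask 0 1 2 3   # the four "little" cores
```

Passing 0 1 2 3 gives the same value as the manual conversion above.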

All we have to do now is to set our environment variable to 0F (or just F since the leading 0 has no meaning). Let’s test that to make sure we’ve gotten the math right:

$ sudo su
$ export IRQBALANCE_BANNED_CPUS="f"
$ irqbalance -d
This machine seems not NUMA capable.
Isolated CPUs: 00000000
Adaptive-ticks CPUs: 00000000
Banned CPUs: 0000000f
...
Package 0: numa_node -1 cpu mask is 000000f0 (load 520000000)
Cache domain 0: numa_node is -1 cpu mask is 00000080 (load 90000000)
CPU number 7 numa_node is -1 (load 90000000)
Cache domain 1: numa_node is -1 cpu mask is 00000020 (load 100000000)
CPU number 5 numa_node is -1 (load 100000000)
Cache domain 2: numa_node is -1 cpu mask is 00000040 (load 130000000)
CPU number 6 numa_node is -1 (load 130000000)
Cache domain 3: numa_node is -1 cpu mask is 00000010 (load 200000000)
CPU number 4 numa_node is -1 (load 200000000)

  • First we change user to root to make it easier for us.
  • Then export the environment variable for irqbalance.
  • Run irqbalance in debug mode (-d).
  • Output: The first part shows in Banned CPUs that our value was accepted. Then, if we look at the rest of the output, we can see that it only works with CPUs #4-7 (the 5th to 8th), which is exactly what we wanted.

To make this environment variable available to the systemd unit, we first need to inspect the unit:

$ systemctl show irqbalance
There we look for the EnvironmentFile value, which could be anything depending on the operating system. On Ubuntu 18.04, it’s /etc/default/irqbalance and on my Arch system it’s /etc/irqbalance.env. There’s probably already a template file there, and all we have to do is make sure the variable is uncommented and set to the right value.
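
Editing the file by hand is perfectly fine. For a scripted setup, something along these lines could work. It is a sketch under assumptions: set_banned_cpus is a hypothetical helper, and it assumes the file uses plain VARIABLE=value lines, with the template entry possibly commented out with a leading #.

```shell
# set_banned_cpus MASK FILE: set IRQBALANCE_BANNED_CPUS=MASK in FILE,
# uncommenting an existing entry if there is one, otherwise appending it.
# Hypothetical helper; FILE is whatever path systemctl show reported.
set_banned_cpus() {
    mask="$1"
    file="$2"
    pattern='^[[:space:]]*#?[[:space:]]*IRQBALANCE_BANNED_CPUS='
    if grep -Eq "$pattern" "$file" 2>/dev/null; then
        tmp="$(mktemp)"
        # Rewrite the (possibly commented) entry, then copy the result back
        # so the original file keeps its owner and permissions.
        sed -E "s|$pattern.*|IRQBALANCE_BANNED_CPUS=$mask|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        echo "IRQBALANCE_BANNED_CPUS=$mask" >> "$file"
    fi
}
```

On a real system this would need to run as root, for example set_banned_cpus f /etc/irqbalance.env on my Arch setup, followed by restarting the irqbalance service so it picks up the new value.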

Using fixed IRQ IDs

The “tip” instructs adding one line per hardware controller, each keyed by that controller’s IRQ ID. However, IRQ IDs are not consistent and depend on the kernel version. For example, on my system IRQ IDs #103-105 map to some GPIO pins:

$ cat /proc/interrupts | awk '$1 ~ /103|104|105/'
103: 0 0 0 0 0 0 0 0 GICv2 110 Edge 13410000.pinctrl
104: 0 0 0 0 0 0 0 0 GICv2 78 Edge 14000000.pinctrl
105: 0 0 0 0 0 0 0 0 GICv2 82 Edge 14010000.pinctrl
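
If IRQs have to be addressed from a script at all, resolving the ID by name at run time is more robust than hard-coding the number. The following is only a sketch: irq_by_name is a name I invented, and it simply matches the given text anywhere in a row of /proc/interrupts.

```shell
# irq_by_name NAME [FILE]: print the IRQ numbers of all rows that mention NAME.
# FILE defaults to /proc/interrupts. Hypothetical helper for illustration.
irq_by_name() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+:$/ && index($0, name) {
            sub(/:$/, "", $1)   # "84:" -> "84"
            print $1
        }
    ' "${2:-/proc/interrupts}"
}

# Possible usage when pinning by hand (run on the board itself):
# for irq in $(irq_by_name dw-mci); do
#     echo 5 | sudo tee "/proc/irq/$irq/smp_affinity_list" > /dev/null
# done
```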

Binding each interrupt to a single CPU

Last but not least, the article suggests binding each interrupt to a different CPU: the network card gets CPU4, the USB3 adapter gets CPU5, and so on. Why bother limiting the CPUs that our kernel can choose from? What if a program locks that specific CPU for a long period of time? The kernel wouldn’t be able to assign the IRQ to a different CPU to avoid slowdowns.

Afterword

While I like those tweak compilations as much as the next guy, I always tend to make sure I completely understand what each tweak is doing and double check them to see how they apply to my particular system and use case. Moreover, if there exists a dedicated tool for a certain purpose (like irqbalance for this matter), one should first consider using it, otherwise its existence wouldn’t be justified.

This article’s purpose is by no means to offend or condemn u/blaumedia, who wrote the original article; it is meant to raise awareness of why users must consider their situation and understand what they’re doing. For more information please see the original article post at https://my-take-on.tech/2020/01/12/setting-irq-cpu-affinities-to-improve-performance-on-the-odroid-xu4/.
