If you prowl web sites that discuss embedded hardware and operating systems, you will occasionally find people complaining that Linux is not a “real-time operating system”. What do they mean by that? Why is it important?
Why do we care?
It just so happens that my silly little art project illustrates the issue quite nicely – and very obviously.
The Problem
I’m quite sure that nobody else has yet built one of these little display toys. Why would they? Especially when the prices on those particular displays have skyrocketed, leaving the cost to build such a thing in the neighborhood of $40 before you even add the unobtanium that is the Raspberry Pi to control them!
But I digress.
If you had built one, you would soon realize that it’s far from perfect. You run the application, send it a message, and it appears on the display – but it flickers something fierce. It’s clearly got issues. It doesn’t look right!
This is a modern ARM processor, and while it’s by no means the fastest thing out there, it’s certainly fast enough for this application. So why is this happening?
There’s a very simple answer…
Whimsical Scheduling
Even on a fresh Linux install there are quite a few processes running. You have various daemons that are managing the system, and let’s face it: systemd tends to sprawl all over the place and has a bunch of processes all its own. Each of these things needs processor time, and Linux has to schedule them.
Every time one of these other processes gets scheduled, your nifty little app has a very good chance of getting pre-empted, its time slice stolen so the other process can run. Or, if your app is asleep, the other process can keep it from waking up in a timely fashion.
That’s what’s happening with the display toy. Its time slice is getting stolen or delayed at more or less random intervals, and that prevents it from twiddling the GPIO bits consistently enough for a steady display. The result is an intermittent, random, and very obvious flicker.
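You can see the same effect without any display hardware at all. What follows is a minimal Python sketch, not the toy's actual code; the function name and the 1 ms period are arbitrary choices for illustration. It asks to sleep for a fixed interval, over and over, and records how late each wake-up turns out to be.

```python
import time


def measure_wakeup_lateness(period_s=0.001, iterations=200):
    """Sleep for period_s repeatedly; return how late each wake-up was, in microseconds."""
    lateness_us = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        time.sleep(period_s)
        elapsed_ns = time.perf_counter_ns() - start
        # Anything beyond the requested period is time spent waiting to be scheduled.
        lateness_us.append((elapsed_ns - period_s * 1e9) / 1000.0)
    return lateness_us


if __name__ == "__main__":
    samples = sorted(measure_wakeup_lateness())
    print(f"median lateness: {samples[len(samples) // 2]:.0f} us")
    print(f"worst lateness:  {samples[-1]:.0f} us")
```

On a busy system the worst case will typically sit well above the median, and it is those occasional outliers that would show up as flicker.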
It’s surprising that it’s visible, but here we are.
In an ideal world, we would be able to tell the OS that our process needs to run at specific times, and that said times are non-negotiable. In simplistic terms, that is what a real-time operating system allows: the ability to guarantee that your process will run at particular times, and for long enough to do its job. Ordinary Linux will not do that for you.
One would think that giving the process a higher priority – say, with the nice command – would fix the problem well enough, but no. It doesn’t actually help in any appreciable manner. Even with our toy set at the highest available priority, the problem persists.
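For the curious, this is roughly what raising the priority looks like from inside a program. It is a Python sketch rather than anything from the actual toy, and try_raise_priority is a made-up helper name. Note that negative nice values normally require root (or the CAP_SYS_NICE capability), so the sketch falls back to whatever value is already in effect.

```python
import os


def try_raise_priority(niceness=-20):
    """Try to set this process's nice value; return the value actually in effect."""
    try:
        os.setpriority(os.PRIO_PROCESS, 0, niceness)
    except PermissionError:
        # Lowering the nice value (raising priority) needs root or CAP_SYS_NICE.
        pass
    return os.getpriority(os.PRIO_PROCESS, 0)


if __name__ == "__main__":
    print(f"nice value now: {try_raise_priority()}")
```

A lower nice value only gives the process a bigger share of CPU time under the default scheduler. It makes no promise about when the process gets to run, which fits with the flicker refusing to go away.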
And that is why embedded developers sometimes complain that Linux is not real-time.
There are real-time variations of Linux, but dealing with that is out of scope of this particular post. Maybe I’ll look into it at some point, just out of curiosity…
So how do we fix it without replacing the stock Raspbian Linux kernel?
Read on…
Attempt #1
This issue, needless to say, annoyed the heck out of me when I noticed it. Why should a Raspberry Pi not be able to keep up? I considered a number of different solutions, but nothing seemed adequate. Unless I wanted to add a dedicated microcontroller to my silly little toy, this was going to keep happening.
Unless I could make Linux not mess with my process.
As it happens, the Raspberry Pi 3 Model B (v1.2) that I’m using has four CPU cores. What if we just pin this thing to one of them and let it do its thing? That should help, right? There’s even a Linux utility that lets you do that!
Okay, so I cancel my run and restart it, pinned to a CPU:
steve@boredom:~# taskset -c 3 ./boredom
2023/03/25 18:49:02 Boredom v0.1.1 starting.
2023/03/25 18:49:02 rabbit: connecting to: amqp://cortex:xxx@rabbitmq.us-east.floating.io/
2023/03/25 18:49:02 rabbit: declared exchange 'fio'.
2023/03/25 18:49:02 rabbit: Queue [fio.boredom.messages] has 0 messages with 0 consumers.
2023/03/25 18:49:03 rabbit: Queue [fio.boredom.messages.dead] has 0 messages with 0 consumers.
And now my little app is pinned to CPU #3.
Guess what?
It might have helped a little, but not all that much. Subjectively, it does seem a little less flickery – but… it’s still flickering. Still not acceptable!
But wait. Just because I pinned it to a CPU does not mean that it won’t get preempted or delayed by some other process. It just means that it will only ever run on that one CPU. Well, with so many background processes, that’s not quite enough, is it? It’s a wonder that it didn’t actually make things worse!
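Incidentally, a program can do to itself what taskset does from the outside. The Python sketch below is an illustration only (pin_to_cpu is a hypothetical helper, and the toy does not necessarily work this way). It uses os.sched_setaffinity, which is only available on Linux and wraps the same system call that taskset relies on.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU, much like `taskset -c <cpu>`."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the caller"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    # Pick the highest-numbered CPU we are allowed to use
    # (CPU 3 on a four-core Pi with default settings).
    target = max(allowed)
    print(f"before: {sorted(allowed)}")
    print(f"after:  {sorted(pin_to_cpu(target))}")
```

Just like the taskset run above, this only limits where our process may run. It does nothing to keep other processes off that CPU.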
The Real Fix
In order to truly make things predictable, we need to ensure that nothing else can run on the CPU that our toy is pinned to. There are a number of options here, but by far the simplest is just to tell Linux that it’s not allowed to automatically schedule anything on that particular core. This is actually very easy to do.
On a Raspberry Pi running Raspbian, simply edit /boot/cmdline.txt and add the appropriate option (without changing anything else!).
For example, after the update, mine looks like this:
console=tty1 root=PARTUUID=657dbd5b-02 rootfstype=ext4 fsck.repair=yes rootwait isolcpus=3
These are the “command-line arguments” to the Linux kernel, read at startup. The key is that last parameter: isolcpus=3. That tells Linux that it should never schedule any processes on CPU #3 unless the administrator explicitly requests it (via taskset or similar functionality). If you want to do multiple CPUs, you can do isolcpus=2-3 or the like. Note that CPU numbers are zero-indexed, so the first CPU is CPU 0. Check the appropriate documentation to get the full story.
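Once the machine has been rebooted with the new option, one way to confirm that it took effect is to ask the kernel which CPUs it considers isolated. The Python sketch below assumes the kernel exposes that list at /sys/devices/system/cpu/isolated (recent kernels do; the helper returns an empty set if the file cannot be read). Both function names are made up for illustration, and the parser only handles plain lists and ranges such as 3 or 0,2-3.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "3" or "0,2-3" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are listed
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """Return the set of CPUs the running kernel reports as isolated."""
    try:
        return parse_cpu_list(Path(path).read_text())
    except OSError:
        return set()


if __name__ == "__main__":
    print(f"isolated CPUs: {sorted(isolated_cpus())}")
```

If the list comes back empty after a reboot, the usual suspect is a typo in cmdline.txt.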
Add that in, reboot, and re-run the command from above, pinning our toy to CPU #3, which has now been isolated…
Guess what? It’s solid as a rock. No more flicker.
Fancy that…
The Story Of systemd
I’m not a huge fan of systemd – I find systemctl and journalctl to be verbose, obtuse, and obnoxious – but if there’s one thing it does well, it’s service management. I honestly wish that they’d simply built a runner that works based on these service files, and left it at that.
It would have been far less controversial in all likelihood, but I digress.
I bring it up here because if we want to make our little toy work well, we want to make it a service that systemd can control. Thankfully, this is easy to do. And, since systemd just so happens to include everything and the kitchen sink, we can even manage our CPU pinning without resorting to taskset!
And so, in /usr/lib/systemd/system/boredom.service, we now have:
[Unit]
Description=Boredom in 14 Segments
After=network.target
[Service]
EnvironmentFile=-/etc/default/boredom
ExecStart=/usr/bin/boredom
Restart=on-failure
CPUAffinity=3
[Install]
WantedBy=multi-user.target
The CPUAffinity=3 line is the key. See the systemd documentation for more information on how the CPUAffinity option works. I may not care for systemd (I touch type, and the command names just do not play well with me for whatever reason), but one cannot say that it doesn’t provide the necessary features for most use cases.
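Since the pinning now lives in a unit file rather than on the command line, a quick sanity check at startup can catch the day someone removes that line. Here is a minimal Python sketch of the idea; it is not taken from the toy's own code, and pinned_cpu is a hypothetical helper name.

```python
import os


def pinned_cpu(pid=0):
    """Return the CPU number if the process may run on exactly one CPU, else None."""
    allowed = os.sched_getaffinity(pid)  # pid 0 means "the caller"
    return next(iter(allowed)) if len(allowed) == 1 else None


if __name__ == "__main__":
    cpu = pinned_cpu()
    if cpu is None:
        print("warning: not pinned to a single CPU; expect flicker")
    else:
        print(f"pinned to CPU {cpu}")
```

Passing the service's process ID instead of 0 should let you run the same check from outside the service.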
In the meantime, here we are. A nicely isolated CPU, with our silly little art app pinned to it. It’s behaving perfectly. There’s no flicker. The display is rock solid now.
And systemd has even assimilated it.
The Moral
The moral of the story? OS interactions can be far more complex than you think, and far more is going on under the hood than you expect. In time-sensitive applications, that can truly matter. Granted, this is a silly application that really shouldn’t be as time-sensitive as it is; there are probably far better ways to write it.
But it illustrates the point rather nicely, don’t you think?
And hopefully this little trick will give you a way to avoid that issue in the simpler cases in the future. At some point I might try going the RTLinux route just as a learning experience, but for now this will work just fine.