This article explores how one can implement Duff's-Device-styled coroutine support as an alternative programming style for "doing many things at once" in Arduino programs. Here we will use Arduino IDE's approach for structuring Arduino sketches, i.e. a programs implemented in terms of setup() and loop() functions.
The general description of this technique can be found in the well-known Simon Tatham article. The article also contains a link to a full-scale implementation of macro-based coroutine support, which might be a bit too heavy for Arduino applications since it actively uses dynamic memory allocation. Here a slightly different approach is used, which does not rely on dynamic memory.
Without coroutines, in order to "do several things at once" we'd normally have to spend some effort to redesign/adjust the algorithms (and their implementation) to make them compatible with incremental evaluation: one small piece of progress after another, with potentially long periods of dormancy in between. This allows us to "interleave" pieces of different routines and thus create the effect of executing multiple different routines in parallel. But the resultant code is often quite convoluted and much less readable. The more complex the algorithm, the less recognizable its implementation becomes once transformed and prepared for such incremental execution. Sometimes it feels like the algorithm has to be turned inside-out, making even the most straightforward algorithms hard to read. By contrast, the immediate attraction of coroutine approach when writing cooperatively multitasked code is that it allows one to keep the natural code structure completely intact. One just need to pepper the original code with strategically placed control switch (yield) instructions.
Step 1: forget about multitasking for now
So, let's begin with some basic code. Let's say that we want our program to cycle through three LEDs, lighting up one of them for a set period of time. And let's say we decided to implement it in the most straightforward and literal way imaginable. We just forget about any multitasking considerations and explicitly state what we want to happen. Here's the sketch
void running_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { for (unsigned i_pin = 0; i_pin < n_pins; ++i_pin) { digitalWrite(pins[i_pin], HIGH); delay(duration); digitalWrite(pins[i_pin], LOW); } } const unsigned PINS[] = { 6, 7, 8 }; const unsigned N_PINS = sizeof PINS / sizeof *PINS; void setup() { for (unsigned i_pin : PINS) pinMode(i_pin, OUTPUT); } void loop() { running_led(PINS, N_PINS, 500); }
The above running_led() function does what we want it to do: it cycles through LEDs, lighting one up, holding a pause with delay(), turning the LED off and proceeding to the next one. Obviously, writing functions like that is not a good Arduino programming practice, since this function is a blocking one. It is blocking on several levels. Firstly, it does not return until it cycles through all LEDs. Secondly, it uses delay() to keep each LED lit up for the specified amount of time. However, this function is good enough when we don't have anything else to do besides cycling through our LEDs.
One might even say that we could've made it even more "blocking": since we want our LEDs to loop forever and since we don't have anything else to do, we could've made the running_led() function infinitely blocking by implementing it as an infinite cycle
void running_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { for (;;) for (unsigned i_pin = 0; i_pin < n_pins; ++i_pin) { digitalWrite(pins[i_pin], HIGH); delay(duration); digitalWrite(pins[i_pin], LOW); } }
One might argue that this expresses out intent even better. But let's not do it, at least yet. One thing is writing blocking functions, another - writing infinitely blocking functions. The latter is a very bad idea. Even if you have nothing else to do in your sketch, please do make sure that your loop() function returns control to its caller at least once in a while.
Step 2: we actually do want to multitask
Three LEDs are nice, but not exactly satisfying. We have four more, as can be seen in the animated GIF above. And let's say we want these four yellow LEDs to light up in random order, one LED at a time, each one for a set duration. And we want this to happen simultaneously with the three original red LEDs walking through their cycle.
We just write another function that implements the desired randomized behavior, following pretty much the same general approach as before. And just for the fun of it we simply call it from loop()
void running_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { for (unsigned i_pin = 0; i_pin < n_pins; ++i_pin) { digitalWrite(pins[i_pin], HIGH); delay(duration); digitalWrite(pins[i_pin], LOW); } } void random_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { for (unsigned n_times = n_pins; n_times > 0; --n_times) { unsigned i_pin = rand() % n_pins; digitalWrite(pins[i_pin], HIGH); delay(duration); digitalWrite(pins[i_pin], LOW); } } const unsigned PINS1[] = { 6, 7, 8 }; const unsigned N_PINS1 = sizeof PINS1 / sizeof *PINS1; const unsigned PINS2[] = { 2, 3, 4, 5 }; const unsigned N_PINS2 = sizeof PINS2 / sizeof *PINS2; void setup() { for (unsigned i_pin : PINS1) pinMode(i_pin, OUTPUT); for (unsigned i_pin : PINS2) pinMode(i_pin, OUTPUT); } void loop() { running_led(PINS1, N_PINS1, 500); random_led(PINS2, N_PINS2, 800); }
This is, of course, a hopelessly naive attempt at multitasking. The sketch will run, but the LEDs' behavior will not be simultaneous. At least it won't be as simultaneous as we originally wanted it to be. The two LED groups will operate in alternating fashion: the red LEDs will do one pass of their cycling routine, then the yellow ones will execute one pass of their randomized session, then the red ones again... and so on.
Mesmerizing, but not what we wanted.
Step 3: basic coroutines
And now is the time for our first, most simplistic attempt to use couroutines to achieve our objective. The following three macros are sufficient to make a basic implementation of switch/case couroutines
#define CO_BEGIN static unsigned state = 0; switch (state) { case 0: #define CO_YIELD do { state = __LINE__; return; case __LINE__:; } while (0) #define CO_END } state = 0
(I'll immediately note that the above macros are implemented with C compatibility in mind. Even though the code examples in this article are written in C++ and intended to be compiled as Arduino IDE .ino sketches, I will strive to maintain C compatibility in all infrastructural elements associated with coroutine functionality.)
The idea is that every time we invoke CO_YIELD macro inside some function A, the function is interrupted and the control is transferred to another routine B. Once that other routine B makes its own invocation of CO_YIELD, the control is handed over to yet another routine C and so on, until it makes full round and returns back to A. When control returns back to A, it picks up where it left off: at the point right after the aforementioned invocation of CO_YIELD. This is achieved with some clever switch/case trickery, which we will not dive into here (see [1] for more details).
However, while the above macros take care of the control transfer, they do nothing about preserving the rest of the function's state between the moment control departs from the function and the moment it returns back. This is our responsibility to maintain the state, to save every piece of state that needs to be saved by using static variables or by some other means.
So, let's get back to our code. We want to insert invocations of CO_YIELD into our functions so that they'll split the control flow into fairly short sequential segments. The first and the most obvious idea that immediately comes to mind is to simply invoke CO_YIELD at the end of each iteration of the LED cycles: after processing of each individual LED. Here's what the code looks like with our CO_... macros added to running_led() and random_led() functions
void running_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { CO_BEGIN; static unsigned i_pin; for (i_pin = 0; i_pin < n_pins; ++i_pin) { digitalWrite(pins[i_pin], HIGH); delay(duration); digitalWrite(pins[i_pin], LOW); CO_YIELD; } CO_END; } void random_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { CO_BEGIN; static unsigned n_times; for (n_times = n_pins; n_times > 0; --n_times) { unsigned i_pin; i_pin = rand() % n_pins; digitalWrite(pins[i_pin], HIGH); delay(duration); digitalWrite(pins[i_pin], LOW); CO_YIELD; } CO_END; } const unsigned PINS1[] = { 6, 7, 8 }; const unsigned N_PINS1 = sizeof PINS1 / sizeof *PINS1; const unsigned PINS2[] = { 2, 3, 4, 5 }; const unsigned N_PINS2 = sizeof PINS2 / sizeof *PINS2; void setup() { for (unsigned i_pin : PINS1) pinMode(i_pin, OUTPUT); for (unsigned i_pin : PINS2) pinMode(i_pin, OUTPUT); } void loop() { running_led(PINS1, N_PINS1, 500); random_led(PINS2, N_PINS2, 800); }
Note some important details:
- Static variables are strategically used to preserve cycle iterators/counters across CO_YIELD boundaries. These values have to survive as we depart into CO_YIELD and then re-emerge from it.
- We make no attempts to save the parameter values. Depending on the design, this might or might not be necessary. In our simple artificial example this is not necessary since the function is always called with the same set of arguments.
- Internal static variables have to be re-initialized every time the function starts from the beginning, which means that we have to separate static variable declarations from their initializations. Static variables are initialized only once, which does not work for our purposes. E.g. variable i_pin has to be assigned zero value every time running_led() starts its LED cycle anew.
- Declaring automatic variables with initializers between CO_BEGIN and CO_END in C++ code is generally problematic. These macros hide a switch statement and case labels. C++ does not allow jumps over automatic declarations with non-trivial initialization. Trying to declare a local String object will lead to compile error for this reason. And this is also why random_led() function declares its i_pin variable without an initializer and then assigns its value by a separate statement.
Let's see what happens if we run this code on our hardware
No denying, the behavior has become much more "simultaneous": both cycles definitely work together, at the same time. However, it is obvious we are not quite there yet. By inserting that CO_YIELD into the cycles we removed the outer layer of their blocking behavior: they no longer insist on processing all LEDs before returning control. But the second layer of blocking is still there: we still use delay(), and it shows. With this code only one LED can be lit up at a time. We have to get rid of this blocking behavior as well.
An observant reader will immediately point out another glaring problem with coroutine implementation: the function state is saved on per-function basis, not on per-call basis. In this version of the code we have only one spot that calls running_led() and only one spot that calls random_led(), so everything works as expected. However, if we try replacing the call to random_led() with another call to running_led(), things will immediately break down. I'll admit that I've deliberately decided to implement two different behaviors: running_led() and random_led() to sidestep this problem for now. This problem can be solved, but it will involve a more elaborate implementation of coroutines. We'll get to that later.
Step 4: dealing with delay()
In order to replace the regular delay() function with something non-blocking we might just implement the function ourselves by following the same general approach: write a straightforward implementation with a plain busy-cycle that constantly checks millis(), waiting for the requested amount of time to pass
for (unsigned long start = millis(); millis() - start < (duration); );
And in order to make it coroutine-friendly, we just need to periodically invoke CO_YIELD
static unsigned long start; for (start = millis(); millis() - start < (duration); ) CO_YIELD;
Except that now we have to figure out how to wrap it into a function. And the answer is: there's no elegant way to do it within the current simplistic switch/case approach to coroutine implementation. It cannot properly CO_YIELD from multiple levels of nested function calls. We can probably come up with some workaround... but for now let's just sidestep this issue. Just like the problem mentioned at the end of the previous section, this one will be taken care of in a more serious implementation of coroutine support, which I'll present later. As for now, let's simply take an easy way out and implement our delay function as a macro
#define co_delay(duration) do {\ static unsigned long start;\ for (start = millis(); millis() - start < (duration); ) CO_YIELD;\ } while (0)
This will work as we want it to. Moreover, the CO_YIELD invocation introduced by co_delay is perfectly sufficient to ensure the proper multitasking in our code, i.e. there's no need for the CO_YIELD invocation at the end of each iteration anymore. The latter can be left in the code or it can be removed - the code will work well either way. Here's the sketch in its updated form
#define co_delay(duration) do {\ static unsigned long start;\ for (start = millis(); millis() - start < (duration); ) CO_YIELD;\ } while (0) void running_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { CO_BEGIN; static unsigned i_pin; for (i_pin = 0; i_pin < n_pins; ++i_pin) { digitalWrite(pins[i_pin], HIGH); co_delay(duration); digitalWrite(pins[i_pin], LOW); } CO_END; } void random_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { CO_BEGIN; static unsigned n_times; for (n_times = n_pins; n_times > 0; --n_times) { static unsigned i_pin; i_pin = rand() % n_pins; digitalWrite(pins[i_pin], HIGH); co_delay(duration); digitalWrite(pins[i_pin], LOW); } CO_END; } const unsigned PINS1[] = { 6, 7, 8 }; const unsigned N_PINS1 = sizeof PINS1 / sizeof *PINS1; const unsigned PINS2[] = { 2, 3, 4, 5 }; const unsigned N_PINS2 = sizeof PINS2 / sizeof *PINS2; void setup() { for (unsigned i_pin : PINS1) pinMode(i_pin, OUTPUT); for (unsigned i_pin : PINS2) pinMode(i_pin, OUTPUT); } void loop() { running_led(PINS1, N_PINS1, 500); random_led(PINS2, N_PINS2, 800); }
And here it is in action
This is perfection! We finally got rid of all blocking behavior, thus achieving perfect parallelism.
To play with it a bit more, let's increase the workload. Let's connect a multiplexed 4-digit 7-segment LED display to the same Arduino and make our coroutines handle the display in addition to the already existing LEDs. I will use 4-digit display provided on a so called "multi-function shield" for Arduino UNO. It uses just 3 pins as a serial interface that feed shift registers installed on the shield.
And in order to take full advantage of the capabilities provided by coroutines, I will implement it in the most straightforward way possible. Instead of carefully planning interactions between the code that generates the display data, the code that actually displays the data (by constantly sweeping this multiplexed display), and the functions that run our existing LEDs, I will completely ignore these interactions and implement each function in a very independent and decoupled fashion. The rest is suppose to be taken care of automatically, by coroutine infrastructure. We already have our LED functions. Now we have to add two functions for the display.
One function handles the back-end: it maintains a 4-element array of digits. All it does is juggling the values in array in accordance with my intent. It does not know about the indicator itself and does not care about it. It does not communicate with the indicator and it does not concern itself with such hardware-specific matters as multiplexing the display image. This function is entirely focused on the array. In this case the function generates random 2-digit pairs at the beginning of the array and then shifts them towards the end of the array. The shifting speed is controlled by a given delay value.
void generate_digits(unsigned (&digits)[4], unsigned long duration) { CO_BEGIN; for (unsigned &d : digits) d = 10; // blank digit static unsigned i; for (i = 0; ; i = (i + 1) % 3) { memmove(digits + 1, digits, sizeof digits - sizeof *digits); digits[0] = i != 0 ? rand() % 10 : 10; co_delay(duration); } CO_END; }
Note that here I actually decided to implement the function as an "infinite" cycle. This is perfectly natural and appropriate: the function is supposed to run forever after all. Of course, we already know that the cycle is not really infinite at all, since CO_YIELD (used in co_delay) will always be able to exit the cycle (and the function).
And here's the front-end function. This function is entirely focused on the communication with the 7-segment display. It is given an array of digits and it is ordered to keep multiplexing the display as quickly as possible by sending the corresponding patterns to the corresponding positions on the display. In other words, this function handles the hardware: it displays the image, it does not know and does not care how that image is generated
void display_digits(const unsigned (&digits)[4]) { CO_BEGIN; static unsigned i; for (i = 0; ; i = (i + 1) % 4) { write_digit_to_segment(i, digits[i]); CO_YIELD; } CO_END; }
Again, this function is based on an "infinite" cycle as well. And, just for the fun of it, I added such "infinite" cycles to the original LED functions. The final sketch in its entirety looks as follows
#define CO_BEGIN static unsigned state = 0; switch (state) { case 0: #define CO_YIELD do { state = __LINE__; return; case __LINE__:; } while (0) #define CO_END } state = 0 #define co_delay(duration) do {\ static unsigned long start;\ for (start = millis(); millis() - start < (duration); ) CO_YIELD;\ } while (0) ////////////////////////////////////////////////////////////////////////////////////////////////// const unsigned LATCH_DISP = A0; const unsigned CLK_DISP = A1; const unsigned DATA_DISP = A2; const uint8_t SEGMENT_MAP[] = { 0xC0, 0xF9, 0xA4, 0xB0, 0x99, 0x92, 0x82, 0xF8, 0x80, 0x90, 0xFF }; const uint8_t SEGMENT_SELECT[] = { 0xF1, 0xF2, 0xF4, 0xF8 }; void write_digit_to_segment(unsigned i, unsigned d) { digitalWrite(LATCH_DISP, LOW); shiftOut(DATA_DISP, CLK_DISP, MSBFIRST, SEGMENT_MAP[d]); shiftOut(DATA_DISP, CLK_DISP, MSBFIRST, SEGMENT_SELECT[i] ); digitalWrite(LATCH_DISP, HIGH); } ////////////////////////////////////////////////////////////////////////////////////////////////// void running_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { CO_BEGIN; for (;;) { static unsigned i_pin; for (i_pin = 0; i_pin < n_pins; ++i_pin) { digitalWrite(pins[i_pin], HIGH); co_delay(duration); digitalWrite(pins[i_pin], LOW); } } CO_END; } void random_led(const unsigned pins[], unsigned n_pins, unsigned long duration) { CO_BEGIN; for (;;) { static unsigned i_pin; i_pin = rand() % n_pins; digitalWrite(pins[i_pin], HIGH); co_delay(duration); digitalWrite(pins[i_pin], LOW); } CO_END; } void generate_digits(unsigned (&digits)[4], unsigned long duration) { CO_BEGIN; for (unsigned &d : digits) d = 10; // blank digit static unsigned i; for (i = 0; ; i = (i + 1) % 3) { memmove(digits + 1, digits, sizeof digits - sizeof *digits); digits[0] = i != 0 ? rand() % 10 : 10; co_delay(duration); } CO_END; } void display_digits(const unsigned (&digits)[4]) { CO_BEGIN; static unsigned i; for (i = 0; ; i = (i + 1) % 4) { write_digit_to_segment(i, digits[i]); CO_YIELD; } CO_END; } const unsigned PINS1[] = { 6, 7, 8 }; const unsigned N_PINS1 = sizeof PINS1 / sizeof *PINS1; const unsigned PINS2[] = { 2, 3, 4, 5 }; const unsigned N_PINS2 = sizeof PINS2 / sizeof *PINS2; void setup() { for (unsigned i_pin : PINS1) pinMode(i_pin, OUTPUT); for (unsigned i_pin : PINS2) pinMode(i_pin, OUTPUT); pinMode(LATCH_DISP, OUTPUT); pinMode(CLK_DISP, OUTPUT); pinMode(DATA_DISP, OUTPUT); } void loop() { running_led(PINS1, N_PINS1, 500); random_led(PINS2, N_PINS2, 800); static unsigned digits[4]; generate_digits(digits, 300); display_digits(digits); }
Works perfectly
This concludes the first part of the article. The second part is dedicated to implementing a more comprehensive coroutine support, which is supposed to be free of limitations we encountered here.