0x00 Abstract
When developing Arduino programs, we usually write code to meet functional requirements. Once the program passes testing, development stops, and the code is touched again only when the requirements change or a bug is found. In real projects, besides making the code functional and bug-free, we also want to keep improving its execution efficiency. Specifically, the requirements are as follows:
(1) The program and data should occupy as little device storage space as possible, whether in ROM or RAM;
(2) The program should run as fast as possible, provided that it remains accurate and stable;
(3) The overall power consumption of the system should be as low as possible.
In short, we want the program to run in as little storage and with as little power as possible, which makes our products simpler, cheaper, and more reliable. In this tutorial, we reimplement the familiar blink functionality with low-level ATmega2560 register code so that the program occupies less space and runs faster. This works because the commonly used Arduino functions are just convenience wrappers around these low-level operations.
After optimization, the program's storage usage can be reduced by about 50%. This tutorial will also help us better understand how Arduino programs are implemented under the hood.
0x01 View Blink Program
The first program developed on Arduino is usually the blink program that controls the LED on pin D13 to flash. The source code is as follows:
// the setup function runs once when you press
// reset or power the board
void setup() {
    // initialize digital pin LED_BUILTIN as an output.
    pinMode(LED_BUILTIN, OUTPUT);
}

// the loop function runs over and over again forever
void loop() {
    // turn the LED on (HIGH is the voltage level)
    digitalWrite(LED_BUILTIN, HIGH);
    delay(1000);
    // turn the LED off by making the voltage LOW
    digitalWrite(LED_BUILTIN, LOW);
    delay(1000);
}
This source code shows the entire blinking process. First, in the setup() function, we configure pin 13 as an output. Then, in the loop() function, we repeatedly change the output voltage: setting it high lights the LED, and setting it low turns it off. The delay() calls between the changes make each state last long enough to be seen. Without them, the program would run at full speed, the LED would switch far too quickly for the eye to follow, and it would simply appear to be on.
Next, let’s take a closer look at the compilation log of this source code:
The log shows that the compiled program occupies 1462 bytes, while the maximum Flash space available on the Arduino Mega2560 is 253952 bytes (253952/1024 = 248 KB). Below are the specifications of the Arduino Mega2560 development board for reference:
Flash memory is a long-life non-volatile memory: it retains stored data even without power. It is a variant of electrically erasable programmable read-only memory (EEPROM). The difference between the two lies in how data is erased. EEPROM can be erased and rewritten at byte level, while most Flash memory chips are erased in fixed blocks rather than byte by byte (note: NOR Flash can still be accessed byte by byte). Block sizes typically range from 256KB to 20MB.
0x02 Optimize pinMode() Function
According to the compilation log, the compiled binary of the original blink program is 1462 bytes. If merely blinking an LED consumes this much storage, programs with more complex functionality may well exceed the Flash limit. Therefore, we want to minimize this size so that larger programs with more complex functionality still fit.
Let’s first check how much space the pinMode() function occupies. By commenting it out and recompiling, we find that the binary file size is only 1382 bytes, which is 80 bytes smaller than the original 1462 bytes.
Let’s take a look at how the pinMode() function is implemented. The implementation source code is in the Arduino IDE software directory:
~/Software/arduino-1.8.4/hardware/arduino/avr/cores/arduino/wiring_digital.c:
void pinMode(uint8_t pin, uint8_t mode)
{
    uint8_t bit = digitalPinToBitMask(pin);
    uint8_t port = digitalPinToPort(pin);
    volatile uint8_t *reg, *out;

    if (port == NOT_A_PIN) return;

    // JWS: can I let the optimizer do this?
    reg = portModeRegister(port);
    out = portOutputRegister(port);

    if (mode == INPUT) {
        uint8_t oldSREG = SREG;
        cli();
        *reg &= ~bit;
        *out &= ~bit;
        SREG = oldSREG;
    } else if (mode == INPUT_PULLUP) {
        uint8_t oldSREG = SREG;
        cli();
        *reg &= ~bit;
        *out |= bit;
        SREG = oldSREG;
    } else {
        uint8_t oldSREG = SREG;
        cli();
        *reg |= bit;
        SREG = oldSREG;
    }
}
From this function, we can see that for OUTPUT mode pinMode() simply sets one bit of the port's data direction register to 1. Next, we need to find out which port, and which bit of that port, D13 is connected to. Following the code eventually leads to ~/Software/arduino-1.8.4/hardware/arduino/avr/variants/mega/pins_arduino.h, which lists the port for every pin, including PWM13:
const uint8_t PROGMEM digital_pin_to_port_PGM[] = {
// PORTLIST
// -------------------------------------------
PE , // PE 0 ** 0 ** USART0_RX
PE , // PE 1 ** 1 ** USART0_TX
PE , // PE 4 ** 2 ** PWM2
PE , // PE 5 ** 3 ** PWM3
PG , // PG 5 ** 4 ** PWM4
PE , // PE 3 ** 5 ** PWM5
PH , // PH 3 ** 6 ** PWM6
PH , // PH 4 ** 7 ** PWM7
PH , // PH 5 ** 8 ** PWM8
PH , // PH 6 ** 9 ** PWM9
PB , // PB 4 ** 10 ** PWM10
PB , // PB 5 ** 11 ** PWM11
PB , // PB 6 ** 12 ** PWM12
PB , // PB 7 ** 13 ** PWM13
PJ , // PJ 1 ** 14 ** USART3_TX
PJ , // PJ 0 ** 15 ** USART3_RX
PH , // PH 1 ** 16 ** USART2_TX
PH , // PH 0 ** 17 ** USART2_RX
PD , // PD 3 ** 18 ** USART1_TX
PD , // PD 2 ** 19 ** USART1_RX
PD , // PD 1 ** 20 ** I2C_SDA
PD , // PD 0 ** 21 ** I2C_SCL
PA , // PA 0 ** 22 ** D22
PA , // PA 1 ** 23 ** D23
PA , // PA 2 ** 24 ** D24
PA , // PA 3 ** 25 ** D25
PA , // PA 4 ** 26 ** D26
PA , // PA 5 ** 27 ** D27
PA , // PA 6 ** 28 ** D28
PA , // PA 7 ** 29 ** D29
PC , // PC 7 ** 30 ** D30
PC , // PC 6 ** 31 ** D31
PC , // PC 5 ** 32 ** D32
PC , // PC 4 ** 33 ** D33
PC , // PC 3 ** 34 ** D34
PC , // PC 2 ** 35 ** D35
PC , // PC 1 ** 36 ** D36
PC , // PC 0 ** 37 ** D37
PD , // PD 7 ** 38 ** D38
PG , // PG 2 ** 39 ** D39
PG , // PG 1 ** 40 ** D40
PG , // PG 0 ** 41 ** D41
PL , // PL 7 ** 42 ** D42
PL , // PL 6 ** 43 ** D43
PL , // PL 5 ** 44 ** D44
PL , // PL 4 ** 45 ** D45
PL , // PL 3 ** 46 ** D46
PL , // PL 2 ** 47 ** D47
PL , // PL 1 ** 48 ** D48
PL , // PL 0 ** 49 ** D49
PB , // PB 3 ** 50 ** SPI_MISO
PB , // PB 2 ** 51 ** SPI_MOSI
PB , // PB 1 ** 52 ** SPI_SCK
PB , // PB 0 ** 53 ** SPI_SS
PF , // PF 0 ** 54 ** A0
PF , // PF 1 ** 55 ** A1
PF , // PF 2 ** 56 ** A2
PF , // PF 3 ** 57 ** A3
PF , // PF 4 ** 58 ** A4
PF , // PF 5 ** 59 ** A5
PF , // PF 6 ** 60 ** A6
PF , // PF 7 ** 61 ** A7
PK , // PK 0 ** 62 ** A8
PK , // PK 1 ** 63 ** A9
PK , // PK 2 ** 64 ** A10
PK , // PK 3 ** 65 ** A11
PK , // PK 4 ** 66 ** A12
PK , // PK 5 ** 67 ** A13
PK , // PK 6 ** 68 ** A14
PK , // PK 7 ** 69 ** A15
};
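The lookup performed through this table can be sketched on a desktop compiler as follows. This is only an illustration, not the core's code: PinInfo, kPwmPins, lookupPin, and pinMask are hypothetical names, and only the four Port B entries for D10 to D13 from the table above are reproduced.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down stand-in for the core's pin tables:
// only D10..D13 (all on Port B) are listed here.
struct PinInfo {
    char    port;  // port letter, e.g. 'B'; 0 means "not in this table"
    uint8_t bit;   // bit position within that port
};

static const PinInfo kPwmPins[] = {
    {'B', 4},  // D10 -> PB4
    {'B', 5},  // D11 -> PB5
    {'B', 6},  // D12 -> PB6
    {'B', 7},  // D13 -> PB7
};

// Returns the port/bit pair for D10..D13; port is 0 for any other pin.
PinInfo lookupPin(uint8_t pin) {
    if (pin >= 10 && pin <= 13) {
        return kPwmPins[pin - 10];
    }
    return PinInfo{0, 0};
}

// Bit mask for a listed pin, similar in spirit to digitalPinToBitMask().
// Only meaningful for D10..D13 in this trimmed table.
uint8_t pinMask(uint8_t pin) {
    return static_cast<uint8_t>(1u << lookupPin(pin).bit);
}
```

For D13, lookupPin(13) yields Port B, bit 7, and pinMask(13) yields 0x80, which is the mask that the register-level code works with.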
Following the code may be a bit troublesome. The simplest way is to refer to the Arduino Mega2560 pinMap, which shows that D13 (also called PWM13) is connected to bit 7 of Port B. Below is a complete pinMap diagram:
Setting the direction of an I/O pin on Atmel AVR is quite simple. Each pin belongs to a port, and each bit in an I/O port can be either input or output. The direction of each individual pin is determined by the bit in the associated data direction register (DDRx).
Thus, we can configure D13 as an output by setting its bit in this register directly, using the macro bitSet(value, bit) defined in ~/Software/arduino-1.8.4/hardware/arduino/avr/cores/arduino/Arduino.h:
By replacing pinMode with bitSet, we find that the binary program size is reduced by 78 bytes, to 1384 bytes; bitSet itself occupies only 2 bytes. Most of the saving comes from no longer linking the body of pinMode(), which is included only once no matter how many times it is called, so the full saving is realized only when every pinMode() call in the program is replaced. When this program is uploaded to the Arduino Mega2560 board, the effect is the same as with pinMode, but the Flash space used is reduced by 78 bytes:
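As a rough sketch of the replacement described above, the setup() function might look like the following. To keep the example compilable off the board, DDRB is declared here as an ordinary variable; that declaration is an assumption made for illustration and must be dropped on the Mega2560, where DDRB is provided by the AVR headers. The macro has the same shape as the one in Arduino.h.

```cpp
#include <cstdint>

// Host-side stand-in for the AVR register (assumption for illustration).
// On the board, DDRB comes from <avr/io.h> and this line must be removed.
volatile uint8_t DDRB = 0;

// Same shape as the bitSet macro in Arduino.h.
#define bitSet(value, bit) ((value) |= (1UL << (bit)))

// Replacement for pinMode(LED_BUILTIN, OUTPUT) on the Mega2560:
// D13 is bit 7 of Port B, so set bit 7 of Port B's direction register.
void setup() {
    bitSet(DDRB, 7);
}
```

After setup() runs, only bit 7 of DDRB is set, so the other Port B pins keep their previous direction.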
0x03 Optimize Output Pin Code
In the blink program’s loop(), we light up the LED by setting D13 to high, wait for 1 second for visibility, then set D13 to low to turn off the LED, and wait for another second to observe the off state. This cycle creates the blinking effect. We can see that this part of the code is clear and easy to read, but the implementation is somewhat clumsy.
The AVR chip was designed with the need for toggling pins in mind. Writing 1 to a bit of the port input pins register (PINx) toggles the corresponding bit of the output register (PORTx). This means that if PORTB bit 7 is currently high, writing 1 to PINB bit 7 will set PORTB bit 7 low, and writing 1 again will set it back to high. By writing 1 repeatedly, we can keep toggling the state of the corresponding PORTB bit.
We can replace the digitalWrite(LED_BUILTIN, HIGH/LOW) calls with bitSet(PINB, 7) to manipulate the register directly, which reduces the binary program size to 808 bytes, a reduction of 576 bytes. This is quite significant and indicates that the implementation of digitalWrite() is very storage-intensive. As with pinMode(), most of that cost is the function body, which is linked once regardless of how many calls the program contains, so the saving is realized once all digitalWrite() calls are replaced:
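The toggle rule can be modeled on a desktop compiler as shown below. SimPortB, writePin, and toggleLed are hypothetical names used only for this illustration; on the board the whole model collapses into a single register write, noted in the comment.

```cpp
#include <cstdint>

// Host-side model of Port B (assumption for illustration). On the board,
// PORTB and PINB are real registers provided by <avr/io.h>.
struct SimPortB {
    uint8_t port = 0;  // models the PORTB output latch

    // Models the hardware rule: each 1 bit written to PINB toggles the
    // matching PORTB bit; 0 bits leave PORTB unchanged.
    void writePin(uint8_t value) { port ^= value; }
};

// One pass of the optimized loop(): toggle D13 (PB7).
// On the board this could be written as: PINB = _BV(7);
void toggleLed(SimPortB &b) {
    b.writePin(static_cast<uint8_t>(1u << 7));
}
```

The comment uses a plain assignment of the mask rather than bitSet. Because bitSet expands to a read-modify-write, its effect on the other Port B pins depends on how the compiler translates it, whereas writing the mask alone toggles only bit 7.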
0x04 Optimize Delay Function
(1) If an exact 1-second delay is not required, a simple way to add a delay is an empty for loop. The code is as follows:
We can see that the program's storage space is now 660 bytes, which saves 148 bytes compared to using the delay() function. However, when this code is uploaded to the Arduino Mega2560 board, the LED does not visibly blink, which indicates that the delay loop has no effect. Even when the loop count is increased from 30000 to 300000, no delay is observed. So, what is the reason?
The compiler recognizes that the for loop is empty, treats it as unnecessary code, and removes it from the final program during compilation. The compiler's role is not only to translate your source code but also to optimize it, removing whatever it considers redundant.
If you really want to use such an empty loop as a delay, you need to tell the compiler explicitly not to optimize it away. This is done with the keyword volatile: declaring the loop counter volatile tells the compiler not to make any assumptions about the variable, so every access to it must actually be performed.
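A minimal sketch of such a loop is shown below. The function name busyWait is hypothetical, and the iteration count is an uncalibrated number: how long the loop takes depends on the clock speed and on the code the compiler generates.

```cpp
#include <cstdint>

// Busy-wait that the optimizer has to keep. Every read and write of a
// volatile object counts as an observable side effect, so the loop
// cannot be removed even though its body is empty.
uint32_t busyWait(uint32_t iterations) {
    volatile uint32_t i;
    for (i = 0; i < iterations; i = i + 1) {
        // intentionally empty
    }
    return i;  // number of iterations actually performed
}
```

In the blink sketch, a call such as busyWait(30000) would take the place of delay(1000), at the cost of losing any guarantee about the actual duration.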
(2) To delay for an accurate 1 second, we can use Timer0, which starts running after power-up. Its overflow interrupt handler, TIMER0_OVF_vect, maintains an unsigned long variable named timer0_millis. With the default configuration on a 16 MHz board, the timer overflows about once per millisecond (every 1.024 ms, with the handler compensating for the fractional part), so timer0_millis tracks how many milliseconds have passed. If you do not reset this unsigned long variable, it will eventually overflow. Let’s check the value range of basic data types in Arduino:
The maximum value for timer0_millis is 4294967295. If it increments by 1 every 1ms, it will take:
Total seconds = 4294967295/1000 = 4294967.295
Total hours = 4294967.295/3600 = 1193.046470833
Total days = 1193.046470833/24 = 49.710269618
This means that if the Arduino Mega2560 is powered on now and timer0_millis starts incrementing by 1 every 1ms, it will take about 50 days for this counter to overflow from 4294967295 back to 0, then continue counting.
Therefore, if your program uses timer0_millis, directly or through millis(), be aware that the value wraps around about every 50 days, and code that does not allow for this may misbehave at that moment. millis() returns the total time the program has been running since power-up, which is essentially the value of timer0_millis, so it is subject to the same rollover.
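One common way to write timing code that survives the rollover is to compare elapsed time instead of absolute timestamps. The sketch below is not taken from the Arduino core; intervalElapsed and intervalElapsedNaive are hypothetical helper names, and the values of now and start are assumed to come from millis().

```cpp
#include <cstdint>

// True once at least `interval` ms have passed since `start`.
// The subtraction is done in unsigned 32-bit arithmetic, so the result
// stays correct even after `now` has wrapped from 4294967295 back to 0.
bool intervalElapsed(uint32_t now, uint32_t start, uint32_t interval) {
    return static_cast<uint32_t>(now - start) >= interval;
}

// Fragile alternative, shown for comparison: the absolute deadline
// start + interval can itself wrap around and give a wrong answer.
bool intervalElapsedNaive(uint32_t now, uint32_t start, uint32_t interval) {
    return now >= static_cast<uint32_t>(start + interval);
}
```

With start = 4294967000 and interval = 1000, the naive version already reports completion at now = 4294967100, only 100 ms in, because the deadline has wrapped to 704. The subtraction-based version reports completion only once now has wrapped around and reached 704.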
Next, let’s modify the delay in the blink program to use timer0_millis for timing. Since timer0_millis is already defined in ~/Software/arduino-1.8.4/hardware/arduino/avr/cores/arduino/wiring.c, we need to declare it as an external variable in our local code, as shown below:
Note that modifying the value of timer0_millis will cause millis(), and anything that depends on it, to report incorrect times, but this does not affect our current experiment. To summarize, the storage space has been reduced from the original 1462 bytes to only 702 bytes after optimization:
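A sketch of this approach, under a few stated assumptions, might look like the following. To let it compile off the board, timer0_millis is defined here and the port is modeled by the hypothetical variable simulatedPortB. On the Mega2560 the definition would become the extern declaration given in the comment, and the toggle would be a write to PINB.

```cpp
#include <cstdint>

// Host-side stand-ins (assumptions for illustration). On the board,
// timer0_millis is defined in wiring.c, so the sketch only declares it:
//   extern volatile unsigned long timer0_millis;
volatile unsigned long timer0_millis = 0;
uint8_t simulatedPortB = 0;  // hypothetical model of the PORTB latch

void setup() {
    // On the board: bitSet(DDRB, 7); to make D13 an output.
}

// One pass of loop(): once 1000 ms have accumulated, restart the count
// and toggle D13 (PB7). On the board the toggle would be PINB = _BV(7).
void loop() {
    if (timer0_millis >= 1000) {
        timer0_millis = 0;
        simulatedPortB ^= static_cast<uint8_t>(1u << 7);
    }
}
```

On an 8-bit AVR, reading a 4-byte variable takes several instructions and can be interrupted by the timer handler in the middle. This sketch ignores that for brevity.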
Space reduction relative to the original = (1462-702)/1462 = 52%
0x05 Reference
[1]. Arduino Mega2560[OL].
https://www.arduino.cn/thread-17938-1-1.html
[2]. Dale Wheat, translated by Weng Kai. Arduino Technology Insider[M]. Beijing: People’s Posts and Telecommunications Publishing House. 2015. 83-112.
[3]. Flash Memory Concept[OL].
https://baike.baidu.com/item/%E9%97%AA%E5%AD%98/108500?fromtitle=flash%20memory&fromid=3740729&fr=aladdin
[4]. Arduino Mega2560 Pin Mapping[OL].
https://www.arduino.cc/en/Hacking/PinMapping2560
[5]. Basic Data Types in Arduino[OL].
https://www.cnblogs.com/lulipro/p/7672954.html