Jan 30

Less talk, more code: minimalist bare metal programming from scratch episode 0

I have rebooted my software development activities with the STM32F4-Discovery around a (for me) new concept: minimalist bare metal programming from scratch. The idea is to go through the development of an unlimited number of self-contained applications of increasing complexity, starting from scratch.
More on the concept can be found at bare.
See also my Git repo’s feed on this page.
I could post walk-throughs if there is some interest. Drop me a line in that case.

Jan 17

Studying while_one linker script on the STM32F4-Discovery

In my previous article, I presented a quite detailed analysis of the binary produced by compiling the “minimal” code sample from GNU Tools for ARM Embedded Processors.
I concluded that in order to have a complete interpretation, one needed to analyze the source code, more precisely the linker script and the start-up code (in assembly). Note that the “payload”-code for the while_one program is, as its name implies, trivial. The linker script and the start-up code are, on the other hand, not trivial, and I will be analyzing the linker script in the rest of this article.
In order to do that, we need to consult [3] ld documentation (part of GNU Binutils documentation).
I might as well start by commenting on the linker script’s name: nokeep.ld. I could not find a clear comment about that in the code, but comparing the file with gcc.ld, which is used for most samples, shows that the LD command KEEP is used many more times in gcc.ld than in nokeep.ld. We will come back to that command later on.

The linker script starts by including another linker script. I made this change myself, since nokeep.ld and gcc.ld define the same memory regions, which I needed to adapt to the STM32F4-Discovery board. The contents of mem.ld are:

This corresponds to the board’s physical memory map, as specified in [2] STM32F407VG data-sheet.
We will see later on how these memory areas are referred to in the rest of the script.
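The contents of mem.ld are not reproduced above, so here is a minimal C sketch of the two regions as I read them from [2]: 1 MB of flash at 0x0800 0000 and 128 KB of contiguous SRAM (SRAM1 plus SRAM2) at 0x2000 0000. The macro names, the regions table and the region_of() helper are hypothetical and are not taken from mem.ld. They only make the address ranges explicit, so that addresses that come up later in this series can be checked against them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed STM32F407VG memory map (from the data-sheet), mirroring the
 * FLASH and RAM regions that mem.ld declares. Names are illustrative. */
#define FLASH_ORIGIN 0x08000000u
#define FLASH_LENGTH (1024u * 1024u)  /* 1 MB internal flash */
#define RAM_ORIGIN   0x20000000u
#define RAM_LENGTH   (128u * 1024u)   /* SRAM1 (112 KB) + SRAM2 (16 KB) */

struct region {
    const char *name;
    uint32_t origin;
    uint32_t length;
};

static const struct region regions[] = {
    { "FLASH", FLASH_ORIGIN, FLASH_LENGTH },
    { "RAM",   RAM_ORIGIN,   RAM_LENGTH   },
};

/* Returns the region containing addr, or NULL if no region does.
 * Unsigned subtraction wraps for addr < origin, so one compare suffices. */
const struct region *region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        if (addr - regions[i].origin < regions[i].length)
            return &regions[i];
    }
    return NULL;
}
```

For instance, 0x2001 FFFF is the last byte of RAM, while 0x2002 0000 lies just outside it.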
Next, the linker script includes some introductory comments worth reading:

The “other linker script that defines memory regions FLASH and RAM” is the one included above.
We can see that the rest of the code is supposed to define the symbol Reset_Handler. We will see in the next article that the start-up code (in assembly) does that.
The next row in the linker script is:

According to [3], “The first instruction to execute in a program is called the entry point. You can use the ENTRY linker script command to set the entry point. The argument is a symbol name”. As described in my previous article, Reset_Handler is effectively the start of the first instructions that get executed by the processor.
The rest of the linker script is a single high level block:

According to [3], “The SECTIONS command tells the linker how to map input sections into output sections, and how to place the output sections in memory”. The first output section is:

.text is the name of the output section. As a software engineer, I expect the text section to hold some executable code. We see that it is placed in the FLASH memory region, which seems logical.
Within the curly brackets come what [3] calls output section commands.
The first of these is:

If we start by ignoring KEEP we see, according to [3], a fairly typical input section specification that tells the linker to output the .isr_vector sections from all object files (to the .text output section in flash). As we will see in my next article, there is only one such section, defined in the start-up code (in assembly). As discussed in my previous article, it is the vector table.
As for KEEP, according to [3]: “When link-time garbage collection is in use (`--gc-sections'), it is often useful to mark sections that should not be eliminated. This is accomplished by surrounding an input section’s wildcard entry with KEEP(), as in KEEP(*(.init)) or KEEP(SORT_BY_NAME(*)(.ctors))”.
A quick look at makefile.conf and our Makefile will confirm that we do indeed make use of --gc-sections (to reduce code size). The presence of the vector table is required for Cortex-M4 (see [1] Cortex-M4 Devices Generic User Guide), but the linker does not know that. KEEP is how we force the linker to output that input section anyway.
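In the sample, the vector table is written in assembly. The following is a hypothetical C counterpart, assuming GCC's section and used attributes. Nothing in the program references the table, and that is exactly why the linker script has to wrap *(.isr_vector) in KEEP. Default_Handler and the four-entry length are illustrative only, since a real Cortex-M4 table has many more entries.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*vector_t)(void);

/* Hypothetical handlers; in the sample they come from the start-up code. */
void Reset_Handler(void) { /* would copy .data, clear .bss, call main() */ }
void Default_Handler(void) { for (;;) { } }

/* Assumed initial main stack pointer: just above the 128 KB of SRAM. */
#define INITIAL_SP 0x20020000u

/* No code references this array. 'used' only stops the compiler from
 * dropping it; with --gc-sections the linker could still discard the
 * section, which is what KEEP(*(.isr_vector)) prevents. */
#if defined(__ELF__)
__attribute__((section(".isr_vector"), used))
#endif
const vector_t vector_table[] = {
    (vector_t)INITIAL_SP,  /* 0x00: initial SP value */
    Reset_Handler,         /* 0x04: reset */
    Default_Handler,       /* 0x08: NMI */
    Default_Handler,       /* 0x0C: hard fault */
};
```

The section attribute is guarded for ELF targets only, so the sketch also compiles on a host.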
To be continued (maybe)…

Nov 21

Studying disassembled while_one on the STM32F4-Discovery

In my previous article, I described how I compiled and ran/debugged a C forever empty loop on the STM32F4-Discovery with the bare necessities (GNU Emacs, GNU make, OpenOCD, GDB). Since the job is already done in the GNU Tools for ARM Embedded Processors samples, I simply reuse that. It is simple, but it actually contains quite a lot to learn from.
The purpose of this activity is to study in detail the disassembled code for that program, and the corresponding source code. Obviously, the interesting part of the source code is not the C-code, limited to a main function that contains “for (;;);”, but the start-up code (in assembly in the sample) and the linker script.
The reference document used to interpret the disassembled code is [1] Cortex-M4 Devices Generic User Guide, that I discovered recently and that looks like the perfect reference for software developers, at least as long as one limits oneself to generic Cortex-M4 code. Also, [2] STM32F407VG data-sheet is used for the specific memory map.
As mentioned in a previous article, the disassembled code is the following:

[1] specifies that the vector table is located at address 0x0000 0000. However, [2] specifies that addresses 0x0000 0000-0x000F FFFF are aliased to flash (in our boot pin case) and that flash addresses are 0x0800 0000-0x080F FFFF.
Therefore, the beginning of the assembly code above makes sense.
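The aliasing can be sketched as a plain address translation, assuming that flash is selected by the boot pins as on our board. The alias_to_flash() helper and its macro names are hypothetical. The sketch only restates the two ranges from [2]: an address in the boot alias window reads the same content as the corresponding flash address.

```c
#include <assert.h>
#include <stdint.h>

#define ALIAS_BASE 0x00000000u  /* 0x0000 0000-0x000F FFFF */
#define ALIAS_SIZE 0x00100000u
#define FLASH_BASE 0x08000000u  /* 0x0800 0000-0x080F FFFF */

/* Translates a boot-alias address to the flash address it mirrors
 * (valid only when the boot pins select flash). */
uint32_t alias_to_flash(uint32_t addr)
{
    assert(addr - ALIAS_BASE < ALIAS_SIZE);  /* must lie inside the alias */
    return FLASH_BASE + (addr - ALIAS_BASE);
}
```

So the vector table entry that the processor reads at 0x0000 0004 is the word stored at 0x0800 0004.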

According to [1], the first value in the vector table is the initial stack pointer (SP) value. In our case, this is 0x2002 0000, which according to the memory map in [2] is the address just above the highest position in regular SRAM. This is consistent with [1], that specifies: “The processor uses a full descending stack. This means the stack pointer holds the address of the last stacked item in memory. When the processor pushes a new item onto the stack, it decrements the stack pointer and then writes the item to the new memory location”. At reset, the stack is empty.
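Here is a minimal host-side simulation of that full descending behaviour. It assumes an 8-word window just below 0x2002 0000, and the array, push() and pop() are illustrative, not processor or CMSIS code. It shows that the first push writes to 0x2001 FFFC, the highest word of SRAM, so the initial SP value itself never addresses a stacked item.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP 0x20020000u  /* initial SP from vector table offset 0 */
#define N_WORDS   8u           /* size of the simulated stack window */

/* ram[i] stands for the word at address STACK_TOP - 4 * (N_WORDS - i). */
static uint32_t ram[N_WORDS];
static uint32_t sp = STACK_TOP;  /* empty stack: SP points above the data */

/* Full descending push: decrement SP first, then write at the new SP. */
void push(uint32_t value)
{
    assert(STACK_TOP - sp < 4u * N_WORDS);  /* simulated overflow check */
    sp -= 4u;
    ram[N_WORDS - (STACK_TOP - sp) / 4u] = value;
}

/* Pop: read the last stacked item at SP, then increment SP. */
uint32_t pop(void)
{
    assert(sp != STACK_TOP);  /* nothing to pop from an empty stack */
    uint32_t value = ram[N_WORDS - (STACK_TOP - sp) / 4u];
    sp += 4u;
    return value;
}
```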

According to [1], this is the reset entry. Also, “reset is invoked on power up or a warm reset. The exception model treats reset as a special form of exception. When reset is asserted, the operation of the processor stops, potentially at any point in an instruction. When reset is deasserted, execution restarts from the address provided by the reset entry in the vector table”. Also according to [1], “The least-significant bit of each vector must be 1, indicating that the exception handler is Thumb code”. In our case, the processor will jump to 0x0800 0048, which is:

We will walk through that code later on. Let’s carry on with the vector table.
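The Thumb bit rule quoted above for the reset entry comes down to two bit operations. This is a sketch with hypothetical helper names. It assumes that the reset entry reads 0x0800 0049, which is inferred from the jump target 0x0800 0048.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of a vector table entry must be 1, marking Thumb code. */
int is_thumb_vector(uint32_t entry)
{
    return (entry & 1u) != 0u;
}

/* The handler's first instruction is at the entry with bit 0 cleared. */
uint32_t handler_address(uint32_t entry)
{
    return entry & ~1u;
}
```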

According to [1], the entries from 0x0008 to 0x0018 correspond to NMI, hard fault, memory management fault, bus fault, and usage fault, respectively. They all point to 0x0800 0088, which is:

This is a forever empty loop. The form b.n, according to [1], forces a 16-bit instruction (e7fe as we see). I haven’t dived into the binary ISA, but since the same instruction is used for main(), it is obviously a branch to an address relative to the program counter.
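The claim can be checked against the T2 encoding of the unconditional B instruction in the ARMv7-M Architecture Reference Manual, a document other than [1]. In that encoding, bits [15:11] are 0b11100 and the remaining 11 bits give a signed halfword offset. That offset is added to the PC, which reads as the instruction address plus 4. The decoder below is a hypothetical helper written only for this check.

```c
#include <assert.h>
#include <stdint.h>

/* If hw is a 16-bit Thumb B (encoding T2: 11100 imm11), stores the branch
 * target for an instruction located at addr and returns 1; else returns 0. */
int decode_thumb_b_t2(uint16_t hw, uint32_t addr, uint32_t *target)
{
    if ((hw & 0xF800u) != 0xE000u)                /* top five bits 11100 */
        return 0;
    int32_t imm = (int32_t)(hw & 0x07FFu) << 1;   /* imm11:'0', 12 bits */
    if (imm & 0x800)                              /* sign-extend 12 bits */
        imm -= 0x1000;
    *target = addr + 4u + (uint32_t)imm;          /* PC reads as addr + 4 */
    return 1;
}
```

For e7fe, imm11 is 0x7fe and the offset is -4, so the branch target is the instruction itself. That is the forever loop.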
Continuing in the vector table, the three dots symbolize an area that according to [1] is “Reserved”.
0x002c is the SVCall entry, also pointing to 0x0800 0088. According to [1], 0x0030 is “reserved for debug” and 0x0034 is just “reserved”. 0x0038 to 0x0040 are the PendSV, SysTick and IRQ0 entries, respectively, also pointing to 0x0800 0088.
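The offsets above follow one rule: the byte offset divided by 4 is the exception number, and IRQ0 is exception 16. The name table below restates the entries listed from [1]. The vector_name() helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M4 vector table entries 0 to 15; offset / 4 is the index. */
static const char *const exception_names[] = {
    "Initial SP value", "Reset", "NMI", "HardFault",
    "MemManage", "BusFault", "UsageFault",
    "Reserved", "Reserved", "Reserved", "Reserved",
    "SVCall", "Reserved for debug", "Reserved", "PendSV", "SysTick",
};

/* Names the vector at a byte offset; NULL if the offset is not word
 * aligned. Offsets from 0x40 upwards are external interrupts,
 * where IRQ number = offset / 4 - 16. */
const char *vector_name(uint32_t offset)
{
    if (offset % 4u != 0u)
        return NULL;
    uint32_t n = offset / 4u;
    if (n < 16u)
        return exception_names[n];
    return "IRQ";
}
```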

Let’s now have a look at the reset handler:

It is in fact difficult to interpret without studying the source code, more precisely the assembly file startup_ARMCM4.S, provided in GNU Tools for ARM Embedded Processors samples for Cortex-M4. I will do that in my next article.
For now, I will conclude this article saying that the reset handler copies some data from flash to RAM and clears one BSS section (a BSS section is a section of data that is initialized to zero when the program starts). However, the constants located at 0x0800 0078-0x0800 0084, which are the start and end addresses for these sections, are all the same. This implies that the sections have a size of zero words. That is not surprising, since the program does not have any static data.
Lastly, the reset handler executes SystemInit, which returns without doing anything, and branches to main, which is our main empty forever loop.
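Before studying the assembly, the two loops can be sketched in C. In the real start-up code, the bounds come from symbols defined in the linker script. In ARM's samples these are presumably names like __etext, __data_start__, __data_end__, __bss_start__ and __bss_end__, but treat those names as an assumption. Here they are function parameters, so the logic can run on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Copies initial values of static data from their load address in flash
 * to their run address in RAM, word by word. */
void init_data(const uint32_t *load, uint32_t *start, uint32_t *end)
{
    while (start < end)
        *start++ = *load++;
}

/* Clears the BSS section. If start == end (a zero-size section, as in
 * while_one), the loop body never runs. */
void zero_bss(uint32_t *start, uint32_t *end)
{
    while (start < end)
        *start++ = 0u;
}
```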

Nov 14

Back to while_one project on the STM32F4-Discovery

I left my STM32F4-Discovery in its box for a long time while working on, among other things, Nand2Tetris, but I have missed it. I would now like to rebuild the while_one project from scratch and continue from there, with only the bare necessities:

the two latter simply being unpacked in my home directory, with the purpose of serving as code copy/paste sources, my idea being to include as little generic code as possible in my projects, in order to keep control over it. The tool chain from “GNU Tools” is of course also my tool chain.
I basically run the same procedure as described in Running ARM samples on the STM32F4-Discovery, except that I run GDB in Emacs (M-x gdb, with the command edited to arm-none-eabi-gdb -i=mi). I also changed the original ARM Makefile to compile with debugging symbols (see Stm32F4DiscoveryTest).
I can then step through the source code, both the startup assembly code and the C-code in Emacs by using stepi in GDB.
Note: In the end, I kept the structure provided by the samples in GNU Tools for ARM Embedded Processors because it has a simple Makefile hierarchy and seems to limit boilerplate code to a minimum. My intention is to build further from minimum.c, which is basically a “while one” program (it is actually a “for (;;);” program).

Jul 01

While_one project on the STM32F4-Discovery with a GNU ARM Eclipse template

In our previous post, we reported how easy it was to produce and run a blinking program on the STM32F4-Discovery with Eclipse IDE for C/C++ Developers, GNU ARM Eclipse, GNU Tools for ARM Embedded Processors and OpenOCD. It did, however, leave me with the impression that the project and executable were quite large. Let’s check how it really is.
Who says that “Hello world” is a simple program? It certainly isn’t in bare metal programming. Even blinking a LED is too advanced for our purpose, which is to study in detail the structure of the STM32F4xx C/C++ project template in GNU ARM Eclipse:

  • Source code
  • Makefile
  • Map file
  • To a lesser extent, or just for fun, the processor instruction level

To reach that purpose, we need nothing more and nothing less than the while_one program:

In the Eclipse setup described in our previous post, we create a C project called while_one. It will be an STM32F4xx C/C++ Project, with the Cross ARM GCC toolchain.
Under “Target processor settings”, we choose an STM32F407xx with a Flash size of 1024 KB. The content is “empty”, we use no POSIX system calls and no trace output. We check “some” and “most” warnings, and leave the other settings as they are. We leave the standard folders as they are. We select the Debug and the Release configurations. We use the tool chain “GNU Tools for ARM Embedded Processors (arm-none-eabi-gcc)” and set the correct path to the bin folder (where arm-none-eabi-gcc is located).
The generated code is just what we need:

I have however a number of issues with the resulting project:

  • It does not run on target! More precisely, when trying to step over from the first row in main(), the OpenOCD console ends in a "Info : halted: PC: 0x08000cb4" forever loop. This is in contrast to the blinky program, that just runs as expected. Since the main() function is trivial, the problem must be related to the initialization that happens before main() is invoked.
  • The project includes 10 files from the STM32CubeF4 HAL. I have a hard time believing that while_one needs so much hardware support.
  • The project includes some files, for example _initialize_hardware.c, that are part of “the µOS++ III distribution”. Firstly, I find it a bit strange to have files included from a project that I did not intend to use (at least not right now). Secondly, to take just one example, __initialize_hardware() only enables the FPU, which is also done by SystemInit() in system_stm32f4xx.c, provided by STM32CubeF4 as specified in CMSIS. In other words, the template provides code that is redundant with what ST provides, which is also included in the project.

The observations above are pretty much enough for me to avoid using the STM32F4xx C/C++ Project template from GNU ARM Eclipse. The rest of it, GNU ARM C/C++ Cross Compiler Support and GNU ARM OpenOCD Debugging support still seems interesting, however. My next step will probably be to keep these two plugins, to remove GNU ARM C/C++ STM32Fx Project Templates from Eclipse, and to rebuild the while_one project from STM32CubeF4 instead.

Jun 29

Running an STM32CubeF4 template on the STM32F4-Discovery

STM32CubeF4
Led by The Definitive Guide to ARM® Cortex®-M4…, we have quite easily managed to compile and run a sample from GNU Tools for ARM Embedded Processors (see earlier post). However, we only got a generic Cortex-M4 startup assembly file and corresponding linker script from the sample. According to The Definitive Guide to ARM® Cortex®-M4…, there is more we can get from our vendor, ST in this case (HAL headers and code, drivers, and more generally, all sorts of boilerplate code we want to have when we make full use of the board’s resources, instead of reinventing the wheel). STM32CubeF4 is just that, and quite a lot more (especially plenty of example applications and templates). It complies with The ARM CMSIS, Cortex Microcontroller Software Interface Standard, a vendor-independent hardware abstraction layer for the Cortex-M processor series that also specifies debugger interfaces.
What I am most interested in is the contents of Projects/STM32F4-Discovery/Templates/, as it should contain exactly what we need to develop applications for the board (although I am not sure whether it includes support for C++ compilation, which I intend to use, but the ARM variant used in an earlier post did have such support, so it should be easy enough to copy/paste).
Projects/STM32F4-Discovery/Templates/ contains project files for several development environments, but no Makefile. One of the supported environments is TrueSTUDIO, which seems to use a GNU toolchain. That is good for us.
I might as well take the opportunity to digress a little about the development environment topic. I won’t apologize for loving open source. No single software vendor has a chance to have nearly as many reviewers as an open source tool. Many reviewers just means higher quality, it’s that simple. Using a GNU toolchain is not even a topic of discussion for me. Using openocd, including its integration with GDB has been very positive so far, so I do not see a reason for looking elsewhere. What is left to choose is:

  • The editor.
  • The debugger GUI (living without a debugger GUI is not really an alternative).
  • Last but not least: the build tool.

Concerning point 1, although I have used Emacs for many years, I am leaning towards Eclipse because it is the de facto standard. The reason is that I also develop software for a living, and Eclipse is probably preferable for a potential customer. It is easier to get a consensus around it. The debugger GUI issue is then solved as well (with the right plugins). When it comes to the build tool, I want to be able to build both inside and outside of Eclipse. I reckon that will ease the generation of production binaries, and I also reckon that GNU make and its Makefile are the natural solution for that issue.

GNU ARM Eclipse
I have investigated the fastest way to get a blinking LED example running/debugged under Eclipse:

  • Download Eclipse from Eclipse IDE for C/C++ Developers. Unpack it wherever you like and start it from there.
  • Install GNU ARM Eclipse, as documented under GNU ARM Eclipse plugins installation. GNU ARM Eclipse is a set of plugins, quoting the site: “currently maintained by Liviu Ionescu, a senior IT engineer, with expertise in operating systems, compilers, embedded systems and Internet technologies”.

GNU ARM Eclipse is certainly impressive. Once it was installed, using the documentation from the same site, GNU Tools for ARM Embedded Processors, and OpenOCD, I could run/debug a LED blinking example and see printouts from the program in an Eclipse console in no time, without even using STM32CubeF4. It should however be noted that some code in the plugin, I guess most of what is specific to ST MCUs and boards, comes from STM32CubeF4.

Jun 28

Running ARM samples on the STM32F4-Discovery

Now that we have an original flash image that we know how to restore, it is time to start building and running our own software on the board.

When it comes to the toolchain, I started with the version provided by Manjaro, but I ran into an issue related to Newlib-Nano, which is the C library that is supposed to be used with that toolchain. After a few other tries, I was finally successful with the toolchain provided by ARM and located at GNU Tools for ARM Embedded Processors, that the arch/Manjaro packages are built on anyway. As mentioned in a previous post, the installation is not more intrusive than unpacking a compressed folder and pointing to it in my PATH.
Led by The Definitive Guide to ARM® Cortex®-M4…, which recommends using the linker scripts provided by ARM in their toolchain samples, I decided to start by building and running the actual samples.
To start with, I reused the exact code structure provided by ARM in their samples. My purpose was to be able to just run make after as few adaptations as possible. The structure is the following:

The dump directory is mine. The rest is a copy/paste of the contents of ARM’s sample folder.
Under ldscripts, I have modified the contents of the mem.ld file to match my board:

Since gcc.ld (used in most samples) and nokeep.ld had the same rows, I replaced the redundancy by some INCLUDE commands:

The default processor in the samples being a Cortex-M0, I also changed the processor to a Cortex-M4:
[nilo@floor arm-none-eabi]$ head src/makefile.conf

And then, under the src directory, I just ran make. :-)
Here is the short version:

The simplest of these examples is minimum, so that is the one I decided to test.

Under openocd telnet:

The PC and the MSP match the disassembled image:

Now debugging in gdb (openocd still started, telnet closed, gdb connected instead):

Jun 28

Restoring original flash contents to the STM32F4-Discovery

Now we will test restoring the binary image that we earlier got from dumping the original contents of the flash memory.
Our unique flash bank looks as follows:

We can first verify the image file:

We can then naively test to restore the image without first erasing the bank:

This is not surprising, although I have seen it go through without an error message before (I am not sure what really happened in that case).
Let's now try to first erase the whole bank.
We check the contents of the first word:

We recognize the first word from earlier. Now we erase the whole bank (i.e. the whole flash memory):

It does look like the flash memory is erased. Now let's restore the original image:

This worked too! After the reset, the LEDs are flashing as they did before, instead of staying unlit as they did when I ran reset just after erasing.

Jun 27

Disassembling original flash contents from the STM32F4-Discovery

To work with bare metal ARM programming, I need a bare metal ARM toolchain. Being a Manjaro Linux user, I first installed the following packages from the regular repositories:
– arm-none-eabi-binutils
– arm-none-eabi-gcc
– arm-none-eabi-gdb
This works well enough for what I am doing in this post. However, when trying to compile some samples from ARM it complains as follows:

Newlib-Nano was produced as part of ARM’s “GNU Tools for ARM Embedded Processors” initiative in order to provide a version of Newlib focused on code size. The error is apparently a known issue in arch/Manjaro. The easiest solution I found was to uninstall the packages above, unpack the pre-built toolchain provided by ARM at GNU Tools for ARM Embedded Processors to my home folder and to adapt my PATH to that location, as mentioned in readme.txt.
Now, we disassemble the binary we previously got in openocd:

The -Mforce-thumb option is required because this version of objdump, although recent (binutils 2.24), does not have an explicit armv7 option or equivalent. Cortex-M4 processors implement the ARMv7-M architecture that uses the Thumb-2 instruction set architecture, i.e. a seamless mix of 16 and 32-bit instructions. Without the -Mforce-thumb option, objdump interprets the binary as 32-bit instructions only, which is totally incorrect. In fact, most of the instructions in that binary happen to be 16-bit wide.
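How does a disassembler tell 16-bit and 32-bit Thumb-2 instructions apart? According to the ARMv7-M Architecture Reference Manual, a first halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, and any other value is a complete 16-bit instruction. The sketch below applies that rule. The helper names are hypothetical, and the buffer is assumed to already hold halfwords in host order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the instruction size in bytes, judged from its first halfword. */
unsigned thumb_instruction_size(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)first_halfword >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Counts the instructions in n halfwords of Thumb-2 code. */
size_t count_instructions(const uint16_t *hw, size_t n)
{
    size_t count = 0, i = 0;
    while (i < n) {
        i += thumb_instruction_size(hw[i]) / 2u;  /* 1 or 2 halfwords */
        count++;
    }
    return count;
}
```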
As a matter of fact, openocd can disassemble too:

That is a straight disassembly of the first ten instructions located at address 0x00000000 which, as mentioned in an earlier post, is mapped to the start of the internal flash. It seems that openocd does not need to be told about the detailed architecture, probably because that information is already contained in the configuration files used when starting the program.
So, the processor starts by executing lsrs r0, r0, #0x12, right? Wrong. As explained in The Definitive Guide to ARM® Cortex®-M3 and Cortex®-M4 Processors, Third Edition, the first thing the processor does when it comes out of reset is to fetch the MSP value (Main Stack Pointer) from address 0x0000 0000, i.e. a 32-bit value, in our case 0x2000 0c80, which unsurprisingly lies in SRAM (0x2000 0000 - 0x2001 FFFF) according to the STM32F407VG datasheet. The stack grows downwards, so that address is the top of the stack.
Next, the processor fetches the reset vector from address 0x0000 0004, in our case 0x0800 422d, which is in flash (0x0800 0000 - 0x080F FFFF according to the same datasheet).
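The bogus lsrs comes from decoding data as code. The word 0x2000 0c80 is stored little-endian, so the first halfword in memory is 0x0c80. Read as the 16-bit Thumb encoding T1 of LSRS (00001 imm5 Rm Rd, per the ARMv7-M Architecture Reference Manual), that halfword gives exactly lsrs r0, r0, #0x12. Both helpers below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct lsr_imm {
    unsigned rd, rm, imm5;
};

/* On a little-endian core, the low halfword of a word comes first. */
uint16_t first_halfword(uint32_t word)
{
    return (uint16_t)(word & 0xFFFFu);
}

/* Decodes LSRS Rd, Rm, #imm (Thumb T1: 00001 imm5 Rm Rd); returns 1 on a
 * match. An imm5 of 0 would encode a shift of 32. */
int decode_lsrs_t1(uint16_t hw, struct lsr_imm *out)
{
    if (((unsigned)hw >> 11) != 0x01u)
        return 0;
    out->imm5 = ((unsigned)hw >> 6) & 0x1Fu;
    out->rm = ((unsigned)hw >> 3) & 0x7u;
    out->rd = (unsigned)hw & 0x7u;
    return 1;
}
```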
The processor then starts to execute the program from the reset vector address and begins normal operations:

The reason why the fetched vector address ends with 422d instead of 422c is because vector addresses in the vector table should have their LSB set to 1 to indicate that they are Thumb code.
The first instruction loads the value located at address 0x0800 4240, that is 0xe000 ed88, into r0 (the disassembler interprets it as an unknown 32-bit instruction, assuming that the first halfword is the most significant, which explains the halfword inversion in the presentation). The ARMv7-M ARM (Architecture Reference Manual) tells us that 0xe000 ed88 is the address of the Coprocessor Access Control Register (CPACR). The three following instructions set the so-called CP10 and CP11 bit fields to 0b11, which give full access to the floating point coprocessor.
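Those three instructions amount to a read-modify-write of CPACR. CP10 occupies bits [21:20] and CP11 bits [23:22], so setting both to 0b11 ORs in 0x00F0 0000. The sketch below keeps the bit manipulation as a pure function so it can run on a host. The macro and function names are hypothetical, and the on-target access is only shown in a comment.

```c
#include <assert.h>
#include <stdint.h>

#define CPACR_ADDR 0xE000ED88u  /* Coprocessor Access Control Register */

/* CP10 is bits [21:20] and CP11 bits [23:22]; 0b11 means full access. */
#define CPACR_CP10_FULL (3u << 20)
#define CPACR_CP11_FULL (3u << 22)

/* Returns the CPACR value with full access to CP10 and CP11 enabled,
 * leaving all other bits unchanged. */
uint32_t cpacr_enable_fpu(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_FULL | CPACR_CP11_FULL;
}

/* On target (not run on a host), the equivalent of ldr/orr/str would be:
 *   volatile uint32_t *cpacr = (volatile uint32_t *)CPACR_ADDR;
 *   *cpacr = cpacr_enable_fpu(*cpacr);
 * followed by DSB and ISB before the first floating point instruction. */
```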