This page assumes that you’ve read the Shell Over USB page, and focuses only on the additional steps required when targeting an STM32H7 device.
The STM32H7 family is, as of this writing, the most powerful family of microcontrollers in the ST catalog. In fact, with some devices running at 550 MHz, they are among the fastest microcontrollers in the world.
Naturally, this comes at a price : higher complexity.
For you, personally ? That means every feature can get a bit more difficult to use.
1. Nefastor’s Lecture Time
Note : the STM32F7 family has a very similar architecture to the STM32H7. It is slower mainly because it’s made on a 90 nm process, whereas STM32H7 devices are 40 nm chips. Everything on this page applies to both families.
Let me try to give you a brief explanation, because this is important : the STM32H7 chips achieve very high performance by using the ARM Cortex-M7 core instead of the older and simpler Cortex-M0, M3 and M4 you may be used to. You can draw a parallel with PC processors : as clock frequencies increased, processors had to introduce all sorts of hardware tricks to make sure that a higher clock frequency meant more processing power (and not just higher power consumption). The M7 introduces cache memories. Here are the M4 and M7 side by side, courtesy of the ARM website :
From top to bottom, the two diagrams start out nearly identical (not exactly, since the M7 core has a deeper instruction pipeline than the M4), but things get interesting at the bottom, where you find new blocks :
- I-cache and D-cache are instruction and data caches that are part of the core itself. On a PC processor that would be your “level 1” cache.
- I-TCM and D-TCM are instruction and data “Tightly-Coupled Memory”. Strictly speaking this isn’t cache (it’s ordinary RAM at fixed addresses, and you decide what goes into it), but you can think of it as the next fastest level after the caches. These memories have dedicated buses straight to the ARM core, which eliminates the bottleneck of the AMBA bus matrices that connect everything else on the chip (including your usual SRAM blocks).
The first figure in the STM32H743 reference manual, which shows the architecture of the microcontroller, is rather explicit. Here’s the Cortex-M7 core :
You really need to be aware of this architecture and its implications if you intend to use an STM32H7.
On smaller, simpler microcontrollers such as the STM32F303, the architecture looks like this :
An AMBA bus matrix sits between your ARM core and every memory block (Flash or SRAM). There’s no cache, which means that your variables only exist in one place.
Now here’s what’s going to ruin your day, and this is the important part : you can see that the DMA controllers are on the same side of the bus matrix as the core. In this simple chip, when you ask the DMA to transfer a variable to, say, the UART that connects your terminal to your shell, that DMA sends the exact variable that your code (and your core) uses. But what happens when there is cache memory inside the core, which the DMA controllers can’t access ?
That conundrum is called cache coherence. It is an area of computer science that has been (and still is) the focus of much research. It’s the problem that arises when the same data can exist in several memories at once and be accessed by different bus masters, such as CPU cores or DMA controllers.
STM Shell relies on DMA transfers of text strings produced by code. However, by default, there is no guarantee that a string you just built will be stored in the SRAM the DMA controller has access to. It’s almost certain that it will be stored in the data cache first. Your DMA will then send whatever stale data is still sitting in SRAM, and in most cases that will be garbage.
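To make the failure concrete, here is a toy model in plain C that runs on a PC. It is a deliberately simplified sketch, not hardware code : it models a single write-back cache line in front of SRAM, and the names sram, dcache, cpu_write_string, cache_clean and dma_read are all hypothetical. Real caches have many lines, tags and an eviction policy, none of which are modelled here.

```c
#include <string.h>
#include <assert.h>

/* Toy model of ONE write-back data cache line sitting in front of SRAM.
   It only illustrates the coherence problem, nothing more. */
#define LINE_SIZE 32

static char sram[LINE_SIZE];    /* what the DMA controller reads */
static char dcache[LINE_SIZE];  /* what the CPU reads and writes */
static int  line_dirty = 0;     /* 1 = cache holds data SRAM hasn't seen yet */

/* The CPU builds a string : with a write-back cache, only the cache is updated. */
static void cpu_write_string(const char *s)
{
    assert(strlen(s) < LINE_SIZE);   /* the string must fit in the line */
    memset(dcache, 0, LINE_SIZE);
    strcpy(dcache, s);
    line_dirty = 1;                  /* SRAM still holds the old contents */
}

/* Cache clean : write dirty data back to SRAM (what a real clean does). */
static void cache_clean(void)
{
    if (line_dirty) {
        memcpy(sram, dcache, LINE_SIZE);
        line_dirty = 0;
    }
}

/* The DMA controller bypasses the core's cache and reads SRAM directly. */
static const char *dma_read(void)
{
    return sram;
}
```

If you call cpu_write_string("stm32> ") and then dma_read() straight away, you get the stale, still-empty SRAM contents. Only after cache_clean() does dma_read() return the prompt. That missing “clean” step is exactly what goes wrong when the D-cache is enabled and nobody manages it.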
Obviously, this problem isn’t limited to STM Shell. You will face it every time you need to use DMA on such a complex microcontroller. While this page focuses on STM Shell, you should see it as a more general introduction to the intricacies of high-performance microcontrollers and how to extract that performance without also (forcefully) extracting the hair from your head.
Let’s start with the most obvious workaround :
2. Cache Or No Cache
That is the question. Yes, I’m paraphrasing famous computer scientist William Shakespeare.
One thing to know about cache memory is that you can disable it. Some industries and some applications forbid the use of cache because it introduces non-deterministic execution timing. In fact, cache is disabled by default. In that state, you can use STM Shell exactly as if you were targeting a lowly STM32F103. So this raises the question : why even enable cache in the first place ?
You can hit 550 MHz on an STM32H730 without enabling cache, no problem.
To put it simply, using cache memory will make your code faster for a given clock speed. This can be demonstrated rather easily with a simple LED blink program, and I encourage you to try it if you don’t believe it. Here’s how you can find out in five minutes, assuming you have any NUCLEO-H7 :
- Create a new project for that board in STM32CubeIDE
- Let the IDE initialize the project to match the board
- Don’t change anything, just check that the Cortex-M7 is clocked at 64 MHz (it should be the default setting)
- Generate the code, and then add this inside the main function’s while (1) loop, right after the /* USER CODE BEGIN 3 */ comment :
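HAL_GPIO_TogglePin (LD1_GPIO_Port, LD1_Pin);  // toggle the green LED
volatile unsigned long i;                      // volatile stops the compiler from deleting the empty loop
for (i = 0; i < 5000000; i++);                 // crude busy-wait delay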
Build and run. You will see the green LED blink rather slowly. Now for the fun trick :
- Return to the microcontroller’s configuration. Under “System Core”, “CORTEX_M7”, enable both ICACHE and DCACHE
- Generate the code again
- Don’t change anything else, just build and run. The LED will blink three times faster than without cache.
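If you’d rather measure than eyeball the LED, the Cortex-M7 cycle counter can time the delay loop directly. This is a sketch under some assumptions : it assumes an STM32CubeIDE project (which defines USE_HAL_DRIVER) and the CMSIS DWT and CoreDebug definitions from core_cm7.h. The names cycles_per_iteration, cyccnt_start and time_delay_loop are hypothetical. Only the small arithmetic helper compiles on a PC; the rest is guarded for the target.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical helper : average CPU cycles per iteration of the delay loop. */
static double cycles_per_iteration(uint32_t elapsed_cycles, uint32_t iterations)
{
    assert(iterations > 0u);
    return (double)elapsed_cycles / (double)iterations;
}

#ifdef USE_HAL_DRIVER  /* defined by STM32CubeIDE projects, skipped on a PC */
#include "main.h"      /* brings in the HAL and the CMSIS core_cm7.h definitions */

/* Hypothetical helper : start the Cortex-M7 cycle counter (DWT->CYCCNT). */
static void cyccnt_start(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; /* enable the DWT and ITM blocks */
    DWT->LAR = 0xC5ACCE55u;                          /* unlock DWT writes (needed on some M7 parts) */
    DWT->CYCCNT = 0u;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             /* count core clock cycles */
}

/* Hypothetical helper : run the blink demo's busy-wait and return the cycles it took. */
static uint32_t time_delay_loop(uint32_t iterations)
{
    volatile uint32_t i;
    uint32_t start = DWT->CYCCNT;

    for (i = 0; i < iterations; i++);
    return DWT->CYCCNT - start;  /* unsigned subtraction tolerates one counter wrap */
}
#endif
```

On the board, call cyccnt_start() once after SystemClock_Config(). Then compare cycles_per_iteration(time_delay_loop(5000000), 5000000) between a build with ICACHE and DCACHE enabled and one without. A lower figure with cache enabled is the same speed-up the LED shows you.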
So yeah, Captain Obvious was right again : ST didn’t shove cache memory into their chips for no reason.
The question now becomes : does your application need cache ? The thing is, there are many reasons to use an STM32H7 that don’t involve processing speed. Perhaps your application would run happily on an STM32F103, but you need an Ethernet port (foreshadowing alert), a camera interface, or USB High Speed rather than just Full Speed. Needing those peripherals doesn’t mean you need maximum processing speed. That, in turn, means you might be fine replacing cache use with a higher clock frequency if it simplifies your software design. You may also have no choice, if regulations forbid the use of cache memory (as may be the case in safety-critical aeronautic systems, for example).
But here’s the catch : that Ethernet I just mentioned ? It won’t work unless you enable data cache. Try enabling the LwIP stack in our little “LED blink” project. If cache is disabled, you’ll see this :
My advice ? Keeping the cache disabled might make your life easier but it’s leaving money on the table. Just as a true hunter uses every part of his prey, a true programmer uses every part of his chips.
3. STM Shell and Cache
Because it relies on DMA transfers from memory to peripheral, STM Shell is affected when the data cache is enabled. The STM32H7 is still a microcontroller : it cannot detect or fix cache coherence problems on its own.
There are workarounds such as flushing cache to SRAM explicitly, but they aren’t fun and will have an impact on the performance of your application.
If you want to dig into this further, the best information can be found in ST application note AN4839, which you can download from ST’s website.
One thing to know that isn’t mentioned in that note is that neither the HAL nor the LL (low-layer) libraries provided by ST cover the control of cache memory. You will need to use CMSIS instead. A good place to look around is your project’s “Drivers/CMSIS/Include/core_cm7.h” header file, and every function in it whose name starts with SCB_.
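As a sketch of the “flush the cache before DMA” workaround, assume a CubeIDE HAL project with a UART already set up for DMA transmission. The code has two parts. The portable helper widens a buffer to whole 32-byte cache lines, which is the Cortex-M7 L1 data cache line size. The guarded part then cleans those lines with the CMSIS SCB_CleanDCache_by_Addr() from core_cm7.h before starting HAL_UART_Transmit_DMA(). The names dcache_line_span, tx_buf and send_with_dcache are hypothetical; they are not part of STM Shell or the HAL.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define DCACHE_LINE_SIZE 32u  /* Cortex-M7 L1 data cache line size, in bytes */

/* Hypothetical helper : widen [addr, addr + len) to whole cache lines, which is
   the range a clean or invalidate by address actually affects. */
static void dcache_line_span(uintptr_t addr, size_t len,
                             uintptr_t *start, size_t *size)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE_SIZE - 1u);

    assert(start != NULL && size != NULL);
    *start = addr & ~mask;                                      /* round start down */
    *size  = (size_t)(((addr + len + mask) & ~mask) - *start);  /* round end up */
}

#ifdef USE_HAL_DRIVER  /* defined by STM32CubeIDE projects, skipped on a PC */
#include "main.h"      /* brings in the HAL and CMSIS core_cm7.h */

/* Hypothetical TX buffer, aligned so it doesn't share cache lines with other data. */
static uint8_t tx_buf[256] __attribute__((aligned(DCACHE_LINE_SIZE)));

/* Hypothetical wrapper : send len bytes of tx_buf over UART DMA with D-cache on. */
static HAL_StatusTypeDef send_with_dcache(UART_HandleTypeDef *huart, uint16_t len)
{
    if (len > sizeof tx_buf) {
        return HAL_ERROR;
    }
    if (SCB->CCR & SCB_CCR_DC_Msk) {  /* only needed while the D-cache is enabled */
        uintptr_t start;
        size_t size;

        dcache_line_span((uintptr_t)tx_buf, len, &start, &size);
        /* Write dirty lines back to SRAM so the DMA reads what the CPU wrote */
        SCB_CleanDCache_by_Addr((uint32_t *)start, (int32_t)size);
    }
    return HAL_UART_Transmit_DMA(huart, tx_buf, len);
}
#endif
```

Cleaning alone isn’t always enough. The buffer must also live in a RAM that the DMA controller can reach : on the STM32H743, DMA1 and DMA2 cannot access the DTCM, so check where your linker script places it. Receiving is the mirror image (invalidate after the transfer instead of cleaning before it), and that is where the line alignment above really matters.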
In practice, if you don’t want to mess with cache control at your application’s level, you’re left with two options :
- If you’re not using data cache, you can use STM Shell either with a UART or with USB.
- If you are using data cache, the simplest option is USB, because it has its own internal buffers that don’t rely on DMA.
Getting DMA transfers of variables to work with data cache enabled is a full-time job worthy of its own web page, which I’ll write someday if I feel masochistic.
Therefore, the rest of this page will deal with running STM Shell over USB on STM32H7. Furthermore, I’m going to assume you’re using a NUCLEO-H743ZI2 board : adapt the instructions as necessary. Also, small WARNING : the Micro-USB AB socket on the NUCLEO is “upside-down” : if your cable doesn’t want to go in, make sure you’re inserting it correctly.
4. Project Creation
First and foremost, if your board isn’t new, make sure that its solder bridges are configured so that the microcontroller’s USB FS data lines (D+ and D-) are connected to the Micro-USB AB socket near the user button (the one with the blue cap). That is the default configuration, but it can be changed to route those two pins to one of the GPIO connectors instead. You’ll find the relevant information in the board’s user manual and its schematic. My instructions apply to the default configuration the board ships in from ST.
If you’re starting a project from scratch, use the “Board Selector” tab of the “Target Selection” dialog box when creating your new project. Also, don’t forget to click “Yes” when ST offers to initialize all peripherals for you.
Still, we’re going to do a little customization to that default configuration for STM Shell’s purposes.
First, expand “Connectivity” and look for the USB controllers. You should find two :
The STM32H743 has two separate USB controllers. This is mostly to give you flexibility when designing a board : on this chip, High-Speed USB 2.0 (the one that can reach 480 Mb/s) requires a dedicated external transceiver (a ULPI PHY), much the same way Ethernet always requires an external PHY (physical layer) chip. If you can’t afford that or if you don’t need more than 12 Mb/s, you can use the Full Speed controller instead, as it doesn’t require a transceiver.
NUCLEO boards are inexpensive tools. They do not carry an external High-Speed USB transceiver.
The USB FS controller of an STM32H7 is still more capable than that of an STM32F1 : it’s capable of acting as a host controller, a device controller, or both (which is what “OTG” means). To use it as the interface for a shell, you need to set it up for device-only operation by disabling some of the default options until you match this :
Next stop is the “Middleware and Software Packs” section. Find the “USB_DEVICE” entry and select “Communication Device Class (Virtual Port Com)”.
Finally, in the “Clock Configuration” tab, make sure that the USB controller receives a 48 MHz clock. The STM32H7 makes this easy : if all the PLLs are busy producing other frequencies, there’s a dedicated internal oscillator called RC48 that is designed specifically to meet the clock precision requirements of USB :
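For context on that precision requirement : USB 2.0 allows the full-speed bit rate (12 Mb/s) to deviate by at most ±0.25 % (2500 ppm). Since the controller derives that bit rate from its 48 MHz clock, the same tolerance applies to the clock itself :

```latex
f_{\mathrm{USB}} = 48\,\mathrm{MHz} \times (1 \pm 0.0025)
\quad\Longrightarrow\quad
47.88\,\mathrm{MHz} \le f_{\mathrm{USB}} \le 48.12\,\mathrm{MHz}
```

If you feed USB from a PLL instead of RC48, it’s worth checking that the frequency shown in the Clock Configuration tab actually falls inside this window.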
And with that, you’ve covered all STM Shell requirements. Generate your project’s code and let’s move on.
5. Software Integration
On that front, there’s nothing special. You can apply the instructions from the previous page, namely :
- Clone my USB VCP and STM Shell libraries into your project
- Eat your Git vegetables like a pro
- Code the portability layer functions
6. Final Words
STM Shell will work with little effort on an STM32H7 if you either :
- Disable data caching in the Cortex-M7 core
- Use USB instead of a UART if you can’t disable data caching
However, the STM32H7 is a complex chip and STM32CubeIDE sometimes doesn’t flag configuration errors that can cause premature hair loss. There are various reasons why you might not get a shell prompt or even a virtual COM port when you finally connect your NUCLEO to your computer.
The most common issue I’ve faced is the microcontroller itself not working at all. This is a clocking issue that happens when configuring the microcontroller to run at high frequency, and comes from STM32CubeIDE being unable or unwilling to point out mistakes in your configurations. Mistakes you made through no fault of your own, but because the tools don’t provide any hint as to what you’re supposed to do to hit your chip’s top speed.
To learn more about the proper configuration and use of an STM32H7, I recommend you visit the STM32H7 section of this website.