Porting Drivers to the STM32 F7

Problem Statement

I recently completed a port to the STMicro STM32F746G Discovery board. That MCU is clearly a derivative of the STM32 F3/F4, and many of its peripherals are, in fact, essentially identical to those of the STM32F429. The biggest difference is that the STM32F746 sports a Cortex-M7, which includes several improvements over the Cortex-M4, including, most relevant to this discussion, a fully integrated data cache (D-Cache).

Because of this one difference, I chose to provide the STM32 F7 code its own directories separate from the STM32 F1, F2, F3, and F4.

Porting Simple Drivers

Some of the STM32 F4 drivers can be ported to the STM32 F7 very simply; many ports are just a matter of copying files and some search-and-replace. For example:

  • Compare the two register definition files; make sure that the STM32 F4 peripheral is identical (or nearly identical) to the F7 peripheral. If so, then:
  • Copy the register definition file from the stm32/hardware to the stm32f7/hardware directory, making name changes as appropriate and updating any minor register differences.
  • Copy the corresponding C file (and possibly a .h file) from the stm32/ directory to the stm32f7/ directory, again making any naming changes and modifications for any register differences.
  • Update the Make.defs file to include the new C file in the build.

Porting Complex Drivers

The Cortex-M7 D-Cache, however, does raise compatibility issues for most complex STM32 F4 drivers. Even though the peripheral registers may be essentially the same between the STM32F429 and the STM32F746, many drivers for the STM32F429 will not be directly compatible with the STM32F746, particularly drivers that use DMA. And that includes most complex STM32 drivers!

Cache Coherency

With DMA, the contents of physical RAM are accessed directly by peripheral hardware without intervention from the CPU. The CPU itself deals with RAM only indirectly, through the D-Cache: When you read data from RAM, it is first loaded into the D-Cache and then accessed by the CPU. If the RAM contents are already in the D-Cache, then physical RAM is not accessed at all! Similarly, when you write data into RAM (with write buffering enabled), it may not actually be written to physical RAM but may just remain in the D-Cache in a dirty cache line until that cache line is flushed to memory. Thus, DMA can cause inconsistencies between the contents of the D-Cache and the contents of physical RAM. Such issues are referred to as Cache Coherency problems.
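The effect can be reproduced on a host with a toy model. The sketch below is only an illustration, not real cache hardware and not code from the STM32 F7 port: it models a single 32-byte line of RAM and a single write-back cache line, and every name in it (toy_cache, cpu_read, cpu_write, hw_ram_load, hw_ram_store, cache_clean, cache_invalidate) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_LINE_SIZE 32         /* Cortex-M7 D-Cache line size in bytes */

/* Toy model: one line of physical RAM plus one write-back cache line */

struct toy_cache
{
  uint8_t ram[TOY_LINE_SIZE];    /* What DMA hardware sees */
  uint8_t line[TOY_LINE_SIZE];   /* What the CPU sees while 'valid' is set */
  bool valid;
  bool dirty;
};

void toy_init(struct toy_cache *c)
{
  memset(c, 0, sizeof(*c));
}

/* CPU read: fill the line from RAM on a miss, then read from the cache */

uint8_t cpu_read(struct toy_cache *c, unsigned int offset)
{
  if (!c->valid)
    {
      memcpy(c->line, c->ram, TOY_LINE_SIZE);
      c->valid = true;
      c->dirty = false;
    }

  return c->line[offset];
}

/* CPU write (write-back): only the cache line changes, RAM is untouched */

void cpu_write(struct toy_cache *c, unsigned int offset, uint8_t value)
{
  (void)cpu_read(c, offset);     /* Allocate the line on a write miss */
  c->line[offset] = value;
  c->dirty = true;
}

/* Peripheral (DMA) accesses bypass the cache and go straight to RAM */

uint8_t hw_ram_load(const struct toy_cache *c, unsigned int offset)
{
  return c->ram[offset];
}

void hw_ram_store(struct toy_cache *c, unsigned int offset, uint8_t value)
{
  c->ram[offset] = value;
}

/* Clean: write a dirty line back to RAM */

void cache_clean(struct toy_cache *c)
{
  if (c->valid && c->dirty)
    {
      memcpy(c->ram, c->line, TOY_LINE_SIZE);
      c->dirty = false;
    }
}

/* Invalidate: discard the line; any dirty data in it is lost */

void cache_invalidate(struct toy_cache *c)
{
  c->valid = false;
  c->dirty = false;
}
```

In this model, a cpu_write() followed by hw_ram_load() returns the old RAM value until cache_clean() is called, and a hw_ram_store() followed by cpu_read() returns the old cached value until cache_invalidate() is called. Those are the two coherency failures that the DMA discussion below deals with.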

DMA

DMA Read Accesses

A DMA read access occurs when we program DMA hardware to read data from a peripheral and store that data into RAM. This happens, for example, when we read a packet from the network, when we read a serial byte of data from a UART, when we read a block from an MMC/SD card, and so on.

...

What if the D-Cache line is also dirty? What if we have writes to the DMA buffer that were never flushed to physical RAM? Those writes will then never make it to physical memory if the D-Cache is invalidated. Rule 2: Never write to DMA read buffer memory! Rule 3: Make sure that all DMA read buffers are aligned to the D-Cache line size so that there are no spill-over cache effects at the borders of the invalidated cache line.
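Rule 3 can be sketched in C as follows. This is a minimal illustration under stated assumptions: a 32-byte D-Cache line (the Cortex-M7 line size) and a C11 compiler for _Alignas. The names DCACHE_LINESIZE, DCACHE_ROUNDUP, RXBUF_PAYLOAD, g_rxbuffer, and dcache_is_aligned are hypothetical and are not taken from the STM32 F7 sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte D-Cache line, as on the Cortex-M7 */

#define DCACHE_LINESIZE   32

/* Round a size up to a whole number of cache lines */

#define DCACHE_ROUNDUP(n) \
  (((size_t)(n) + (DCACHE_LINESIZE - 1)) & ~(size_t)(DCACHE_LINESIZE - 1))

#define RXBUF_PAYLOAD     100    /* Hypothetical payload size in bytes */

/* The buffer starts on a cache line boundary and is padded to a whole
 * number of lines, so no unrelated variable can share a line with it.
 */

_Alignas(DCACHE_LINESIZE) uint8_t g_rxbuffer[DCACHE_ROUNDUP(RXBUF_PAYLOAD)];

/* True if both the start address and the length are line aligned */

bool dcache_is_aligned(const void *buf, size_t len)
{
  const uintptr_t mask = DCACHE_LINESIZE - 1;

  return ((uintptr_t)buf & mask) == 0 && (len & mask) == 0;
}
```

A driver could use a check like dcache_is_aligned() in a debug assertion before invalidating a buffer. Padding the length matters as much as aligning the start: a 100-byte buffer that begins on a line boundary still shares its last cache line with whatever the linker places after it.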

DMA Write Accesses

A DMA write access occurs when we program DMA hardware to write data from RAM into a peripheral. This happens, for example, when we send a packet on a network or when we write a block of data to an MMC/SD card. In this case, the hardware expects the correct data to be in physical RAM when the write DMA is performed. If it is not, then the wrong data will be sent.

...

What if you had two adjacent DMA buffers side-by-side? Couldn't the cleaning of the write buffer force writing into the adjacent read buffer? Yes! Rule 5: Make sure that all DMA write buffers are aligned to the D-Cache line size so that there are no spill-over cache effects at the borders of the cleaned cache line.
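To make the spill-over concrete, the following sketch checks whether a cache operation on one buffer would also touch a cache line that holds part of another buffer. It assumes a 32-byte D-Cache line; the helper names dcache_line_base and dcache_ranges_share_line are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte D-Cache line, as on the Cortex-M7 */

#define DCACHE_LINESIZE 32

/* Address of the first byte of the cache line containing addr */

uintptr_t dcache_line_base(uintptr_t addr)
{
  return addr & ~(uintptr_t)(DCACHE_LINESIZE - 1);
}

/* True if a clean or invalidate of [a, a + alen) also touches a cache
 * line that holds part of [b, b + blen).  Both lengths must be non-zero.
 */

bool dcache_ranges_share_line(uintptr_t a, size_t alen,
                              uintptr_t b, size_t blen)
{
  uintptr_t afirst = dcache_line_base(a);
  uintptr_t alast  = dcache_line_base(a + alen - 1);
  uintptr_t bfirst = dcache_line_base(b);
  uintptr_t blast  = dcache_line_base(b + blen - 1);

  /* The two line ranges overlap if each starts before the other ends */

  return afirst <= blast && bfirst <= alast;
}
```

For example, two 16-byte buffers placed back to back at 0x20000000 and 0x20000010 live in the same cache line, so cleaning one writes back the line that also covers the other. Two 32-byte buffers at 0x20000000 and 0x20000020 do not share a line.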

Write-back vs. Write-through D-Cache

The Cortex-M7 supports both write-back and write-through data cache configurations. The write-back D-Cache works just as described above: dirty cache lines are not written to physical memory until the cache line is flushed. But with a write-through D-Cache, writes behave just as they would without the D-Cache: they always go directly to physical RAM.

...

And what if the driver receives arbitrarily aligned buffers from the application? Then what? Should write buffering be disabled in that case too? And what is the performance cost for disabling the write buffer?

DMA Module

Some STM32 F7 peripherals have built-in DMA. The STM32 F7 Ethernet driver discussed below is a good example of such a peripheral with built-in DMA capability. Most STM32 F7 peripherals, however, have no built-in DMA capability and, instead, must use a common STM32 F7 DMA module to perform DMA data transfers. The interfaces to that common DMA module are described in arch/arm/src/stm32f7/stm32_dma.h.

...

  • TX DMA transfers. Before calling stm32_dmastart() to start a TX transfer, the DMA client must clean the DMA buffer so that the content to be DMA'ed is present in physical memory.
  • RX DMA transfers. At the completion of all DMAs, the DMA client will receive a callback providing the final status of the DMA transfer. For the case of RX DMA completion callbacks, logic in the callback handler should invalidate the RX buffer before any attempt is made to access new RX buffer content.
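The ordering in the two bullets above can be sketched as follows. This is a host-side illustration only: demo_clean_dcache(), demo_invalidate_dcache(), and demo_dmastart() are hypothetical stand-ins that merely log the order of operations. On the target they would be the real cache maintenance operations and the DMA module's stm32_dmastart(), whose actual signatures are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Log of operations, so the ordering can be inspected on a host */

enum demo_op_e
{
  OP_CLEAN,
  OP_INVALIDATE,
  OP_DMASTART,
  OP_CPU_READ
};

#define DEMO_MAX_OPS 8

enum demo_op_e g_oplog[DEMO_MAX_OPS];
size_t g_nops;

void demo_log(enum demo_op_e op)
{
  if (g_nops < DEMO_MAX_OPS)
    {
      g_oplog[g_nops++] = op;
    }
}

/* Hypothetical stand-ins for the cache operations and the DMA start call */

typedef void (*demo_callback_t)(void *arg);

void demo_clean_dcache(uintptr_t start, uintptr_t end)
{
  (void)start;
  (void)end;
  demo_log(OP_CLEAN);
}

void demo_invalidate_dcache(uintptr_t start, uintptr_t end)
{
  (void)start;
  (void)end;
  demo_log(OP_INVALIDATE);
}

void demo_dmastart(demo_callback_t callback, void *arg)
{
  demo_log(OP_DMASTART);
  if (callback != NULL)
    {
      callback(arg);             /* Pretend the transfer completed at once */
    }
}

struct demo_buf_s
{
  uint8_t *data;
  size_t len;
};

/* TX: clean first so that the DMA engine reads current data from RAM */

void demo_tx(struct demo_buf_s *buf)
{
  demo_clean_dcache((uintptr_t)buf->data, (uintptr_t)buf->data + buf->len);
  demo_dmastart(NULL, NULL);
}

/* RX completion callback: invalidate before the CPU touches the new data */

void demo_rx_done(void *arg)
{
  struct demo_buf_s *buf = arg;

  demo_invalidate_dcache((uintptr_t)buf->data,
                         (uintptr_t)buf->data + buf->len);
  demo_log(OP_CPU_READ);         /* Stands in for processing buf->data */
}

void demo_rx(struct demo_buf_s *buf)
{
  demo_dmastart(demo_rx_done, buf);
}
```

Calling demo_tx() logs a clean followed by the DMA start. Calling demo_rx() logs the DMA start, then the invalidate, and only then the CPU access to the buffer.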

Converting an STM32F429 Driver for the STM32F746

Since the STM32 F7 is so similar to the STM32 F4, we have a wealth of working drivers to port from. Only a little effort is required. Below is a summary of the kinds of things that you would have to do to convert an STM32F429 driver to the STM32F746.

An Example

There is a good example in the STM32 Ethernet driver. The STM32 F7 Ethernet driver (arch/arm/src/stm32f7/stm32_ethernet.c) derives directly from the STM32 F4 Ethernet driver (arch/arm/src/stm32/stm32_eth.c). These two Ethernet MAC peripherals are nearly identical. Only changes that are a direct consequence of the STM32 F7 D-Cache were required to make the driver work on the STM32 F7. Those changes are summarized below.

Reorganize DMA Data Structure

The STM32 Ethernet driver has four different kinds of DMA buffers:

...

This does, of course, force additional changes to the functions that initialize the buffer chains, but I will leave that to the interested reader to discover.

Add Cache Operations

The Cortex-M7 cache operations are available when the following file is included:

...