Cache

Introduction

Each CPU has its own cache line length, as the following table lists.

CPU    Type       Size (B)    Way    Cache line length (B)
KM4    I-Cache    16K         4      32
KM4    D-Cache    16K         4      32
KM0    I-Cache    16K         4      32
KM0    D-Cache    16K         4      32

The SoC cache supports Enable/Disable, Flush, and Clean operations. Before main() is entered, each CPU has already called Cache_Enable() to enable its cache.

Operation             I-Cache    D-Cache
Enable/Disable        yes        yes
Flush (Invalidate)    yes        yes
Clean                 x          yes

Enable/Disable

  • Enables or disables the cache.

Flush (Invalidate)

  • Flushes (invalidates) the cache.

  • The D-Cache can be flushed by address.

  • For the D-Cache, use this after DMA Rx, before the CPU reads the received data from the DMA buffer.

Clean

  • Cleans the D-Cache: dirty D-Cache lines are written back to memory.

  • The D-Cache can be cleaned by address.

  • For the D-Cache, use this before DMA Tx, after the CPU has written data to the DMA buffer.

Note

  • When CA32 invalidates an address, the cache line is only invalidated if it is in a clean state. If the cache line is in a dirty state, CA32 performs a clean-and-invalidate operation instead.

  • CA32 and DSP support automatic data prefetching. When the CPU predicts that certain data will be needed soon, it performs line fills in the background and loads that data into the cache automatically.

  • For Cortex-A32/DSP, DCache_Clean() and DCache_CleanInvalidate() write entire cache lines to memory. When two CPUs with different cache line sizes communicate through a shared memory region, that region must be aligned to the larger of the two cache line sizes. For example, if the shared region is only 32 bytes, CPU0 with a 32-byte cache line writes 32 bytes each time it cleans, while CPU1 with a 64-byte cache line writes 64 bytes each time it cleans and may overwrite other data that belongs to CPU0.
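The shared-memory alignment rule in the last note can be sketched in standard C11 as follows. This is only an illustration under stated assumptions: the 32-byte and 64-byte line sizes are taken from the example in the note, and SHARED_LINE_SIZE, SHARED_ROUND_UP, and shared_msg_t are hypothetical names that do not come from the SDK.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line sizes, taken from the example in the note (not SDK constants). */
#define CPU0_CACHE_LINE_SIZE 32u
#define CPU1_CACHE_LINE_SIZE 64u

/* A shared region follows the larger of the two line sizes. */
#define SHARED_LINE_SIZE \
    (CPU0_CACHE_LINE_SIZE > CPU1_CACHE_LINE_SIZE ? CPU0_CACHE_LINE_SIZE : CPU1_CACHE_LINE_SIZE)

/* Round a byte count up to a whole number of shared lines. */
#define SHARED_ROUND_UP(n) \
    ((((n) + SHARED_LINE_SIZE - 1u) / SHARED_LINE_SIZE) * SHARED_LINE_SIZE)

/* Hypothetical mailbox: a 32-byte payload padded so that no other data
 * can share the cache line(s) it occupies on either CPU. */
typedef struct {
    alignas(SHARED_LINE_SIZE) uint8_t payload[SHARED_ROUND_UP(32u)];
} shared_msg_t;

static_assert(sizeof(shared_msg_t) % SHARED_LINE_SIZE == 0u,
              "size must be a multiple of the larger line size");
static_assert(alignof(shared_msg_t) == SHARED_LINE_SIZE,
              "start must be aligned to the larger line size");
```

With these assumed sizes the 32-byte payload is padded to 64 bytes, so a clean issued by the CPU with the 64-byte line cannot write back bytes that belong to anything else.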

Cache APIs

ICache_Enable

void ICache_Enable(void)

Enable I-Cache

ICache_Disable

void ICache_Disable(void)

Disable I-Cache

ICache_Invalidate

void ICache_Invalidate(void)

Invalidate I-Cache

DCache_IsEnabled

u32 DCache_IsEnabled(void)

Check whether the D-Cache is enabled. Return value:

  • 1: Enabled

  • 0: Disabled

DCache_Enable

void DCache_Enable(void)

Enable D-Cache

DCache_Disable

void DCache_Disable(void)

Disable D-Cache

DCache_Invalidate

void DCache_Invalidate(u32 Address, u32 Bytes)

Invalidate D-Cache by address, parameters:

  • Address: Start address to invalidate (aligned to cache line size)

  • Bytes: Size of memory block (in number of bytes)

DCache_Clean

void DCache_Clean(u32 Address, u32 Bytes)

Clean D-Cache by address, parameters:

  • Address: Start address to clean (aligned to cache line size)

  • Bytes: Size of memory block (in number of bytes)

DCache_CleanInvalidate

void DCache_CleanInvalidate(u32 Address, u32 Bytes)

Clean and invalidate D-Cache by address, parameters:

  • Address: Start address to clean and invalidate (aligned to cache line size)

  • Bytes: Size of memory block (in number of bytes)

Note

  • When both Address and Bytes are 0xFFFFFFFF, the operation applies to the entire D-Cache.

  • Address and Bytes must be aligned to the cache line length. If they are not, the operation covers more memory than requested. For example, with a cache line length of 32 bytes, an Address of 0x2000003C, and Bytes of 0x00000008, the requested range spans two cache lines. The actual operation range is 0x20000020 to 0x2000003F plus 0x20000040 to 0x2000005F, which may lead to unexpected issues.
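The range arithmetic behind the unaligned example (Address taken as the 32-bit value 0x2000003C, Bytes 8, 32-byte lines) can be checked with a small host-side sketch. The helper names cache_affected_range() and cache_is_line_aligned() are hypothetical and are not part of the SDK; the code only models which bytes a by-address operation ends up touching.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u /* line length assumed from the table in the introduction */

/* Hypothetical helper type: the byte range a by-address operation really covers. */
typedef struct {
    uint32_t start; /* first byte of the first affected cache line */
    uint32_t end;   /* last byte of the last affected cache line (inclusive) */
} cache_range_t;

/* Expand [address, address + bytes) outward to whole cache lines. */
cache_range_t cache_affected_range(uint32_t address, uint32_t bytes)
{
    cache_range_t r;
    uint32_t last_byte;

    assert(bytes > 0u);
    last_byte = address + bytes - 1u;
    r.start = address & ~(CACHE_LINE_SIZE - 1u);
    r.end = (last_byte & ~(CACHE_LINE_SIZE - 1u)) + (CACHE_LINE_SIZE - 1u);
    return r;
}

/* Returns 1 when both the address and the size are multiples of the line length. */
int cache_is_line_aligned(uint32_t address, uint32_t bytes)
{
    return ((address | bytes) & (CACHE_LINE_SIZE - 1u)) == 0u;
}
```

A check such as cache_is_line_aligned() could be placed in front of DCache_Invalidate(), DCache_Clean(), or DCache_CleanInvalidate() calls during development to catch unaligned arguments early.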

How to Define a Non-cacheable Data Buffer

Add SRAM_NOCACHE_DATA_SECTION before the buffer definition to place the data buffer in a section with the non-cacheable attribute.

SRAM_NOCACHE_DATA_SECTION u8 noncache_buffer[DATA_BUFFER_SIZE];

Cache Consistency When Using DMA

When DMA is used to transfer data from or to a memory buffer, both the start address and the end address of the buffer must be aligned to the cache line to avoid inconsistencies between cache data and memory data.

For example, if a buffer starts in the middle of a cache line and the first half of that line is used by other code, any invalidate or clean operation that the other code performs acts on the entire cache line, leaving the cache and memory contents of the current buffer inconsistent.

Note

The memory used for a DMA operation must occupy whole cache lines exclusively. The buffer can be defined in one of the following ways:

  • malloc(): this function returns a start address aligned to the cache line, and the length of the allocated buffer is also aligned to the cache line.

  • ALIGNMTO(CACHE_LINE_SIZE) u8 op_buffer[CACHE_LINE_ALIGMENT(op_buffer_size)]; here ALIGNMTO(CACHE_LINE_SIZE) ensures that the start address is aligned to the cache line, and CACHE_LINE_ALIGMENT(op_buffer_size) ensures that the length is also aligned to the cache line.
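To show what the static definition above achieves, here is a host-buildable sketch in standard C11. The SDK macros ALIGNMTO and CACHE_LINE_ALIGMENT are not reproduced; SKETCH_CACHE_LINE_SIZE and SKETCH_LINE_ROUND_UP are hypothetical stand-ins that are assumed to behave the way the text describes, and OP_BUFFER_SIZE is an arbitrary example value.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Stand-ins for the SDK macros, defined here only so the sketch builds on a host. */
#define SKETCH_CACHE_LINE_SIZE 32u
#define SKETCH_LINE_ROUND_UP(n) \
    ((((n) + SKETCH_CACHE_LINE_SIZE - 1u) / SKETCH_CACHE_LINE_SIZE) * SKETCH_CACHE_LINE_SIZE)

#define OP_BUFFER_SIZE 100u /* hypothetical payload size, not a multiple of 32 */

/* The buffer starts on a line boundary and ends on one: 100 bytes are padded to 128. */
alignas(SKETCH_CACHE_LINE_SIZE) uint8_t op_buffer[SKETCH_LINE_ROUND_UP(OP_BUFFER_SIZE)];

static_assert(sizeof(op_buffer) == 128u, "length must be a whole number of cache lines");
```

Because both the start and the length are line-aligned, no other variable can land in the first or last cache line of the buffer.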

DMA Tx Flow

  1. CPU allocates Tx buffer

  2. CPU writes Tx buffer

  3. Call DCache_Clean() to write the Tx buffer back to memory (recommended by Realtek)

  4. DMA Tx configuration

  5. DMA Tx interrupt handling
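Why the clean in step 3 matters can be illustrated with a toy model of a single write-back cache line. This is a simulation that runs on a host, not SDK code: toy_line_t and all toy_* functions are hypothetical, and toy_clean() only models the effect of DCache_Clean() on one line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE 32u

/* Toy model: one cache line in front of one line of memory. */
typedef struct {
    uint8_t memory[LINE]; /* what the DMA engine sees */
    uint8_t cache[LINE];  /* what the CPU sees */
    int     dirty;        /* cache holds data newer than memory */
} toy_line_t;

/* CPU store: with a write-back cache the data lands in the cache only. */
void toy_cpu_write(toy_line_t *l, const uint8_t *src, uint32_t n)
{
    assert(n <= LINE);
    memcpy(l->cache, src, n);
    l->dirty = 1;
}

/* Models DCache_Clean(): write the dirty line back to memory. */
void toy_clean(toy_line_t *l)
{
    if (l->dirty) {
        memcpy(l->memory, l->cache, LINE);
        l->dirty = 0;
    }
}

/* DMA Tx reads memory directly and never looks at the cache. */
void toy_dma_tx(const toy_line_t *l, uint8_t *wire, uint32_t n)
{
    assert(n <= LINE);
    memcpy(wire, l->memory, n);
}

/* Steps 1 to 4 of the Tx flow; returns 1 if the DMA sent what the CPU wrote. */
int toy_tx_flow(int do_clean)
{
    toy_line_t line = {0};                           /* step 1: allocate Tx buffer */
    const uint8_t msg[4] = {1, 2, 3, 4};
    uint8_t wire[4] = {0};

    toy_cpu_write(&line, msg, (uint32_t)sizeof msg); /* step 2: CPU writes Tx buffer */
    if (do_clean) {
        toy_clean(&line);                            /* step 3: clean before Tx */
    }
    toy_dma_tx(&line, wire, (uint32_t)sizeof wire);  /* step 4: DMA Tx */
    return memcmp(wire, msg, sizeof msg) == 0;
}
```

In this model toy_tx_flow(1) sends the data the CPU wrote, while toy_tx_flow(0) sends the stale memory contents because the new data is still sitting in the cache.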

DMA Rx Flow

  1. CPU allocates Rx buffer

  2. Call DCache_Clean() to ensure the Rx buffer is in a clean state. This step can be skipped if the Rx buffer is already clean.

    Caution

    This step is needed for the following reasons:

    • For Cortex-A32, if the Rx buffer is dirty in the cache, the DCache_Invalidate() call in step 5 performs both a clean and an invalidate operation. The clean operation may cause unexpected writes to memory.

    • If the Rx buffer is dirty in the cache, the CPU may write it back from the cache to memory when the CPU’s D-Cache becomes full, which could overwrite the content that DMA has already written.

  3. DMA Rx configuration

  4. DMA Rx interrupt handling

  5. Call DCache_Invalidate() to ensure no old Rx buffer data remains in the cache.

    Caution

    This step must be performed for the following reasons:

    • CPUs with automatic data prefetching, such as Cortex-A32/DSP, may start line fills in the background. For example, when Cortex-A32 reads addresses adjacent to the Rx buffer, it may bring the old values of the Rx buffer back into the cache.

    • It prevents the CPU from using old values that were read into the cache during DMA processing.

  6. CPU reads Rx buffer (the value returned by DMA Rx)
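The effect of the invalidate before the CPU reads the Rx buffer can be illustrated with a toy model of a single cache line. This is a host-side simulation, not SDK code: toy_line_t and the toy_* functions are hypothetical, toy_invalidate() only models DCache_Invalidate() on a clean line, and the early CPU read stands in for a background prefetch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE 32u

/* Toy model: one cache line in front of one line of memory. */
typedef struct {
    uint8_t memory[LINE]; /* what the DMA engine writes */
    uint8_t cache[LINE];  /* what the CPU reads once the line is cached */
    int     valid;        /* line is present in the cache */
} toy_line_t;

/* CPU load: fill the line from memory on a miss, then read from the cache. */
uint8_t toy_cpu_read(toy_line_t *l, uint32_t index)
{
    assert(index < LINE);
    if (!l->valid) {
        memcpy(l->cache, l->memory, LINE);
        l->valid = 1;
    }
    return l->cache[index];
}

/* Models DCache_Invalidate() on a clean line: drop the cached copy. */
void toy_invalidate(toy_line_t *l)
{
    l->valid = 0;
}

/* DMA Rx writes memory directly and never updates the cache. */
void toy_dma_rx(toy_line_t *l, uint8_t value)
{
    memset(l->memory, value, LINE);
}

/* Rx flow; returns 1 if the CPU reads the value that DMA delivered. */
int toy_rx_flow(int do_invalidate)
{
    toy_line_t line = {0};        /* allocate Rx buffer; it is clean, so no clean is needed */

    (void)toy_cpu_read(&line, 0); /* old contents get cached, as a prefetch might do */
    toy_dma_rx(&line, 0xAB);      /* DMA Rx completes and memory holds new data */
    if (do_invalidate) {
        toy_invalidate(&line);    /* invalidate before the CPU reads */
    }
    return toy_cpu_read(&line, 0) == 0xAB; /* CPU reads Rx buffer */
}
```

In this model toy_rx_flow(1) returns the DMA data, while toy_rx_flow(0) returns the old cached value even though memory already holds the new data.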