Cache
Introduction
Each CPU has its own cache configuration and cache line length, as listed in the following tables.
| CPU | Type | Size (B) | Way | Cache line length (B) |
|---|---|---|---|---|
| KM4 | I-Cache | 16K | 4 | 32 |
| KM4 | D-Cache | 16K | 4 | 32 |
| KM0 | I-Cache | 16K | 4 | 32 |
| KM0 | D-Cache | 16K | 4 | 32 |
| CPU | Type | Size (B) | Way | Cache line length (B) |
|---|---|---|---|---|
| KM4 | I-Cache | 16K | 4 | 32 |
| KM4 | D-Cache | 16K | 4 | 32 |
| KR4 | I-Cache | 16K | 4 | 32 |
| KR4 | D-Cache | 16K | 4 | 32 |
| CPU | Type | Size (B) | Way | Cache line length (B) |
|---|---|---|---|---|
| KM4 | I-Cache | 16K | 4 | 32 |
| KM4 | D-Cache | 16K | 4 | 32 |
| KR4 | I-Cache | 16K | 4 | 32 |
| KR4 | D-Cache | 16K | 4 | 32 |
| DSP | I-Cache | 32K | 4 | 128 |
| DSP | D-Cache | 48K | 3 | 128 |
| CPU | Type | Size (B) | Way | Cache line length (B) |
|---|---|---|---|---|
| KM4 | I-Cache | 64K | 4 | 32 |
| KM4 | D-Cache | 32K | 4 | 32 |
| KM0 | I-Cache | 16K | 2 | 32 |
| KM0 | D-Cache | 8K | 2 | 32 |
| CA32 | L1 I-Cache | 32K | 2 | 64 |
| CA32 | L1 D-Cache | 32K | 4 | 64 |
| CA32 | L2 Cache | 256K | 8 | 64 |
The SoC cache supports the Enable/Disable, Flush (Invalidate), and Clean operations. Before main() is entered, each CPU has already called Cache_Enable() to enable its cache.
| Operation | Description | I-Cache | D-Cache |
|---|---|---|---|
| Enable/Disable | Enable or disable the cache function | √ | √ |
| Flush (Invalidate) | | √ | √ |
| Clean | | x | √ |
Note
When CA32 invalidates an address, the cache line will be flushed only if it is in a clean state. If the cache line is in a dirty state, it will perform a clean & invalidate operation.
CA32 and DSP have automatic data prefetching capabilities. When the CPU predicts that certain data will be needed in the future, the CPU will perform line fill operations in the background, automatically loading the data into the cache.
For Cortex-A32/DSP, DCache_Clean()/DCache_CleanInvalidate() operations write entire cache lines to memory. When two CPUs with different cache line sizes communicate through a shared memory region, that region must be aligned to the larger of the two cache line sizes. For example, if the shared memory is only 32 bytes, CPU0 with a 32-byte cache line writes only 32 bytes on each clean, while CPU1 with a 64-byte cache line writes 64 bytes on each clean, potentially overwriting other data that belongs to CPU0.
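The alignment rule in the note above can be sketched as follows. This is a minimal illustration, not SDK code: `shared_align()` and `round_up()` are hypothetical helper names, and the line lengths passed to them are taken from the tables above (for example 32 bytes for KM4 and 64 bytes for CA32).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK API): alignment required for memory shared
 * by two CPUs whose cache line lengths may differ. Both lengths must be
 * non-zero powers of two. */
static inline uint32_t shared_align(uint32_t line_a, uint32_t line_b)
{
    assert(line_a != 0 && (line_a & (line_a - 1u)) == 0);
    assert(line_b != 0 && (line_b & (line_b - 1u)) == 0);
    return line_a > line_b ? line_a : line_b;
}

/* Hypothetical helper: round size up to a multiple of align
 * (align must be a power of two). */
static inline uint32_t round_up(uint32_t size, uint32_t align)
{
    return (size + align - 1u) & ~(align - 1u);
}
```

Under these assumptions, a 32-byte message shared between a 32-byte-line CPU and a 64-byte-line CPU would be placed at a 64-byte-aligned address and padded to 64 bytes, so that a clean from either side never touches neighbouring data.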
Cache APIs
ICache_Enable
void ICache_Enable(void)
Enable I-Cache
ICache_Disable
void ICache_Disable(void)
Disable I-Cache
ICache_Invalidate
void ICache_Invalidate(void)
Invalidate I-Cache
DCache_IsEnabled
u32 DCache_IsEnabled(void)
Check whether D-Cache is enabled. Return value:
1: Enabled
0: Disabled
DCache_Enable
void DCache_Enable(void)
Enable D-Cache
DCache_Disable
void DCache_Disable(void)
Disable D-Cache
DCache_Invalidate
void DCache_Invalidate(u32 Address, u32 Bytes)
Invalidate D-Cache by address, parameters:
Address
: Invalidated address (aligned to cache line size)
Bytes
: Size of memory block (in number of bytes)
DCache_Clean
void DCache_Clean(u32 Address, u32 Bytes)
Clean D-Cache by address, parameters:
Address
: Clean address (aligned to cache line size)
Bytes
: Size of memory block (in number of bytes)
DCache_CleanInvalidate
void DCache_CleanInvalidate(u32 Address, u32 Bytes)
Clean and invalidate D-Cache by address, parameters:
Address
: Clean and invalidated address (aligned to cache line size)
Bytes
: Size of memory block (in number of bytes)
Note
When both Address and Bytes are 0xFFFFFFFF, the operation applies to the entire D-Cache.
Address and Bytes must be aligned to the cache line length. If they are not aligned, the operation covers every cache line that the range touches. For example, with a cache line length of 32 bytes, an Address of 0x2000003C, and Bytes of 0x00000008, the requested range spans two cache lines. The actual operation address range will be 0x20000020 to 0x2000003F and 0x20000040 to 0x2000005F, which may lead to unexpected issues.
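The arithmetic behind the unaligned example can be sketched as below. `cache_span_t` and `cache_span()` are hypothetical names introduced only for illustration; they model which whole cache lines a by-address operation touches and are not part of the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the span of whole cache lines that a
 * by-address cache operation actually touches. */
typedef struct {
    uint32_t start;  /* first byte of the first affected line */
    uint32_t end;    /* last byte of the last affected line (inclusive) */
    uint32_t lines;  /* number of affected cache lines */
} cache_span_t;

/* line must be a non-zero power of two; bytes must be non-zero. */
static cache_span_t cache_span(uint32_t addr, uint32_t bytes, uint32_t line)
{
    cache_span_t s;

    assert(bytes != 0);
    assert(line != 0 && (line & (line - 1u)) == 0);

    s.start = addr & ~(line - 1u);                /* round start down */
    s.end   = (addr + bytes - 1u) | (line - 1u);  /* round last byte up */
    s.lines = (s.end - s.start + 1u) / line;
    return s;
}
```

For an address of 0x2000003C, 8 bytes, and 32-byte lines, this yields a span from 0x20000020 to 0x2000005F covering two lines. A check of this kind in debug builds is one way to catch unaligned arguments before they reach the cache APIs.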
How to Define a Non-cacheable Data Buffer
To define a data buffer with the non-cacheable attribute, add SRAM_NOCACHE_DATA_SECTION before the buffer definition.
SRAM_NOCACHE_DATA_SECTION u8 noncache_buffer[DATA_BUFFER_SIZE];
Note
For KR4: non-cacheable attributes can only be configured through the MCCA register, which means a data buffer cannot be defined with the non-cacheable attribute.
For DSP: to operate the DSP Cache memories, refer to Xtensa LX7 Microprocessor Data Book and Xtensa System Software Reference Manual for more information.
Cache Consistency When Using DMA
When DMA is used to move data to or from a memory buffer, both the start address and the end address of the buffer must be aligned to the cache line length to avoid inconsistencies between cache data and memory data.
For example, if the start address of a buffer is in the middle of a cache line and the first half is occupied by other programs, invalidating or cleaning the current cache line by those programs will affect the entire cache line, resulting in inconsistent cache and memory data of the current buffer.
Note
The DMA operation address must exclusively occupy an entire cache line. The buffer can be defined in one of the following ways:
malloc(): this function returns a starting address aligned with the cache line, and the length of the buffer is also aligned with the cache line.
ALIGNMTO(CACHE_LINE_SIZE) u8 op_buffer[CACHE_LINE_ALIGMENT(op_buffer_size)]: ALIGNMTO(CACHE_LINE_SIZE) ensures that the starting address is aligned with the cache line, and CACHE_LINE_ALIGMENT(op_buffer_size) ensures that the length is also aligned with the cache line.
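A static buffer definition along these lines might look like the sketch below. The fallback macro definitions are assumptions added so the example compiles outside the SDK (the real ALIGNMTO, CACHE_LINE_ALIGMENT, and CACHE_LINE_SIZE come from the SDK headers and may be defined differently); `OP_BUFFER_SIZE`, `buffer_is_line_exclusive()`, and `op_buffer_check()` are hypothetical names, and `uint8_t` stands in for the SDK's `u8`.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles outside the SDK. These fallback
 * definitions are assumptions, not the SDK's own. */
#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE 32
#endif
#ifndef ALIGNMTO
#define ALIGNMTO(n) __attribute__((aligned(n)))
#endif
#ifndef CACHE_LINE_ALIGMENT
#define CACHE_LINE_ALIGMENT(x) \
    (((x) + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1))
#endif

#define OP_BUFFER_SIZE 100  /* hypothetical payload size in bytes */

/* Start address aligned to a cache line; length rounded up to whole lines. */
ALIGNMTO(CACHE_LINE_SIZE) uint8_t op_buffer[CACHE_LINE_ALIGMENT(OP_BUFFER_SIZE)];

/* Hypothetical check: returns 1 if the buffer starts on a cache line
 * boundary and its length is a whole number of cache lines. */
static int buffer_is_line_exclusive(const void *buf, uint32_t len)
{
    return ((uintptr_t)buf % CACHE_LINE_SIZE == 0) &&
           (len % CACHE_LINE_SIZE == 0);
}

/* Optional debug-time self-check before handing the buffer to DMA. */
static void op_buffer_check(void)
{
    assert(buffer_is_line_exclusive(op_buffer, sizeof(op_buffer)));
}
```

With the assumed 32-byte line length, the 100-byte payload is padded to 128 bytes, so no other variable can share a cache line with the DMA buffer.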
DMA Tx Flow
1. CPU allocates the Tx buffer.
2. CPU writes the Tx buffer.
3. Realtek recommendation: call DCache_Clean().
4. DMA Tx configuration.
5. DMA Tx interrupt handling.
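The Tx ordering can be sketched as below. To keep the example runnable on a host, DCache_Clean() is replaced by a stand-in that only records when it was called, and `dma_tx_start()` and `dma_tx_send()` are hypothetical names, not SDK or driver APIs. The 32-byte line length is an assumption taken from the tables above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TX_LINE_SIZE 32u  /* assumed cache line length */

/* Host-side stand-ins: they only record the call order. On target,
 * DCache_Clean() is provided by the SDK and the DMA start routine
 * belongs to the driver in use. */
static int tx_seq, tx_clean_at, tx_dma_at;

static void DCache_Clean(uint32_t Address, uint32_t Bytes)
{
    (void)Address; (void)Bytes;
    tx_clean_at = ++tx_seq;
}

static void dma_tx_start(const void *src, uint32_t len)  /* hypothetical */
{
    (void)src; (void)len;
    tx_dma_at = ++tx_seq;
}

/* Step 1: buffer that exclusively occupies whole cache lines. */
static uint8_t tx_buf[64] __attribute__((aligned(32)));

static void dma_tx_send(const uint8_t *data, uint32_t len)
{
    assert(len <= sizeof(tx_buf));
    assert((uintptr_t)tx_buf % TX_LINE_SIZE == 0);

    memcpy(tx_buf, data, len);                     /* step 2: CPU writes */
    /* The pointer cast is narrowed only for this host sketch. */
    DCache_Clean((uint32_t)(uintptr_t)tx_buf,      /* step 3: write back */
                 (uint32_t)sizeof(tx_buf));
    dma_tx_start(tx_buf, len);                     /* step 4: start DMA  */
    /* step 5: completion is handled in the DMA Tx interrupt handler */
}
```

The point of the ordering is that the clean happens after the last CPU write and before DMA is started, so the DMA engine reads the data the CPU actually wrote.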
DMA Rx Flow
1. CPU allocates the Rx buffer.
2. Call DCache_Clean() to ensure the Rx buffer is in a clean state. If the Rx buffer is already in a clean state, this step can be skipped.
   Caution
   This step is performed for the following reasons:
   - For Cortex-A32, if the Rx buffer is in a dirty state in the cache, executing DCache_Invalidate() in step 5 performs both a clean and an invalidate operation. The clean operation may lead to unexpected writes to memory.
   - If the Rx buffer is in a dirty state in the cache, the CPU may write the Rx buffer back to memory from the cache when the CPU's D-Cache becomes full, which could overwrite the content that DMA has already written.
3. DMA Rx configuration.
4. DMA Rx interrupt handling.
5. Call DCache_Invalidate() to ensure no old Rx buffer data remains in the cache.
   Caution
   This step must be performed for the following reasons:
   - CPUs with automatic data prefetch features, such as Cortex-A32/DSP, may refill the cache on their own. For example, when Cortex-A32 reads the contents of addresses adjacent to the Rx buffer, it may start line fills in the background that bring the old values of the Rx buffer back into the cache.
   - It discards any old values that the CPU read into the cache during DMA processing.
6. CPU reads the Rx buffer (the values returned by DMA Rx).
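The Rx ordering can be sketched in the same spirit. The cache functions below are host-side stand-ins that only record the call order, and `dma_rx_run()` and `dma_rx_receive()` are hypothetical names; `dma_rx_run()` simply fills the buffer to imitate DMA writing to memory. None of this is SDK code, and the 32-byte line length is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RX_LINE_SIZE 32u  /* assumed cache line length */

/* Host-side stand-ins that record the call order. On target,
 * DCache_Clean() and DCache_Invalidate() are provided by the SDK. */
static int rx_seq, rx_clean_at, rx_dma_at, rx_inval_at;

static void DCache_Clean(uint32_t Address, uint32_t Bytes)
{
    (void)Address; (void)Bytes;
    rx_clean_at = ++rx_seq;
}

static void DCache_Invalidate(uint32_t Address, uint32_t Bytes)
{
    (void)Address; (void)Bytes;
    rx_inval_at = ++rx_seq;
}

/* Hypothetical stand-in for "configure DMA Rx and wait for its interrupt":
 * it fills the buffer the way the DMA engine would write memory. */
static void dma_rx_run(uint8_t *dst, uint32_t len)
{
    memset(dst, 0xA5, len);
    rx_dma_at = ++rx_seq;
}

/* Step 1: buffer that exclusively occupies whole cache lines. */
static uint8_t rx_buf[64] __attribute__((aligned(32)));

static uint8_t dma_rx_receive(void)
{
    /* The pointer cast is narrowed only for this host sketch. */
    uint32_t addr = (uint32_t)(uintptr_t)rx_buf;
    uint32_t len  = (uint32_t)sizeof(rx_buf);

    assert((uintptr_t)rx_buf % RX_LINE_SIZE == 0);

    DCache_Clean(addr, len);       /* step 2: no dirty lines remain      */
    dma_rx_run(rx_buf, len);       /* steps 3-4: DMA Rx and its interrupt */
    DCache_Invalidate(addr, len);  /* step 5: drop stale cached copies   */
    return rx_buf[0];              /* step 6: CPU reads fresh data       */
}
```

The clean before DMA starts and the invalidate after DMA completes bracket the transfer, which is the property the two cautions above are protecting.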