In Arduino-Pico (https://github.com/earlephilhower/arduino-pico) we support PSRAM and flash file systems. A user found a data corruption error (https://github.com/earlephilhower/ardui ... ssues/2537) when doing flash writes immediately after PSRAM writes. It turned out to be caused by (A) the ROM overwriting the QMI configuration (https://github.com/raspberrypi/pico-sdk/issues/1983) and (B) the cache invalidation that the ROM performs on a flash op/write/erase.
[To be clear, PSRAM seems to be rock solid in all other cases; only the flushing is an issue, so I don't believe this is a bus timing problem.]
Fixing (A) was pretty simple: wrap the SDK flash calls so they reset the QMI interface afterwards. Solving (B) by adding a cache clean (not an invalidate) to push out all pending PSRAM writes has been confusing, and it fails in unique and interesting ways.
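For illustration, the (A) fix boils down to "snapshot the PSRAM window's QMI registers, run the flash operation, write them back". The sketch below models that on the host under two assumptions: that the five M1 registers setup_psram() programs (timing, rfmt, rcmd, wfmt, wcmd) are what needs preserving, and that a scope-based guard is an acceptable way to wrap the call. QmiWindowRegs, QmiWindowGuard, fake_rom_flash_op and with_psram_window_preserved are hypothetical names, not arduino-pico or pico-sdk APIs.

```cpp
#include <cassert>  // assert() for the usage checks
#include <cstdint>

// Host-side stand-in for the QMI memory-window-1 registers that setup_psram()
// programs (qmi_hw->m[1].timing/rfmt/rcmd/wfmt/wcmd on the real chip).
struct QmiWindowRegs {
    std::uint32_t timing, rfmt, rcmd, wfmt, wcmd;
};

// Hypothetical RAII guard: snapshots the window registers on construction and
// writes them back on destruction, so whatever runs inside its scope cannot
// leave the PSRAM window misconfigured.
class QmiWindowGuard {
public:
    explicit QmiWindowGuard(volatile QmiWindowRegs &regs)
        : regs_(regs), saved_{regs.timing, regs.rfmt, regs.rcmd, regs.wfmt, regs.wcmd} {}

    ~QmiWindowGuard() {
        regs_.timing = saved_.timing;
        regs_.rfmt = saved_.rfmt;
        regs_.rcmd = saved_.rcmd;
        regs_.wfmt = saved_.wfmt;
        regs_.wcmd = saved_.wcmd;
    }

    QmiWindowGuard(const QmiWindowGuard &) = delete;
    QmiWindowGuard &operator=(const QmiWindowGuard &) = delete;

private:
    volatile QmiWindowRegs &regs_;
    QmiWindowRegs saved_;
};

// Stand-in for a ROM flash erase/program that clobbers the QMI setup as a side
// effect (the behaviour reported in pico-sdk issue 1983).
void fake_rom_flash_op(volatile QmiWindowRegs &regs) {
    regs.timing = 0;
    regs.rfmt = 0;
    regs.rcmd = 0;
    regs.wfmt = 0;
    regs.wcmd = 0;
}

// Hypothetical wrapper in the spirit of "wrap the SDK flash calls": run the
// flash operation, then restore the PSRAM window configuration on scope exit.
template <typename FlashOp>
void with_psram_window_preserved(volatile QmiWindowRegs &regs, FlashOp op) {
    QmiWindowGuard guard(regs);
    op();
}
```

On the chip, the same save/restore would be applied to the qmi_hw->m[1] fields around flash_range_erase() or flash_range_program(); this is a sketch of the pattern, not the code in the core.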
The code shown in another post here with the same issue (viewtopic.php?t=375845) implements the same logic I did, but on heavily dirty caches it seems to cause everything from hardware faults to a complete chip reset in the pathological case where almost all lines are dirty.
The following app is a plain SDK version of the code I'm trying to use in the core. If you set "cnt = 100" (i.e. at most 50 lines dirty), it passes and runs to completion. If you instead fill up the cache with the included 8MB of writes (so the cache probably still holds the last 16KB of writes when the flush happens), the machine reboots when run natively, and under GDB it dies about halfway through the flush with a machine error. The code first writes out all 8MB of PSRAM and then reads it back, so that read evictions have already flushed the cache. It then refills PSRAM, flushes, and (tries to) validate, so that there is a real amount of dirty data in the cache for the flush to write out.
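The dirty-line figures above follow from simple arithmetic, assuming a write-back cache of 2048 lines x 8 bytes (the geometry the flush() loop walks) and a line-aligned start address: 100 words of 4 bytes span 50 lines, while the full 2048 x 1024-word fill writes 8MB, far more than the 16KB cache can hold, so every line can end up dirty. A small host-side check (max_dirty_lines and bytes_written are hypothetical helper names):

```cpp
#include <cassert>  // assert() for the usage checks
#include <cstddef>
#include <cstdint>

// Cache geometry used by the flush() loop: 2048 lines of 8 bytes = 16KB.
constexpr std::size_t kLineBytes = 8;
constexpr std::size_t kCacheLines = 2048;

// Upper bound on the number of cache lines that a run of `words` consecutive
// 32-bit writes can leave dirty, assuming the run starts on a line boundary
// (p = 0x11000000 does) and that the cache is write-back.
constexpr std::size_t max_dirty_lines(std::size_t words) {
    const std::size_t lines_touched =
        (words * sizeof(std::uint32_t) + kLineBytes - 1) / kLineBytes;
    return lines_touched < kCacheLines ? lines_touched : kCacheLines;
}

// Total bytes written by fill() for a given cnt.
constexpr std::size_t bytes_written(std::size_t words) {
    return words * sizeof(std::uint32_t);
}
```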
Is there something wrong with the "flush()" routine? It seems to implement exactly what the datasheet says to do. Is there some kind of back-pressure (i.e. we should check bit X somewhere so as not to overflow the cache eviction HW queues)? And how would we know that all the dirty lines are actually cleaned in this case?
As a workaround, I have successfully cleaned the cache by reading in ~32K of PSRAM values that I *know* are not in the cache. This is not a real fix, though: in general the app doesn't know what's in the cache, so it can't guarantee those reads will miss (a hit causes no eviction), and it's really slow and wasteful.
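Why the read-to-evict workaround is fragile can be shown with a toy model. The sketch below simulates a write-back, write-allocate, 2-way set-associative cache with the same 16KB / 8-byte-line geometry, under the explicit assumption of LRU replacement (the real XIP cache policy may differ, and with a non-LRU policy even 32KB of misses might not evict everything). ToyCache, write_range and read_range are hypothetical; nothing here touches real hardware.

```cpp
#include <array>
#include <cassert>  // assert() for the usage checks
#include <cstddef>
#include <cstdint>

// Toy model of a write-back, write-allocate, 2-way set-associative cache with
// the XIP cache's geometry (16KB of 8-byte lines = 1024 sets x 2 ways).
// ASSUMPTION: true LRU replacement inside each set; the real policy may differ.
class ToyCache {
public:
    static constexpr std::uint32_t kLineBytes = 8;
    static constexpr std::uint32_t kSets = 1024;

    // Access one address. Returns true on a hit; on a miss the LRU way is
    // refilled, counting a write-back first if it held dirty data.
    bool access(std::uint32_t addr, bool is_write) {
        const std::uint32_t line = addr / kLineBytes;
        Set &set = sets_[line % kSets];
        const std::uint32_t tag = line / kSets;
        for (std::size_t w = 0; w < 2; ++w) {
            Way &way = set.ways[w];
            if (way.valid && way.tag == tag) {
                if (is_write) way.dirty = true;
                set.lru = 1 - w;  // the other way becomes least recently used
                return true;
            }
        }
        Way &victim = set.ways[set.lru];
        if (victim.valid && victim.dirty) ++writebacks_;
        victim.valid = true;
        victim.dirty = is_write;
        victim.tag = tag;
        set.lru = 1 - set.lru;
        return false;
    }

    std::size_t dirty_lines() const {
        std::size_t n = 0;
        for (const Set &s : sets_)
            for (const Way &w : s.ways)
                if (w.valid && w.dirty) ++n;
        return n;
    }

    std::size_t writebacks() const { return writebacks_; }

private:
    struct Way {
        bool valid = false;
        bool dirty = false;
        std::uint32_t tag = 0;
    };
    struct Set {
        std::array<Way, 2> ways{};
        std::size_t lru = 0;  // index of the least recently used way
    };
    std::array<Set, kSets> sets_{};
    std::size_t writebacks_ = 0;
};

// Hypothetical helpers: touch every line in [base, base + bytes).
void write_range(ToyCache &c, std::uint32_t base, std::uint32_t bytes) {
    for (std::uint32_t a = base; a < base + bytes; a += ToyCache::kLineBytes) c.access(a, true);
}

void read_range(ToyCache &c, std::uint32_t base, std::uint32_t bytes) {
    for (std::uint32_t a = base; a < base + bytes; a += ToyCache::kLineBytes) c.access(a, false);
}
```

In this model, with the cache fully dirty, reading 32KB of addresses that are guaranteed misses writes back all 2048 dirty lines. If a 16KB sweep instead starts 8KB into the dirty region, its first 8KB are hits that evict nothing, and 1024 dirty lines survive: the "hit and no-eviction" problem.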
(Sorry for the length of the code, but I have pasted the entire PSRAM setup routine from MicroPython for completeness.)
The flush routine in question:
Code:
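void __no_inline_not_in_flash_func(flush)() {
    // 0x18000000 is the XIP cache maintenance window; the +1 in the low address
    // bits selects the clean-by-set/way operation. Step one 8-byte line at a time
    // over all 2048 lines (16KB).
    for (volatile uint8_t* cache = (volatile uint8_t*)0x18000001; cache < (volatile uint8_t*)(0x18000001 + 2048 * 8); cache += 8) {
        *cache = 0;
    }
}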
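As a sanity check on the address arithmetic (not on the hardware behaviour), the flush() loop's bounds can be re-evaluated on the host. Taking the reading of the datasheet used above, that the low address bits in the 0x18000000 maintenance window select the operation and 1 means clean by set/way, the loop issues exactly 2048 writes, one per 8-byte line, from 0x18000001 to 0x18003FF9, each carrying operation bits 1. The constant and function names are hypothetical.

```cpp
#include <cassert>  // assert() for the usage checks
#include <cstdint>

// Constants read off the flush() loop: the maintenance window base, the
// operation selected by the low address bits (1, used here as "clean by
// set/way"), the 8-byte line stride and the 2048-line (16KB) extent.
constexpr std::uint32_t kMaintBase = 0x18000000u;
constexpr std::uint32_t kOpClean = 1u;
constexpr std::uint32_t kLineBytes = 8u;
constexpr std::uint32_t kLines = 2048u;

// Address flush() writes for cache line `index`.
constexpr std::uint32_t maintenance_addr(std::uint32_t index) {
    return kMaintBase + kOpClean + index * kLineBytes;
}

// Re-run the loop bounds from flush() and count the writes it issues.
constexpr std::uint32_t flush_write_count() {
    std::uint32_t n = 0;
    for (std::uint32_t a = kMaintBase + kOpClean; a < kMaintBase + kOpClean + kLines * kLineBytes;
         a += kLineBytes) {
        ++n;
    }
    return n;
}
```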
GDB output from the failing (full 8MB) run:
Code:
[rp2350.dap.core0] clearing lockup after double fault
Thread 2 "rp2350.dap.core0" received signal SIGINT, Interrupt.
<signal handler called>
main.cpp
Code:
#include <hardware/address_mapped.h>
#include <hardware/clocks.h>
#include <hardware/gpio.h>
#include <hardware/regs/addressmap.h>
#include <hardware/structs/qmi.h>
#include <hardware/structs/xip_ctrl.h>
#include <pico/stdlib.h>
#include <hardware/sync.h>
#include <stdio.h>

#define RP2350_PSRAM_MAX_SELECT_FS64 (125'000'000)
#define RP2350_PSRAM_MIN_DESELECT_FS (50'000'000)
#define RP2350_PSRAM_MAX_SCK_HZ (109'000'000)
#define RP2350_PSRAM_ID (0x5D)
#define RP2350_PSRAM_CS (47)

// DETAILS
//
// SparkFun RP2350 boards use the following PSRAM IC:
//
// apmemory APS6404L-3SQR-ZR
// https://www.mouser.com/ProductDetail/AP-Memory/APS6404L-3SQR-ZR?qs=IS%252B4QmGtzzpDOdsCIglviw%3D%3D
//
// The origin of this logic is from the Circuit Python code that was downloaded from:
// https://github.com/raspberrypi/pico-sdk-rp2350/issues/12#issuecomment-2055274428
//
// Details on the PSRAM IC that are used during setup/configuration of PSRAM on SparkFun RP2350 boards.

// For PSRAM timing calculations - to use int math, we work in femtoseconds (fs) (1e-15).
// NOTE: This idea is from the MicroPython work on PSRAM.
#define SFE_SEC_TO_FS 1000000000000000ll

// max select pulse width = 8us => 8000 ns => 8000 * 1e6 fs => 8000e6 fs
// Additionally, the MAX select is in units of 64 clock cycles - will use a constant that
// takes this into account - so 8000e6 fs / 64 = 125e6 fs
const uint32_t SFE_PSRAM_MAX_SELECT_FS64 = RP2350_PSRAM_MAX_SELECT_FS64;

// min deselect pulse width = 50ns => 50 * 1e6 fs => 5e7 fs
const uint32_t SFE_PSRAM_MIN_DESELECT_FS = RP2350_PSRAM_MIN_DESELECT_FS;

// from psram datasheet - max Freq with VDD at 3.3v - SparkFun RP2350 boards run at 3.3v.
// If VDD = 3.0 Max Freq is 133 Mhz
const uint32_t SFE_PSRAM_MAX_SCK_HZ = RP2350_PSRAM_MAX_SCK_HZ;

// PSRAM SPI command codes
const uint8_t PSRAM_CMD_QUAD_END = 0xF5;
const uint8_t PSRAM_CMD_QUAD_ENABLE = 0x35;
const uint8_t PSRAM_CMD_READ_ID = 0x9F;
const uint8_t PSRAM_CMD_RSTEN = 0x66;
const uint8_t PSRAM_CMD_RST = 0x99;
const uint8_t PSRAM_CMD_QUAD_READ = 0xEB;
const uint8_t PSRAM_CMD_QUAD_WRITE = 0x38;
const uint8_t PSRAM_CMD_NOOP = 0xFF;
const uint8_t PSRAM_ID = RP2350_PSRAM_ID;

//-----------------------------------------------------------------------------
/// @brief Communicate directly with the PSRAM IC - validate it is present and return the size
///
/// @return size_t The size of the PSRAM
///
/// @note This function expects the CS pin set
static size_t __no_inline_not_in_flash_func(get_psram_size)(void) {
    size_t psram_size = 0;
    uint32_t intr_stash = save_and_disable_interrupts();

    // Try and read the PSRAM ID via direct_csr.
    qmi_hw->direct_csr = 30 << QMI_DIRECT_CSR_CLKDIV_LSB | QMI_DIRECT_CSR_EN_BITS;

    // Need to poll for the cooldown on the last XIP transfer to expire
    // (via direct-mode BUSY flag) before it is safe to perform the first
    // direct-mode operation
    while ((qmi_hw->direct_csr & QMI_DIRECT_CSR_BUSY_BITS) != 0) {
    }

    // Exit out of QMI in case we've inited already
    qmi_hw->direct_csr |= QMI_DIRECT_CSR_ASSERT_CS1N_BITS;

    // Transmit the command to exit QPI quad mode - read ID as standard SPI
    qmi_hw->direct_tx =
        QMI_DIRECT_TX_OE_BITS | QMI_DIRECT_TX_IWIDTH_VALUE_Q << QMI_DIRECT_TX_IWIDTH_LSB | PSRAM_CMD_QUAD_END;

    while ((qmi_hw->direct_csr & QMI_DIRECT_CSR_BUSY_BITS) != 0) {
    }

    (void)qmi_hw->direct_rx;

    qmi_hw->direct_csr &= ~(QMI_DIRECT_CSR_ASSERT_CS1N_BITS);

    // Read the id
    qmi_hw->direct_csr |= QMI_DIRECT_CSR_ASSERT_CS1N_BITS;
    uint8_t kgd = 0;
    uint8_t eid = 0;
    for (size_t i = 0; i < 7; i++) {
        qmi_hw->direct_tx = (i == 0 ? PSRAM_CMD_READ_ID : PSRAM_CMD_NOOP);

        while ((qmi_hw->direct_csr & QMI_DIRECT_CSR_TXEMPTY_BITS) == 0) {
        }
        while ((qmi_hw->direct_csr & QMI_DIRECT_CSR_BUSY_BITS) != 0) {
        }
        if (i == 5) {
            kgd = qmi_hw->direct_rx;
        } else if (i == 6) {
            eid = qmi_hw->direct_rx;
        } else {
            (void)qmi_hw->direct_rx; // just read and discard
        }
    }

    // Disable direct csr.
    qmi_hw->direct_csr &= ~(QMI_DIRECT_CSR_ASSERT_CS1N_BITS | QMI_DIRECT_CSR_EN_BITS);

    // is this the PSRAM we're looking for obi-wan?
    if (kgd == PSRAM_ID) {
        // PSRAM size
        psram_size = 1024 * 1024; // 1 MiB
        uint8_t size_id = eid >> 5;
        if (eid == 0x26 || size_id == 2) {
            psram_size *= 8;
        } else if (size_id == 0) {
            psram_size *= 2;
        } else if (size_id == 1) {
            psram_size *= 4;
        }
    }
    restore_interrupts(intr_stash);
    return psram_size;
}

//-----------------------------------------------------------------------------
/// @brief Update the PSRAM timing configuration based on system clock
///
/// @note This function expects interrupts to be enabled on entry
static void __no_inline_not_in_flash_func(set_psram_timing)(void) {
    // Get the system clock frequency - get before disabling interrupts.
    uint32_t sysHz = (uint32_t)clock_get_hz(clk_sys);

    // Calculate the clock divider - goal to get clock used for PSRAM <= what
    // the PSRAM IC can handle - which is defined in SFE_PSRAM_MAX_SCK_HZ
    volatile uint8_t clockDivider = (sysHz + SFE_PSRAM_MAX_SCK_HZ - 1) / SFE_PSRAM_MAX_SCK_HZ;

    uint32_t intr_stash = save_and_disable_interrupts();

    // Get the clock femtoseconds per cycle.
    uint32_t fsPerCycle = SFE_SEC_TO_FS / sysHz;

    // the maxSelect value is defined in units of 64 clock cycles
    // So maxFS / (64 * fsPerCycle) = maxSelect = SFE_PSRAM_MAX_SELECT_FS64/fsPerCycle
    volatile uint8_t maxSelect = SFE_PSRAM_MAX_SELECT_FS64 / fsPerCycle;

    // minDeselect time - in system clock cycles
    // Must be higher than 50ns (min deselect time for PSRAM) so add fsPerCycle - 1 to round up
    // So minFS/fsPerCycle = minDeselect = SFE_PSRAM_MIN_DESELECT_FS/fsPerCycle
    volatile uint8_t minDeselect = (SFE_PSRAM_MIN_DESELECT_FS + fsPerCycle - 1) / fsPerCycle;

    // printf("Max Select: %d, Min Deselect: %d, clock divider: %d\n", maxSelect, minDeselect, clockDivider);

    qmi_hw->m[1].timing = QMI_M1_TIMING_PAGEBREAK_VALUE_1024 << QMI_M1_TIMING_PAGEBREAK_LSB | // Break between pages.
                          3 << QMI_M1_TIMING_SELECT_HOLD_LSB | // Delay releasing CS for 3 extra system cycles.
                          1 << QMI_M1_TIMING_COOLDOWN_LSB |
                          1 << QMI_M1_TIMING_RXDELAY_LSB |
                          maxSelect << QMI_M1_TIMING_MAX_SELECT_LSB |
                          minDeselect << QMI_M1_TIMING_MIN_DESELECT_LSB |
                          clockDivider << QMI_M1_TIMING_CLKDIV_LSB;

    restore_interrupts(intr_stash);
}

size_t psram_size;

//-----------------------------------------------------------------------------
/// @brief The setup_psram function - note that this is not in flash
///
static void __no_inline_not_in_flash_func(setup_psram)(/*uint32_t psram_cs_pin*/) {
    // Set the PSRAM CS pin in the SDK
    gpio_set_function(RP2350_PSRAM_CS, GPIO_FUNC_XIP_CS1);

    // Probe the PSRAM (returns 0 if none is found)
    psram_size = get_psram_size();

    // No PSRAM - no dice
    if (psram_size == 0) {
        return;
    }

    uint32_t intr_stash = save_and_disable_interrupts();
    // Enable quad mode.
    qmi_hw->direct_csr = 30 << QMI_DIRECT_CSR_CLKDIV_LSB | QMI_DIRECT_CSR_EN_BITS;

    // Need to poll for the cooldown on the last XIP transfer to expire
    // (via direct-mode BUSY flag) before it is safe to perform the first
    // direct-mode operation
    while ((qmi_hw->direct_csr & QMI_DIRECT_CSR_BUSY_BITS) != 0) {
    }

    // RESETEN, RESET and quad enable
    for (uint8_t i = 0; i < 3; i++) {
        qmi_hw->direct_csr |= QMI_DIRECT_CSR_ASSERT_CS1N_BITS;
        if (i == 0) {
            qmi_hw->direct_tx = PSRAM_CMD_RSTEN;
        } else if (i == 1) {
            qmi_hw->direct_tx = PSRAM_CMD_RST;
        } else {
            qmi_hw->direct_tx = PSRAM_CMD_QUAD_ENABLE;
        }

        while ((qmi_hw->direct_csr & QMI_DIRECT_CSR_BUSY_BITS) != 0) {
        }
        qmi_hw->direct_csr &= ~(QMI_DIRECT_CSR_ASSERT_CS1N_BITS);
        for (size_t j = 0; j < 20; j++) {
            asm("nop");
        }

        (void)qmi_hw->direct_rx;
    }

    // Disable direct csr.
    qmi_hw->direct_csr &= ~(QMI_DIRECT_CSR_ASSERT_CS1N_BITS | QMI_DIRECT_CSR_EN_BITS);

    // restore interrupts and set up the timing
    restore_interrupts(intr_stash);
    set_psram_timing();

    // and now stash interrupts again
    intr_stash = save_and_disable_interrupts();

    qmi_hw->m[1].rfmt = (QMI_M1_RFMT_PREFIX_WIDTH_VALUE_Q << QMI_M1_RFMT_PREFIX_WIDTH_LSB |
                         QMI_M1_RFMT_ADDR_WIDTH_VALUE_Q << QMI_M1_RFMT_ADDR_WIDTH_LSB |
                         QMI_M1_RFMT_SUFFIX_WIDTH_VALUE_Q << QMI_M1_RFMT_SUFFIX_WIDTH_LSB |
                         QMI_M1_RFMT_DUMMY_WIDTH_VALUE_Q << QMI_M1_RFMT_DUMMY_WIDTH_LSB |
                         QMI_M1_RFMT_DUMMY_LEN_VALUE_24 << QMI_M1_RFMT_DUMMY_LEN_LSB |
                         QMI_M1_RFMT_DATA_WIDTH_VALUE_Q << QMI_M1_RFMT_DATA_WIDTH_LSB |
                         QMI_M1_RFMT_PREFIX_LEN_VALUE_8 << QMI_M1_RFMT_PREFIX_LEN_LSB |
                         QMI_M1_RFMT_SUFFIX_LEN_VALUE_NONE << QMI_M1_RFMT_SUFFIX_LEN_LSB);

    qmi_hw->m[1].rcmd = PSRAM_CMD_QUAD_READ << QMI_M1_RCMD_PREFIX_LSB | 0 << QMI_M1_RCMD_SUFFIX_LSB;

    qmi_hw->m[1].wfmt = (QMI_M1_WFMT_PREFIX_WIDTH_VALUE_Q << QMI_M1_WFMT_PREFIX_WIDTH_LSB |
                         QMI_M1_WFMT_ADDR_WIDTH_VALUE_Q << QMI_M1_WFMT_ADDR_WIDTH_LSB |
                         QMI_M1_WFMT_SUFFIX_WIDTH_VALUE_Q << QMI_M1_WFMT_SUFFIX_WIDTH_LSB |
                         QMI_M1_WFMT_DUMMY_WIDTH_VALUE_Q << QMI_M1_WFMT_DUMMY_WIDTH_LSB |
                         QMI_M1_WFMT_DUMMY_LEN_VALUE_NONE << QMI_M1_WFMT_DUMMY_LEN_LSB |
                         QMI_M1_WFMT_DATA_WIDTH_VALUE_Q << QMI_M1_WFMT_DATA_WIDTH_LSB |
                         QMI_M1_WFMT_PREFIX_LEN_VALUE_8 << QMI_M1_WFMT_PREFIX_LEN_LSB |
                         QMI_M1_WFMT_SUFFIX_LEN_VALUE_NONE << QMI_M1_WFMT_SUFFIX_LEN_LSB);

    qmi_hw->m[1].wcmd = PSRAM_CMD_QUAD_WRITE << QMI_M1_WCMD_PREFIX_LSB | 0 << QMI_M1_WCMD_SUFFIX_LSB;

    // Mark that we can write to PSRAM.
    xip_ctrl_hw->ctrl |= XIP_CTRL_WRITABLE_M1_BITS;

    restore_interrupts(intr_stash);
}

// -------------------------- interesting code below... ------------------------------

void __no_inline_not_in_flash_func(flush)() {
    // Clean-by-set/way (offset +1) on every 8-byte line of the 16KB XIP cache
    for (volatile uint8_t* cache = (volatile uint8_t*)0x18000001; cache < (volatile uint8_t*)(0x18000001 + 2048 * 8); cache += 8) {
        *cache = 0;
    }
}

uint32_t *p = (uint32_t *)0x11000000;     // start of the cached PSRAM (M1) window
const size_t cnt = 2048 * 1024;           // 8MB of words; use 100 for the passing case

void fill(bool rev) {
    if (rev) {
        for (size_t i = 0; i < cnt; i++) {
            p[i] = cnt - i;               // must match check(true) for any cnt
        }
    } else {
        for (size_t i = 0; i < cnt; i++) {
            p[i] = i;
        }
    }
}

void check(bool rev) {
    for (size_t i = 0; i < cnt; i++) {
        if (rev) {
            if (p[i] != cnt - i) {
                printf("mismatch %u = %08lx\n", (unsigned)i, (unsigned long)p[i]);
            }
        } else {
            if (p[i] != i) {
                printf("mismatch %u = %08lx\n", (unsigned)i, (unsigned long)p[i]);
            }
        }
    }
}

int main() {
    stdio_init_all();
    sleep_ms(5000);
    setup_psram();
    printf("PSRAM: %u\n-----\n", (unsigned)psram_size);
    printf("Filling fwd...");
    fill(false);
    printf("Checking fwd...");
    check(false);
    printf("Flushing...");
    flush();
    printf("...done\n");
    printf("Filling rev...");
    fill(true);
    printf("Flushing...");
    flush();
    printf("Checking rev...");
    check(true);
    printf("...done\n");
    return 0;
}
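To make the values set_psram_timing() writes more concrete, here is the same integer arithmetic evaluated on the host (PsramTiming and psram_timing_for are hypothetical names; the constants are copied from main.cpp). At 150 MHz it gives a clock divider of 2 (75 MHz SCK, under the 109 MHz limit), a max select of 18 x 64 cycles (about 7.7us, under 8us) and a min deselect of 8 cycles (about 53ns, over 50ns).

```cpp
#include <cassert>  // assert() for the usage checks
#include <cstdint>

// Constants copied from main.cpp (all times in femtoseconds).
constexpr std::uint64_t kSecToFs = 1000000000000000ull;
constexpr std::uint32_t kMaxSelectFs64 = 125000000u;  // 8us max CS low, in units of 64 cycles
constexpr std::uint32_t kMinDeselectFs = 50000000u;   // 50ns min CS high
constexpr std::uint32_t kMaxSckHz = 109000000u;       // APS6404L max SCK at 3.3V

struct PsramTiming {
    std::uint8_t clock_divider;
    std::uint8_t max_select;    // units of 64 system clock cycles
    std::uint8_t min_deselect;  // system clock cycles
};

// The arithmetic from set_psram_timing(), without the register write.
constexpr PsramTiming psram_timing_for(std::uint32_t sys_hz) {
    const std::uint32_t fs_per_cycle = static_cast<std::uint32_t>(kSecToFs / sys_hz);
    return PsramTiming{
        static_cast<std::uint8_t>((sys_hz + kMaxSckHz - 1) / kMaxSckHz),
        static_cast<std::uint8_t>(kMaxSelectFs64 / fs_per_cycle),
        static_cast<std::uint8_t>((kMinDeselectFs + fs_per_cycle - 1) / fs_per_cycle)};
}
```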
CMakeLists.txt
Code:
cmake_minimum_required(VERSION 3.12)
set(PROJECT main)
set(PICO_BOARD solderparty_rp2350_stamp_xl) # Pico2 sets to RP2350A which disables all code for RP2350B
set(PICO_PLATFORM rp2350)
set(PICO_CYW43_SUPPORTED 0)
set(CMAKE_BUILD_TYPE Release)
set(CMAKE_CXX_FLAGS_DEBUG "-g -O0")
set(CMAKE_CXX_FLAGS_RELEASE "-g -O0")
include($ENV{PICO_SDK_PATH}/external/pico_sdk_import.cmake)
project(${PROJECT} C CXX ASM)
pico_sdk_init()
add_executable(${PROJECT} main.cpp)
target_link_libraries(${PROJECT} PRIVATE pico_stdlib)
pico_add_extra_outputs(${PROJECT})
pico_enable_stdio_usb(${PROJECT} 1)
pico_enable_stdio_uart(${PROJECT} 0)
pico_enable_stdio_rtt(${PROJECT} 0)
Posted by earlephilhower, Mon Oct 21, 2024 9:53 pm