I'm attempting to update my TensorFlow Lite Micro code for the RP2040 by adapting the latest CMSIS-NN code to split a matrix multiply across both cores. I've done this successfully with earlier versions of CMSIS-NN and pretty much doubled the speed. https://petewarden.com/2023/07/29/accel ... ual-cores/
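For anyone unfamiliar with the approach, here is a rough sketch of the row-split idea. It is portable C in which two POSIX threads stand in for the RP2040's two cores. It is not the actual CMSIS-NN or TFLM code, and `matmul_two_workers()`, `matmul_job` and the other names are made up for illustration. Each worker computes a disjoint range of output rows of C = A × B, and the caller only reads the result after `pthread_join()`, which POSIX defines as a memory-synchronizing point. On the Pico, a completion signal such as the SDK's inter-core FIFO (`multicore_fifo_push_blocking()` / `multicore_fifo_pop_blocking()`) would play that role instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One unit of work: rows [row_begin, row_end) of C = A * B, all row-major.
   A is m x k, B is k x n, C is m x n. (Hypothetical layout for illustration.) */
typedef struct {
    const float *a;
    const float *b;
    float *c;
    size_t k, n;
    size_t row_begin, row_end;
} matmul_job;

/* Computes the rows assigned to one job. Each job writes only its own
   rows of C, so the two workers never write the same output element. */
static void matmul_rows(const matmul_job *job) {
    for (size_t i = job->row_begin; i < job->row_end; ++i) {
        for (size_t j = 0; j < job->n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < job->k; ++p) {
                acc += job->a[i * job->k + p] * job->b[p * job->n + j];
            }
            job->c[i * job->n + j] = acc;
        }
    }
}

/* pthread entry point standing in for the "second core". */
static void *matmul_worker(void *arg) {
    matmul_rows((const matmul_job *)arg);
    return NULL;
}

/* Splits the m output rows between the calling thread ("core 0") and one
   worker thread ("core 1"). pthread_join() waits for the worker and makes
   its writes to c visible to the caller. Returns 0 on success. */
int matmul_two_workers(const float *a, const float *b, float *c,
                       size_t m, size_t k, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    size_t split = m / 2;
    matmul_job top = {a, b, c, k, n, 0, split};
    matmul_job bottom = {a, b, c, k, n, split, m};

    pthread_t worker;
    if (pthread_create(&worker, NULL, matmul_worker, &bottom) != 0) {
        return -1;
    }
    matmul_rows(&top);                  /* top half on the calling thread */
    return pthread_join(worker, NULL);  /* wait for the bottom half */
}
```

The key design point is that the output is partitioned so neither worker ever writes the other's rows. The only shared state that needs ordering is the "I'm done" handoff at the end.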
Unfortunately, I've been running into errors that are very hard to debug. I've narrowed the problem down to memory coherence between the two cores and put together a minimal repro case at https://github.com/petewarden/pico_mult ... rence_test
From what I've been able to research, this may be related to the use of the scratch memory banks for the stack. However, I haven't found any documentation describing the effects and limitations involved, or the recommended approach for ensuring that a function running on the second core can read and write memory with guaranteed coherence.
Simply put, how could I rewrite the write_coords_row_wise() function so that it is reliable? Is there some kind of "flush memory" command I can use to ensure coherence?
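The Cortex-M0+ cores in the RP2040 have no data cache, so a "flush" in the cache sense isn't the usual tool. In portable C11 terms, the closest equivalent is release/acquire ordering: the writer publishes its data with a release store, and the reader only touches the data after an acquire load sees the flag. Below is a minimal sketch under that assumption, again with POSIX threads standing in for the two cores. `coord_buffer`, `produce_coords()` and `consume_coords()` are hypothetical names, not the repo's `write_coords_row_wise()`, and this doesn't establish what is actually going wrong in the repro case. On Armv6-M, GCC typically implements these orderings with a DMB barrier, and the Pico SDK also exposes `__dmb()` in `hardware/sync.h`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define COORD_COUNT 64

/* Hypothetical shared state: one "core" fills coords, the other reads them. */
typedef struct {
    int coords[COORD_COUNT][2];  /* plain (non-atomic) payload */
    atomic_bool ready;           /* publication flag */
} coord_buffer;

/* Producer ("core 1"): write the payload, then publish it with a release
   store. Release ordering guarantees every earlier write to coords is
   visible to any thread whose acquire load observes ready == true. */
static void *produce_coords(void *arg) {
    coord_buffer *buf = arg;
    assert(buf != NULL);
    for (int i = 0; i < COORD_COUNT; ++i) {
        buf->coords[i][0] = i / 8;  /* row */
        buf->coords[i][1] = i % 8;  /* column */
    }
    atomic_store_explicit(&buf->ready, true, memory_order_release);
    return NULL;
}

/* Consumer ("core 0"): spin until the flag is set (acquire load), after
   which coords can be read without a data race. Returns the sum of all
   coordinates, or -1 if the producer thread could not be started. */
long consume_coords(void) {
    coord_buffer buf;
    atomic_init(&buf.ready, false);

    pthread_t producer;
    if (pthread_create(&producer, NULL, produce_coords, &buf) != 0) {
        return -1;
    }
    while (!atomic_load_explicit(&buf.ready, memory_order_acquire)) {
        /* busy-wait, as a core would while waiting for its partner */
    }
    long sum = 0;
    for (int i = 0; i < COORD_COUNT; ++i) {
        sum += buf.coords[i][0] + buf.coords[i][1];
    }
    pthread_join(producer, NULL);  /* buf lives on this stack: join before returning */
    return sum;
}
```

The flag has to be atomic (not just `volatile`) for the compiler to honor the ordering. The payload itself can stay as ordinary memory, because only the flag's release/acquire pair is used to hand it over.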
Thanks for any ideas!
Statistics: Posted by petewarden — Mon Jan 01, 2024 12:30 am