Channel: Raspberry Pi Forums

SDK • Re: How can I profile basic functions for the RP2040?

I know the following code compiles down to machine code that is 64 instructions shorter than the previous version. At most, this could give you a 20% speed improvement. I don't know whether it will work, though, since I might have broken something while optimizing, but I tried not to:

Code:

void __no_inline_not_in_flash_func(drawSprites)(uint16_t screenWidth, uint16_t screenHeight, uint16_t raster_y,
                                   uint16_t pixels[screenWidth + BUFFER_PADDING]) {
    // Can delete this line since uint16 can never be less than 0.
    // Compiler figured this out and just removed it as dead code.
    // if (raster_y < 0) return;
    if (raster_y >= screenHeight) return;

    for (int s = 0; s < NUMBER_OF_SPRITES; s++) {
        if (!sprites[s].visible) continue;

        if (raster_y >= sprites[s].y && raster_y < (sprites[s].y + sprites[s].height)) {
            int offset = (raster_y - sprites[s].y);
            uint32_t line = sprites[s].frame[offset];

            // Pre-calculate the highest pixel address (rightmost?) in pixels.
            // Can just decrement this pointer rather than recalculating for each of the 16 pixels.
            uint16_t* pixel = &pixels[sprites[s].x + 15];

            // Pre-calculate the palette which is appropriate for this sprite.
            // Previously it would be recalculated for each of the 16 pixels.
            SpritePalette* palette = &sprite_palettes[sprites[s].palette];

            // Perform the color extraction from line and the pixel update for each pixel at the same time to
            // reduce register pressure. This saves extra memory reads/writes to recover pixel color values from the
            // stack.
            // I updated them in reverse order to simplify the calculation down to right shift by 2-bits and masking off
            // the lower 2-bits (same operation for each pixel).
            uint8_t cf = line & 0b11;
            if (cf) { *pixel = palette->color[cf]; }
            line >>= 2; pixel--;
            uint8_t ce = line & 0b11;
            if (ce) { *pixel = palette->color[ce]; }
            line >>= 2; pixel--;
            uint8_t cd = line & 0b11;
            if (cd) { *pixel = palette->color[cd]; }
            line >>= 2; pixel--;
            uint8_t cc = line & 0b11;
            if (cc) { *pixel = palette->color[cc]; }
            line >>= 2; pixel--;
            uint8_t cb = line & 0b11;
            if (cb) { *pixel = palette->color[cb]; }
            line >>= 2; pixel--;
            uint8_t ca = line & 0b11;
            if (ca) { *pixel = palette->color[ca]; }
            line >>= 2; pixel--;
            uint8_t c9 = line & 0b11;
            if (c9) { *pixel = palette->color[c9]; }
            line >>= 2; pixel--;
            uint8_t c8 = line & 0b11;
            if (c8) { *pixel = palette->color[c8]; }
            line >>= 2; pixel--;
            uint8_t c7 = line & 0b11;
            if (c7) { *pixel = palette->color[c7]; }
            line >>= 2; pixel--;
            uint8_t c6 = line & 0b11;
            if (c6) { *pixel = palette->color[c6]; }
            line >>= 2; pixel--;
            uint8_t c5 = line & 0b11;
            if (c5) { *pixel = palette->color[c5]; }
            line >>= 2; pixel--;
            uint8_t c4 = line & 0b11;
            if (c4) { *pixel = palette->color[c4]; }
            line >>= 2; pixel--;
            uint8_t c3 = line & 0b11;
            if (c3) { *pixel = palette->color[c3]; }
            line >>= 2; pixel--;
            uint8_t c2 = line & 0b11;
            if (c2) { *pixel = palette->color[c2]; }
            line >>= 2; pixel--;
            uint8_t c1 = line & 0b11;
            if (c1) { *pixel = palette->color[c1]; }
            line >>= 2; pixel--;
            uint8_t c0 = line & 0b11;
            if (c0) { *pixel = palette->color[c0]; }
        }
    }
}
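For anyone who wants to check the reverse-order unpacking on a desktop before flashing, here is a minimal sketch of the same idea written as a loop. It is not the code from the post: draw_sprite_row and SPRITE_WIDTH are hypothetical names, the palette is passed as a plain four-entry array rather than a SpritePalette, and it assumes 16 pixels packed at 2 bits each with color index 0 meaning transparent. The unrolled version above is presumably the faster form on the RP2040; this one is only meant to make the shift-and-mask logic easy to test.

```c
#include <stdint.h>

/* Assumption: one sprite row is 16 pixels, 2 bits per pixel, in a uint32_t. */
#define SPRITE_WIDTH 16

/* Hypothetical helper (not from the original post).
 * Unpacks one sprite row into dst[0..15], working from the rightmost pixel
 * to the leftmost so that every step is the same "mask low 2 bits, then
 * shift right by 2" operation. Color index 0 is treated as transparent
 * and leaves the destination pixel untouched. */
static void draw_sprite_row(uint32_t line, const uint16_t palette[4], uint16_t *dst) {
    for (int i = SPRITE_WIDTH - 1; i >= 0; i--) {
        uint32_t c = line & 0x3u;   /* lowest 2 bits = color index of pixel i */
        if (c) {
            dst[i] = palette[c];    /* only non-transparent pixels are written */
        }
        line >>= 2;                 /* move the next pixel into the low bits */
    }
}
```

With a row value of 0x1B (binary 00 01 10 11 in the low byte), the three rightmost pixels receive palette entries 3, 2 and 1, and every other pixel is left alone because its index is 0.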

WOW! That pushed the limit up to 18 sprites per scanline! I just did a super fast test but it seems to have worked. I will see how far this will push it and report back.

I also discovered that my tile drawing routine, which runs under the sprites, was using some poor code. The good news is that the tile drawing is almost identical to the sprite drawing, so I will see about applying the same optimization to it as well.

Thank you so much for this! I will study this code and hopefully understand how it's so much faster.

On a related note, how are you looking at the compiled code? Again, I am just using CLion but I think I can change the compiler under it. Are you just looking at the .bin file in a hex viewer or something?

Thanks!!!

Update

It seems the new maximum is 24 sprites before it breaks. Up from 14! That's an additional 10 sprites per scanline.
And, that's before I change the tile drawing routine so it may go even higher than that. :-D

I am dumbfounded by how powerful this $1 microcontroller really is, and I'm really just using DMA and PIO.

Statistics: Posted by cbmeeks — Fri Jul 26, 2024 12:16 pm


