General • Surprising performance disparity

Hello,

I've been hacking a bit on Miroslav Nemecek's PicoQVGA (https://www.breatharian.eu/hw/picoqvga/index_en.html).

While adding double-buffering support, I came across a huge performance pitfall from a seemingly trivial code change.

Essentially, the difference is accessing a framebuffer array directly, vs accessing it through a "current framebuffer" pointer.

The frame buffers are declared like so:

Code:

ALIGN4 u8 frame_buff0[FB_SIZE];
ALIGN4 u8 frame_buff1[FB_SIZE];

These frame buffers are abstracted behind pointers as the "draw" buffer and the "display" buffer. The idea is that you draw to the draw buffer while DMA / PIO is sending the display buffer to the screen, and then the pointers flip at VSYNC (but only if a vga_flip() call has been made, which sets a "draw buffer is ready" flag).

Code:

u8* draw_buff;
u8* display_buff;

In order to access the frame buffers directly, there are also two indexes which are updated on VSYNC:

Code:

u8 draw_idx;
u8 display_idx;

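To make the flip concrete, here is a minimal sketch of how the pointers and indexes could swap at VSYNC. The `on_vsync()` hook, the `draw_ready` flag name, the body of `vga_flip()`, and the 320x240 size are assumptions for illustration, not the actual PicoQVGA code, and `ALIGN4` is left out so the snippet builds on a host compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t u8;

/* Assumption: 320x240 at 8 bits per pixel; the real FB_SIZE comes from the project. */
#define FB_SIZE (320 * 240)

u8 frame_buff0[FB_SIZE];
u8 frame_buff1[FB_SIZE];

u8 *draw_buff = frame_buff0;
u8 *display_buff = frame_buff1;
u8 draw_idx = 0;
u8 display_idx = 1;

/* Hypothetical flag name: set by vga_flip(), consumed at VSYNC. */
static volatile bool draw_ready = false;

/* Called by the drawing code once a frame is complete. */
void vga_flip(void) {
    draw_ready = true;
}

/* Hypothetical VSYNC hook: swap roles only if a flip was requested. */
void on_vsync(void) {
    if (!draw_ready) {
        return; /* keep scanning out the current display buffer */
    }

    u8 *tmp_buff = draw_buff;
    draw_buff = display_buff;
    display_buff = tmp_buff;

    u8 tmp_idx = draw_idx;
    draw_idx = display_idx;
    display_idx = tmp_idx;

    assert(draw_idx != display_idx);
    draw_ready = false;
}
```

The pointers and the indexes are swapped together so that `draw_idx` always names the buffer that `draw_buff` points at.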
Accordingly, in a loop which fills the frame buffer, setting a pixel can be done in two ways: direct array access or access via the "current" pointer:

Code:

// direct framebuffer access: 306us.
if (draw_idx == 1) {
    frame_buff1[pixel] = rgb;
} else {
    frame_buff0[pixel] = rgb;
}

Code:

// pointer to framebuffer: 5467us.
draw_buff[pixel] = rgb;

I'm a bit shocked that one is about 18x faster than the other.

Just to confirm what I was seeing, I encapsulated these two approaches as functions, then toggled between them in a loop, printing out the elapsed time of each in the serial console. Sure enough, the behavior is repeatable:

Code:

main: foo: 306us
main: foo: 5979us
main: foo: 306us
main: foo: 5978us
main: foo: 306us
main: foo: 5978us
main: foo: 306us
main: foo: 5979us
main: foo: 306us
main: foo: 5978us
main: foo: 306us
main: foo: 5978us

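For anyone who wants to reproduce the measurement, the toggling harness looks roughly like the sketch below. It is a host-side approximation: `now_us()`, `time_fill_us()`, `run_comparison()` and `fill_scratch()` are hypothetical names, and `clock()` stands in for a microsecond timer. On the Pico, the SDK's `time_us_32()` would be the natural replacement for `now_us()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef uint8_t u8;
typedef void (*fill_fn)(u8 rgb);

/* Host-side stand-in for a microsecond timer (assumption, see above). */
static uint32_t now_us(void) {
    return (uint32_t)((uint64_t)clock() * 1000000u / CLOCKS_PER_SEC);
}

/* Time a single call of one fill function, in microseconds. */
uint32_t time_fill_us(fill_fn fn, u8 rgb) {
    assert(fn != NULL);
    uint32_t start = now_us();
    fn(rgb);
    return now_us() - start;
}

/* Alternate between the two candidates and print each elapsed time. */
void run_comparison(fill_fn a, fill_fn b, int rounds) {
    for (int i = 0; i < rounds; i++) {
        fill_fn fn = (i % 2 == 0) ? a : b;
        printf("main: foo: %luus\n", (unsigned long)time_fill_us(fn, 0x1f));
    }
}

/* Placeholder workload so the sketch builds without the real fill functions. */
u8 scratch[1024];

void fill_scratch(u8 rgb) {
    for (size_t i = 0; i < sizeof scratch; i++) {
        scratch[i] = rgb;
    }
}
```

In the real test the two arguments to `run_comparison()` would be the two fill functions shown further down, rather than the placeholder.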
I then marked them as NOINLINE and looked at the disassembly, but I can't seem to spot what would make one so much slower than the other.

direct array access:

Code:

NOINLINE void foo_fill1(u8 rgb) {
    for (int y=0; y<FB_HEIGHT; y++) {
        for (int x=0; x<FB_WIDTH; x++) {
            int pixel = ((FB_WIDTH * y) + x);
            // direct framebuffer access: 306us.
            if (draw_idx == 1) {
                frame_buff1[pixel] = rgb;
            } else {
                frame_buff0[pixel] = rgb;
            }
        }
    }
}

Code:

10000850 <foo_fill1>:
10000850:  b510        push    {r4, lr}
10000852:  4b08        ldr     r3, [pc, #32]   @ (10000874 <foo_fill1+0x24>)
10000854:  0001        movs    r1, r0
10000856:  781b        ldrb    r3, [r3, #0]
10000858:  2b01        cmp     r3, #1
1000085a:  d005        beq.n   10000868 <foo_fill1+0x18>
1000085c:  2296        movs    r2, #150        @ 0x96
1000085e:  4806        ldr     r0, [pc, #24]   @ (10000878 <foo_fill1+0x28>)
10000860:  0252        lsls    r2, r2, #9
10000862:  f004 fc83   bl      1000516c <__wrap_memset>
10000866:  bd10        pop     {r4, pc}
10000868:  2296        movs    r2, #150        @ 0x96
1000086a:  4804        ldr     r0, [pc, #16]   @ (1000087c <foo_fill1+0x2c>)
1000086c:  0252        lsls    r2, r2, #9
1000086e:  f004 fc7d   bl      1000516c <__wrap_memset>
10000872:  e7f8        b.n     10000866 <foo_fill1+0x16>
10000874:  20000f64    .word   0x20000f64
10000878:  20001a14    .word   0x20001a14
1000087c:  20014614    .word   0x20014614

pointer access:

Code:

NOINLINE void foo_fill2(u8 rgb) {
    for (int y=0; y<FB_HEIGHT; y++) {
        for (int x=0; x<FB_WIDTH; x++) {
            int pixel = ((FB_WIDTH * y) + x);
            // pointer to framebuffer: 5467us.
            draw_buff[pixel] = rgb;
        }
    }
}

Code:

10000880 <foo_fill2>:
10000880:  21a0        movs    r1, #160        @ 0xa0
10000882:  b570        push    {r4, r5, r6, lr}
10000884:  2500        movs    r5, #0
10000886:  4c08        ldr     r4, [pc, #32]   @ (100008a8 <foo_fill2+0x28>)
10000888:  4e08        ldr     r6, [pc, #32]   @ (100008ac <foo_fill2+0x2c>)
1000088a:  0049        lsls    r1, r1, #1
1000088c:  002b        movs    r3, r5
1000088e:  6822        ldr     r2, [r4, #0]
10000890:  54d0        strb    r0, [r2, r3]
10000892:  3301        adds    r3, #1
10000894:  428b        cmp     r3, r1
10000896:  d1fa        bne.n   1000088e <foo_fill2+0xe>
10000898:  3341        adds    r3, #65         @ 0x41
1000089a:  3541        adds    r5, #65         @ 0x41
1000089c:  33ff        adds    r3, #255        @ 0xff
1000089e:  0019        movs    r1, r3
100008a0:  35ff        adds    r5, #255        @ 0xff
100008a2:  42b3        cmp     r3, r6
100008a4:  d1f2        bne.n   1000088c <foo_fill2+0xc>
100008a6:  bd70        pop     {r4, r5, r6, pc}
100008a8:  20000f6c    .word   0x20000f6c
100008ac:  00012d40    .word   0x00012d40

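One visible difference between the two listings: foo_fill1 has no loop at all and ends in a single call to __wrap_memset with a length of 150 << 9 = 76800 bytes, while the inner loop of foo_fill2 (1000088e to 10000896) reloads the pointer with ldr r2, [r4, #0] before every strb. A variant that may be worth trying is to copy draw_buff into a local once, outside the loops. The sketch below is hypothetical and unmeasured: the name `foo_fill3` is made up, the 320x240 dimensions are inferred from the constants in the listings, and whether the compiler then treats it like foo_fill1 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/* Inferred from the listings: 150 << 9 = 76800 bytes, inner bound 160 << 1 = 320. */
#define FB_WIDTH 320
#define FB_HEIGHT 240
#define FB_SIZE (FB_WIDTH * FB_HEIGHT)

u8 frame_buff0[FB_SIZE];
u8 *draw_buff = frame_buff0;

/* Hypothetical variant of foo_fill2: read the global pointer once,
 * then store through a local copy inside the loops. */
void foo_fill3(u8 rgb) {
    u8 *const buff = draw_buff;
    assert(buff != NULL);
    for (int y = 0; y < FB_HEIGHT; y++) {
        for (int x = 0; x < FB_WIDTH; x++) {
            int pixel = (FB_WIDTH * y) + x;
            buff[pixel] = rgb;
        }
    }
}
```

Timing this next to the other two, and checking its disassembly for the per-iteration ldr, would show whether the pointer reload is related to the gap.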
Very curious to hear if anyone has any insight!

Statistics: Posted by cellularmitosis — Sun Jun 30, 2024 5:16 am — Replies 2 — Views 71


