Hello. I am trying to make a function generator by driving an AD9744 DAC from a Raspberry Pi 3 Model B, to simulate the behavior of a CCD image sensor (1 Vpp sine wave). My DAC needs to put out a new analog value every 34 ns, i.e. it must run at 29.41 MHz. Currently my board runs at 1200 MHz ARM, 472 MHz core (I randomly got a nice value that divides by 16 to land close to the mentioned 29.41 MHz), and I believe throttling is disabled with force_turbo=1. It is very simple code-wise: all I am doing is lowering the clock (pin 26) on every even count and updating the DAC inputs, then raising the clock on the odd count. Obviously the separate SET and CLR registers do not help speed in my situation. It has been an exciting journey from a 120 ns polled interrupt in a user program on an isolated core to 45 ns in an optimized FIQ assembly routine.
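As a quick sanity check of the numbers above, here is a minimal sketch of the arithmetic. The helper names are made up for illustration and are not part of my code. Under these assumptions, a 34 ns period is about 29.41 MHz, while 472 MHz divided by 16 gives 29.5 MHz, i.e. a period of roughly 33.9 ns.

```c
/* Hypothetical helpers, only to illustrate the timing arithmetic. */

/* Update rate in Hz for a given update period in nanoseconds. */
static double rate_hz_from_period_ns(double period_ns)
{
    return 1e9 / period_ns;
}

/* Period in nanoseconds of a clock obtained by integer division. */
static double period_ns_from_divided_clock(double clock_hz, unsigned divider)
{
    return 1e9 * (double)divider / clock_hz;
}
```

For example, `rate_hz_from_period_ns(34.0)` is about 29.41e6 and `period_ns_from_divided_clock(472e6, 16)` is about 33.9.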
There is a minimum 2 ns tsetup that I must wait, so raising the clock and updating the DAC inputs must be split into two separate GPIO writes.
github.com/mbondaru/armtimer/tree/main
There's also a CMSIS-compliant (sort of) header that you can use if you like the microcontroller style of coding (TI, Silabs, etc.).
IRQ routine in C:
Code:
    void ARMTimer_IRQ(void)
    {
        if (count % 2) {
            P1->GPCLR0 = 0x040000FF;
        } else {
            P1->GPSET0 = sine_comms[sine_index];
            sine_index = (sine_index + 1) % SINE_TABLE_SIZE;
            P1->GPSET0 = 0x04000000;
        }
        count++;
        ARMTIMER->IRQ_CLR_ACK = 1;
    }
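To make the write sequence easier to follow, here is a rough host-side sketch that mocks the GPIO registers with a plain variable. Everything in it is an assumption for illustration: the mock helpers `gpset0`/`gpclr0`, the name `armtimer_irq_mock`, the 4-entry placeholder table, and the guess that bits 0-7 are the DAC data pins (taken from the 0x040000FF mask). It is not the real driver code.

```c
#include <stdint.h>

#define CLK_BIT         0x04000000u /* GPIO 26, the DAC clock */
#define DATA_MASK       0x000000FFu /* GPIO 0-7, assumed DAC data pins */
#define SINE_TABLE_SIZE 4u          /* placeholder, the real table has 360 entries */

/* Placeholder values, not real sine samples. */
static const uint32_t sine_comms[SINE_TABLE_SIZE] = { 0x10, 0x20, 0x30, 0x20 };

static uint32_t pins;       /* mocked GPIO output levels */
static uint32_t count;
static uint32_t sine_index;

/* Mocked set/clear registers: writing 1 bits sets or clears those pins. */
static void gpset0(uint32_t v) { pins |= v; }
static void gpclr0(uint32_t v) { pins &= ~v; }

/* Same structure as the C routine above, minus the interrupt acknowledge. */
static void armtimer_irq_mock(void)
{
    if (count % 2) {
        /* Odd count: drop the clock and clear the data pins in one write. */
        gpclr0(CLK_BIT | DATA_MASK);
    } else {
        /* Even count: data first, then the clock in a second write (tsetup). */
        gpset0(sine_comms[sine_index]);
        sine_index = (sine_index + 1) % SINE_TABLE_SIZE;
        gpset0(CLK_BIT);
    }
    count++;
}
```

Calling `armtimer_irq_mock()` repeatedly alternates `pins` between a table value with the clock bit set and all zeros, which shows why the data pins have to be cleared before each new value can be written with the SET register.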
Here is how I put it in the FIQ: I removed branches where possible (although I can't really come up with a way to get rid of the even/odd branch), used the banked r8-r13 as much as possible, and used no stack. I was also able to put the whole 360-entry sine table as constants at the end of the FIQ. Please note, I do know that making it sin(1.40625x) with a 256-entry sine table would save me a few instructions, but it seems that the bulk of the 45 ns is loads and stores (clear interrupt, lower the clock, write to DAC on low edge; clear interrupt, raise the clock, load the next sine value on high edge). So 3 writes on the low edge, 2 writes + 1 load on the high edge. And one branch. Can it be optimized any further?
r8=ioremapped gpio
r9=ioremapped timer
r10=next_sine
r11=count
r12=scratchpad
r13=sine_index
Code:
    .text
    .global sp804_handler
    .global sp804_handler_end
    sp804_handler:
        ands r12, r11, #1
        mov r12, #0x04000000
        str r12, [r9, #0x0C]
        beq CountEven
        add r12, r12, 0xFF
        str r12, [r8, #0x28]
        add r11, r11, #1
        str r10, [r8, #0x1C]
        mov r12, #0x100
        add r12, r12, #0x67
        cmp r13, r12
        addlo r13, r13, #1
        subeq r13, r13, r12
        subs pc, lr, #4
    CountEven:
        str r12, [r8, #0x1C]
        add r11, r11, #1
        adr r12, SineTable
        add r12, r12, r13
        ldrb r10, [r12]
        subs pc, lr, #4
    SineTable:
    .word 0x28282726
    ...
    .word 0x26252424
    sp804_handler_end:
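For reference, the 256-entry table idea mentioned above comes down to how the index wraps. A minimal sketch in C, with hypothetical function names: with 360 entries the wrap needs a compare, while with 256 entries (each step covering 360/256 = 1.40625 degrees) it becomes a single mask. Whether that actually shortens the 45 ns is not something this sketch can show.

```c
#include <stdint.h>

#define TABLE_360 360u

/* Wrap for the 360-entry table: needs a compare against the last index. */
static uint32_t next_index_360(uint32_t i)
{
    return (i + 1u == TABLE_360) ? 0u : i + 1u;
}

/* Wrap for a 256-entry table: a single AND, no compare needed. */
static uint32_t next_index_256(uint32_t i)
{
    return (i + 1u) & 0xFFu;
}
```

In the assembly this would roughly replace the mov/add/cmp/addlo/subeq sequence with an add and an and, assuming the table is regenerated for 256 entries.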
Statistics: Posted by MaximBondaruk — Fri Mar 29, 2024 7:46 pm