ARM System Developer's Guide: Designing and Optimizing System Software (The Morgan Kaufmann Series in Computer Architecture and Design)

Category: Hardware & DIY
Author: Chris Wright

Comments

by anonymous   2019-07-21

Your problem is two-fold: understanding bare-metal and OS programming, and understanding the BeagleBoard hardware. For the latter, I recommend reading other people's code alongside the datasheets; reading the datasheets alone is very time-consuming. Start with the U-Boot code for the BeagleBoard:

  • https://github.com/beagleboard

Some other bare-metal projects that do not target the BB-xM but that I have found useful:

  • https://github.com/mrd/puppy (original beagle)
  • https://github.com/wayling/xboot-clone (no Cortex-A8)

Your second problem is understanding low-level programming on ARM. I recommend these books. Note, however, that they were written for older architectures; they should still be very useful to you:

  • http://www.hitex.com/index.php?id=download-insiders-guides (free)
  • http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745/ (the ARM bible)

The latter even has a chapter on writing your own small OS.

by anonymous   2019-01-13

This does not exactly answer your question, but I see you are aiming for fast execution of the loops.

Here are some tips from the book: 'ARM System Developer's Guide: Designing and Optimizing System Software (The Morgan Kaufmann Series in Computer Architecture and Design)' http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745

Chapter 5 contains a section named 'C looping structures'. Here is the summary of that section:

Writing Loops Efficiently

  • Use loops that count down to zero. Then the compiler does not need to allocate a register to hold the termination value, and the comparison with zero is free.
  • Use unsigned loop counters by default and the continuation condition i!=0 rather than i>0. This will ensure that the loop overhead is only two instructions.
  • Use do-while loops rather than for loops when you know the loop will iterate at least once. This saves the compiler checking to see if the loop count is zero.
  • Unroll important loops to reduce the loop overhead. Do not overunroll. If the loop overhead is small as a proportion of the total, then unrolling will increase code size and hurt the performance of the cache.
  • Try to arrange that the number of elements in arrays are multiples of four or eight. You can then unroll loops easily by two, four, or eight times without worrying about the leftover array elements.
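The guidelines above can be sketched on a simple array sum. This is only an illustration, not code from the book: the function names are hypothetical, and the unrolled version assumes the element count is nonzero and a multiple of four, as the last bullet suggests.

```c
#include <assert.h>

/* Baseline: count-up for loop with a separate termination value. */
unsigned int sum_count_up(const unsigned int *data, unsigned int n)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Count-down do-while loop, unrolled by four.
   Assumption (hypothetical helper): n is nonzero and a multiple of four. */
unsigned int sum_count_down_unrolled(const unsigned int *data, unsigned int n)
{
    unsigned int sum = 0;
    unsigned int i = n / 4;          /* unsigned counter, counts down to zero */

    assert(n != 0 && n % 4 == 0);    /* do-while runs at least once */

    do {
        sum += data[0];
        sum += data[1];
        sum += data[2];
        sum += data[3];
        data += 4;
    } while (--i != 0);              /* continuation condition i != 0 */

    return sum;
}
```

Both functions should return the same result for any valid input; whether the second is actually faster depends on the compiler and the core, so it is worth checking the generated assembly.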

Based on the summary, your inner loop might look like this:

unsigned int i = 240/4;               // Use unsigned loop counters by default
                                      // and the continuation condition i!=0
unsigned int k = (j*fbWidth) + 240;   // One past the last pixel of row j

do
{
    // Unroll important loops to reduce the loop overhead
    LCD_WriteData( (u16)frameBuffer[ --k ] );
    LCD_WriteData( (u16)frameBuffer[ --k ] );
    LCD_WriteData( (u16)frameBuffer[ --k ] );
    LCD_WriteData( (u16)frameBuffer[ --k ] );
}
while ( --i != 0 );  // Use do-while loops rather than for
                     // loops when you know the loop will
                     // iterate at least once
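When unrolling by hand it is easy to skip or repeat elements, so it may help to run the loop on a host machine against a stub first. The sketch below assumes a 240-pixel row; `LCD_WriteData` here is a hypothetical stand-in for the real driver call, and `lcd_log`, `lcd_count`, `ROW_PIXELS` and `write_row_reversed` are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;

#define ROW_PIXELS 240u           /* assumed row length */

u16 lcd_log[ROW_PIXELS];          /* hypothetical: records what the stub received */
unsigned int lcd_count;

/* Hypothetical stand-in for the real LCD driver call. */
void LCD_WriteData(u16 value)
{
    if (lcd_count < ROW_PIXELS) {
        lcd_log[lcd_count++] = value;
    }
}

/* Send row j in reverse order: count-down do-while loop, unrolled by four. */
void write_row_reversed(const u16 *frameBuffer, unsigned int fbWidth, unsigned int j)
{
    unsigned int i = ROW_PIXELS / 4;                          /* unrolled iterations */
    const u16 *p = frameBuffer + (j * fbWidth) + ROW_PIXELS;  /* one past last pixel */

    assert(fbWidth >= ROW_PIXELS);   /* the row must hold ROW_PIXELS pixels */

    do {
        LCD_WriteData(*--p);
        LCD_WriteData(*--p);
        LCD_WriteData(*--p);
        LCD_WriteData(*--p);
    } while (--i != 0);
}
```

Filling a test frame buffer with its own indices and calling `write_row_reversed` for one row makes it straightforward to confirm that exactly 240 values arrive, from the last pixel of the row down to the first.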

You might also want to experiment with pragmas, e.g.:

#pragma Otime

http://www.keil.com/support/man/docs/armcc/armcc_chr1359124989673.htm

#pragma unroll(n)

http://www.keil.com/support/man/docs/armcc/armcc_chr1359124992247.htm

Not everything may be applicable to your application (filling a buffer in reverse order). I just wanted to draw your attention to the book and to possible points for optimization.