Also, some good resources on how CPUs work, with perspective to programmers, can be found from the Intel technical library. Everything is free to download there and it makes for some good reading!
Ultra-low-latency programming is hard. Much harder than people suspect when they first start down the path. There are some techniques and "tricks" you can employ. Like IO Completion ports, multi core utilization, highly optimized synchronization techniques, shared memory. The list goes on forever. (edit) It's not as simple as "code-profile-refactor-repeat" because you can write excellent code that is robust and fast, but will never be truly ultra-low latency code.
Unfortunately there is no one single resource I know of that will show you how it's done. Programmers specializing in (and good at) ultra low-latency code are among the best in the business and the most experienced. And with good reason. Because if there is a silver bullet solution to becoming a good low-latency programmer, it is simply this: you have to know a lot about everything. And that knowledge is not easy to come by. It takes years (decades?) of experience and constant study.
As far as the study itself is concerned, here's a few books I found useful or especially insightful for one reason or another:
I don't know if you had a specific OS in mind, but one of the best books on how the Windows operating system works "under the hood" is called Windows Internals. It describes in detail how everything from the kernel, to device drivers, and the file system all work.
If your looking for a good book on how CPUs and processors work, in general, I recommend Computer Architecture: A Quantitative approach. Very good info there!
Also, some good resources on how CPUs work, with perspective to programmers, can be found from the Intel technical library. Everything is free to download there and it makes for some good reading!
[C++ programmer]:
Ultra-low-latency programming is hard. Much harder than people suspect when they first start down the path. There are some techniques and "tricks" you can employ. Like IO Completion ports, multi core utilization, highly optimized synchronization techniques, shared memory. The list goes on forever. (edit) It's not as simple as "code-profile-refactor-repeat" because you can write excellent code that is robust and fast, but will never be truly ultra-low latency code.
Unfortunately there is no one single resource I know of that will show you how it's done. Programmers specializing in (and good at) ultra low-latency code are among the best in the business and the most experienced. And with good reason. Because if there is a silver bullet solution to becoming a good low-latency programmer, it is simply this: you have to know a lot about everything. And that knowledge is not easy to come by. It takes years (decades?) of experience and constant study.
As far as the study itself is concerned, here's a few books I found useful or especially insightful for one reason or another:
Should you ever crave deeper understanding, I heartily recommend Patterson and Hennessy as an intro and Hennessy and Patterson as an intermediate to advanced text. They're pricey, but truly non-pareil; I just wish either or both were available when I got my Masters' degree and entered the workforce designing chips, systems, and parts of system software for them (but, alas!, that was WAY too long ago;-). Stack pointers are so crucial (and the distinction between a microprocessor and any other kind of CPU so utterly meaningful in this context... or, for that matter, in ANY other context, in the last few decades...!-) that I doubt anything but a couple of thorough from-the-ground-up refreshers can help!-)