Download the source of coreutils from http://www.gnu.org/software/coreutils/ and get started.
Get the source of bash (or another shell) from http://ftp.gnu.org/gnu/bash/ and read through it as well.
You may also want to read a Linux system programming book to learn the system call API and how to use it. Here is a link to unix.stackexchange:
What is the best book to learn Linux system programming?
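To give a feel for what the coreutils source boils down to, here is a minimal sketch of a `cat`-like routine written directly against the system call wrappers in Python's `os` module. The function name `cat` and its parameters are made up for illustration; this is not how GNU cat is implemented, only the same open/read/write/close pattern in miniature.

```python
import os

def cat(path, out_fd=1, chunk_size=4096):
    """Copy the file at `path` to the descriptor `out_fd` (1 is standard output)."""
    fd = os.open(path, os.O_RDONLY)           # wraps open(2)
    try:
        while True:
            chunk = os.read(fd, chunk_size)   # wraps read(2); b"" means end of file
            if not chunk:
                break
            while chunk:                      # write(2) may accept fewer bytes than offered
                written = os.write(out_fd, chunk)
                chunk = chunk[written:]
    finally:
        os.close(fd)                          # wraps close(2)
```

Calling `cat("/etc/hostname")` would copy that file to standard output, assuming the file exists on your system.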
How the Linux OS works and how the commands work are different things. If you already know OS basics, then you can try reading Understanding the Linux Kernel by Daniel Bovet. Otherwise, you might want to first read a standard OS book by Galvin, Tanenbaum or Deitel, or any other book.
First things first. Read, read, read, read, read. You need to have a firm understanding of how the OS works before you can hope to implement your own.
Grab one of Andrew Tanenbaum's books on operating systems. This is the one we used in my OS class in college:
Modern Operating Systems (cover image: http://ecx.images-amazon.com/images/I/51DptFJH9NL._SL500_AA240_.jpg)
Modern Operating Systems on Amazon
Despite the ridiculous cover, it's a fantastic read, especially for a textbook. Tanenbaum is really an expert in this area and his explanations of how the OS works underneath the hood are clear and easy to understand. This book is mostly theory, but I believe he also has a book that discusses more of the implementation. I've never read it, though, so I can't comment on it.
That should help you bone up on process management, memory management, filesystems, and everything else your OS kernel needs to do to get it up to a bootable state. From that point on it's basically a matter of writing device drivers for the hardware you need to support, and offering implementations of the C library functions to make kernel calls for things like opening files and devices, reading and writing, passing messages between processes, etc.
Read up on x86 assembly (assuming you are designing this for an x86 machine). That should answer a lot of your questions about moving between processor operating modes.
If you've got any electronics knowledge, it may be easier to start with writing an operating system for an embedded device that has ample documentation, because it will generally be simpler than an x86 PC. I've always wanted to write my own OS as well, and I'm starting with writing a microkernel embedded OS for this development board from Digilent. It can run the soft-core MicroBlaze processor from Xilinx, which has very thorough documentation. It's also got some RAM, flash data storage, LEDs, switches, buttons, VGA output, etc. Plenty of stuff to play around with writing simple drivers for.
One of the benefits of an embedded device is also that you may be able to avoid writing a VGA driver for a long time. In my case, the Digilent development board has an onboard UART, so I can effectively use the serial output as my console to get the whole thing up and booting to a command line with minimal fuss.
Just make sure that whatever you choose to target has a readily available and well-tested compiler for it. You do not want to be writing an OS and a compiler at the same time.
Really, it's just the same as any concurrency problem: you've got multiple threads of control, and it's indeterminate which statements on which threads get executed when. That means there are a large number of POTENTIAL execution paths through the program, and your program must be correct under all of them.
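To make "a large number of potential execution paths" concrete, here is a small sketch that simulates two threads each performing a non-atomic `counter += 1` (a load followed by a store) and enumerates every possible interleaving. The names `run` and `all_schedules` are invented for this illustration, and the simulation is deliberately simplified: real threads have far more steps and far more interleavings.

```python
from itertools import combinations

def run(schedule):
    """Run two simulated threads, each doing a non-atomic `counter += 1`.

    `schedule` is a sequence of thread ids (0 or 1) saying which thread
    executes its next step. Each thread has exactly two steps: load, store.
    """
    counter = 0
    local = [None, None]   # each thread's private copy of the counter
    step = [0, 0]          # index of the next step for each thread
    for t in schedule:
        if step[t] == 0:
            local[t] = counter        # step 1: load the shared value
        else:
            counter = local[t] + 1    # step 2: store the incremented value
        step[t] += 1
    return counter

def all_schedules():
    """Yield every interleaving of two threads with two steps each."""
    for slots in combinations(range(4), 2):
        yield tuple(0 if i in slots else 1 for i in range(4))

results = {schedule: run(schedule) for schedule in all_schedules()}
for schedule, final in sorted(results.items()):
    print(schedule, "->", final)
```

Only the two schedules in which one thread finishes before the other starts leave the counter at 2; the other four lose an update and end at 1. The program is only correct if it is correct under all six.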
In general the place where trouble can occur is when state is shared among the threads (aka "lightweight processes" in the old days.) That happens when there are shared memory areas,
To ensure correctness, what you need to do is ensure that these data areas get updated in a way that can't cause errors. To do this, you need to identify "critical sections" of the program, where sequential operation must be guaranteed. Those can be as little as a single instruction or line of code; if the language and architecture ensure that these are atomic, that is, can't be interrupted, then you're golden.
Otherwise, you identify that section, and put some kind of guards onto it. The classic way is to use a semaphore, a primitive with two atomic operations that, in its simplest form, only allows one thread of control past at a time. These were invented by Edsger Dijkstra, and so have names that come from the Dutch, P and V. When you come to a P, only one thread can proceed; all other threads are queued and waiting until the executing thread comes to the associated V operation.
Because these primitives are a little primitive, and because the Dutch names aren't very intuitive, there have been some other larger-scale approaches developed.
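As a rough sketch of P and V in practice, Python's `threading.Semaphore` exposes them as `acquire()` and `release()`. The helper `run_workers` and its parameters are made up for this example; the point is only that the increment sits between a P and its matching V.

```python
import threading

def run_workers(n_threads=4, iterations=10_000):
    """Increment a shared counter from several threads, guarded by a semaphore."""
    mutex = threading.Semaphore(1)    # initial value 1: at most one thread inside
    state = {"counter": 0}            # the shared data

    def worker():
        for _ in range(iterations):
            mutex.acquire()           # P: wait until the semaphore is free, then take it
            try:
                state["counter"] += 1 # critical section
            finally:
                mutex.release()       # V: give it back so a waiting thread can proceed

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]

print(run_workers())
```

With the guard in place the result is always `n_threads * iterations`. Without it, the read-modify-write on the counter is not guaranteed to be atomic, so updates could be lost.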
Per Brinch Hansen invented the monitor, which is basically just a data structure that has operations which are guaranteed atomic; they can be implemented with semaphores. Monitors are pretty much what Java synchronized statements are based on; they make an object or code block have that particular behavior -- that is, only one thread can be "in" them at a time -- with simpler syntax.
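Python has no `synchronized` keyword, but the monitor idea can be approximated by a class whose public methods all hold the same lock. The `Account` class below is a hypothetical example, not something from a library; a reentrant lock is used so that one guarded method could call another, which is also how Java's monitors behave.

```python
import threading

class Account:
    """A monitor-style object: every public method runs while holding one lock."""

    def __init__(self, balance=0):
        self._lock = threading.RLock()
        self._balance = balance

    def deposit(self, amount):
        with self._lock:
            self._balance += amount

    def withdraw(self, amount):
        """Withdraw if funds allow; return True on success, False otherwise."""
        with self._lock:
            # The check and the update happen under the same lock, so no
            # other thread can change the balance in between.
            if self._balance < amount:
                return False
            self._balance -= amount
            return True

    def balance(self):
        with self._lock:
            return self._balance
```

Callers never touch the lock themselves. If ten threads each call `withdraw(30)` on an account holding 100, exactly three succeed and the balance ends at 10, whatever the scheduling.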
There are other models possible. Haskell and Erlang solve the problem by being functional languages that never allow a variable to be modified once it's created; this means they naturally don't need to worry about synchronization. Some new languages, like Clojure, instead have a structure called "transactional memory", which basically means that when there is an assignment, you're guaranteed the assignment is atomic and reversible.
So that's it in a nutshell. To really learn about it, the best place to look is an operating systems text, such as Andy Tanenbaum's.
Writing a high performance HTTP crawler / downloader is no easy task. I'll describe how I'd do it. Keep in mind, there are a lot of solutions, so if you want to dive deeper into this topic, you may want to read Modern Operating Systems by Andrew S. Tanenbaum.
SELECT... FOR UPDATE
This is a relatively safe path to travel. However, keep in mind that you are playing with something way beyond the knowledge level of an average (or even experienced) PHP coder. You must read up on how the Linux (or Windows) process model works, otherwise you will have an incredibly broken application on your hands.