I don't know if you had a specific OS in mind, but one of the best books on how the Windows operating system works "under the hood" is called Windows Internals. It describes in detail how everything works, from the kernel to device drivers to the file system.
If you're looking for a good book on how CPUs work in general, I recommend Computer Architecture: A Quantitative Approach. Very good info there!
Also, some good resources on how CPUs work, from a programmer's perspective, can be found in the Intel technical library. Everything is free to download there and it makes for some good reading!
You can force your threads to run on specific cores, but in general you should let the OS take care of it. The operating system handles much of this automatically. If you have four threads running on a quad core system, the OS will schedule them on all four cores unless you take actions to prevent it from happening. For better performance, the OS will also try to keep an individual thread running on the same core rather than shifting it around, avoid scheduling two running threads on the same hyperthreaded core if there are idle cores available, and so on.
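To make "forcing" a thread onto a core concrete, here is a minimal sketch in Python. It assumes a platform where the standard library exposes os.sched_setaffinity (Linux does; Windows and macOS do not, and on Windows the corresponding Win32 call is SetThreadAffinityMask). The helper name pin_to_one_cpu is made up for this illustration. It pins the caller to one CPU and then restores the original mask, since leaving the choice to the scheduler is usually the better default.

```python
import os


def pin_to_one_cpu():
    """Pin the caller to a single CPU, then restore the previous mask.

    Returns the CPU set that was in effect while pinned, or None on
    platforms that do not expose the affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. Windows or macOS: nothing to demonstrate here

    original = os.sched_getaffinity(0)   # 0 refers to the caller
    target = {min(original)}             # choose a CPU we are already allowed on
    try:
        os.sched_setaffinity(0, target)  # from now on, only this CPU is eligible
        pinned = os.sched_getaffinity(0)
    finally:
        os.sched_setaffinity(0, original)  # hand the decision back to the OS
    return pinned


if __name__ == "__main__":
    print(pin_to_one_cpu())
```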
Also, rather than creating new threads for work you should use the thread pool. The system will scale this to the number of processors available on the system.
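As a rough illustration of the pool idea (in Python rather than the Windows thread pool API, so treat it as an analogy), the sketch below hands work items to a pool sized from the processor count instead of creating one thread per item. The names square and run_in_pool are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Stand-in for a real unit of work."""
    return n * n


def run_in_pool(items):
    """Run `square` over `items` on a pool sized to the CPU count."""
    workers = os.cpu_count() or 1  # cpu_count() may return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() queues the items and returns results in input order
        return list(pool.map(square, items))


if __name__ == "__main__":
    print(run_in_pool(range(8)))
```

The caller never creates or joins a thread directly; the pool decides how many workers exist and reuses them across work items.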
Windows Internals is a good book for covering how the Windows scheduler works.
The NT kernel uses event objects to allow signals to be transferred to entities that wait on the signal. A mutex and a semaphore are also waitable kernel objects (Kernel Dispatcher Objects), but with different semantics. The only time I ever came across them was when waiting for IO to complete in drivers.
So my theory is that your problem may be caused by a faulty driver. Or are you relying on specialised hardware?
Edit: More info (from Windows Internals 5th Edition - Chapter 3 System Mechanisms)
Some Kernel Dispatcher Objects (e.g. mutex, semaphore) have the concept of ownership. So when one is signalled, a single waiting thread is released and grabs the resource, and the others have to continue to wait. Events are not owned, hence are available to be reset by any thread.
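The difference in wake-up behaviour can be sketched with Python's threading primitives. These are user-mode analogues, not NT dispatcher objects, and threading.Event behaves like a manual-reset event, so this only illustrates the semantics. The helper names count_released and demo are made up for the example. Three threads wait, the primitive is signalled once, and the code counts how many got through.

```python
import threading


def count_released(wait, signal, waiters=3):
    """Start `waiters` threads that call wait(), signal once, count successes."""
    results = []

    def worker():
        results.append(wait())  # True if the wait succeeded, False on timeout

    threads = [threading.Thread(target=worker) for _ in range(waiters)]
    for t in threads:
        t.start()
    signal()  # a single signal for all waiters
    for t in threads:
        t.join()
    return sum(results)


def demo():
    # Event: stays signalled until someone clears it, so every waiter passes.
    event = threading.Event()
    by_event = count_released(lambda: event.wait(timeout=0.5), event.set)

    # Semaphore: one release admits one waiter; the others time out.
    sem = threading.Semaphore(0)
    by_semaphore = count_released(lambda: sem.acquire(timeout=0.5), sem.release)

    return by_event, by_semaphore


if __name__ == "__main__":
    print(demo())
```

With one signal, the event lets all three waiters through, while the semaphore admits exactly one and the remaining two give up after the timeout.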
Also there are three types of events:
Another interesting thing that I've learned is that critical sections (the lock primitive in C#) are actually not kernel objects. Rather, they are built on a keyed event, mutex, or semaphore as required.