Robert, I don't think that what you want to achieve is really possible without actively manipulating file system data structures for a file system which, from the sounds of it, is mounted. I don't think I have to tell you how dangerous and unwise this sort of exercise is.
But if you need to do it, I guess I can give you a "sketch on the back of a napkin" to get you started:
You could leverage the "sparse file" support of NTFS to simply add "gaps" by tweaking the LCN/VCN (logical/virtual cluster number) mappings. Once you do, just open the file, seek to the new location and write your data. NTFS will transparently allocate the space and write the data in the middle of the file, where you created a hole.
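Python has no portable way to edit NTFS cluster mappings, so the sketch below only illustrates the second half of that recipe: writing into a gap in the middle of a file and reading it back. The helper name `write_into_gap` is hypothetical. Whether the gap is really stored as an unallocated hole depends on the file system; on NTFS the file would first have to be marked sparse (FSCTL_SET_SPARSE), which this sketch does not do, and it does not shift any existing data.

```python
import os
import tempfile


def write_into_gap(path: str, size: int, offset: int, data: bytes) -> bytes:
    """Create a `size`-byte file that is all gap, then write `data` at `offset`."""
    with open(path, "wb") as f:
        # Extending with truncate() writes no data; whether the new range is a
        # real hole depends on the file system's sparse-file support.
        f.truncate(size)
    with open(path, "r+b") as f:
        f.seek(offset)  # seek into the middle of the gap
        f.write(data)   # only the blocks covering this range need allocating
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        content = write_into_gap(os.path.join(d, "sparse.bin"), 16, 4, b"DATA")
        print(content)
```

The unwritten parts of the gap read back as zero bytes, which is also how NTFS presents the unallocated ranges of a sparse file.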
For more, look at this page about defragmentation support in NTFS for hints on how to manipulate the cluster mappings so that you can insert clusters in the middle of the file. At least by using the sanctioned API for this sort of thing, you are unlikely to corrupt the file system beyond repair, although you can still horribly hose your file, I guess.
Get the retrieval pointers for the file you want, split them wherever you need to add space, add as much extra space as you need, and move the file. There's an interesting chapter on this sort of thing in the Russinovich/Ionescu "Windows Internals" book (http://www.amazon.com/Windows%C2%AE-Internals-Including-Windows-Developer/dp/0735625301).
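The bookkeeping behind "split them where you need" can be modelled on a plain run list. This is a hypothetical in-memory model (`Run` and `insert_gap` are illustrative names, not a Windows API): each run maps a range of VCNs to a starting LCN, roughly the shape of the extents FSCTL_GET_RETRIEVAL_POINTERS returns, with `None` standing in for an unallocated hole. It does not touch the disk and does not show how, or whether, NTFS would accept such a layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    """`length` clusters starting at virtual cluster `vcn`, stored from logical
    cluster `lcn` onward; None marks a hole (Windows reports such runs with LCN -1)."""
    vcn: int
    length: int
    lcn: Optional[int]


def insert_gap(runs: List[Run], at_vcn: int, count: int) -> List[Run]:
    """Return a new run list with `count` unallocated clusters inserted at `at_vcn`.
    Clusters at or after `at_vcn` keep their LCNs but move `count` VCNs later."""
    file_end = runs[-1].vcn + runs[-1].length if runs else 0
    if not 0 <= at_vcn <= file_end:
        raise ValueError("split point outside the file")
    out: List[Run] = []
    inserted = False
    for r in runs:
        end = r.vcn + r.length
        if inserted:
            out.append(Run(r.vcn + count, r.length, r.lcn))  # shift later runs
        elif at_vcn >= end:
            out.append(r)  # run lies entirely before the split point
        else:
            head = at_vcn - r.vcn  # clusters of this run before the split
            if head > 0:
                out.append(Run(r.vcn, head, r.lcn))
            out.append(Run(at_vcn, count, None))  # the new hole
            tail_lcn = None if r.lcn is None else r.lcn + head
            out.append(Run(at_vcn + count, r.length - head, tail_lcn))
            inserted = True
    if not inserted:  # split point is exactly the end of the file
        out.append(Run(at_vcn, count, None))
    return out


if __name__ == "__main__":
    layout = [Run(0, 10, 100), Run(10, 5, 500)]
    for run in insert_gap(layout, 4, 3):
        print(run)
```

Splitting at VCN 4 keeps the first four clusters in place, adds a three-cluster hole, and renumbers the remaining clusters without moving them on disk.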
There are a number of tools that can be handy:
Process Explorer from SysInternals is much more useful than Task Manager.
Off the top of my head, here are a few things that you can do without modifying the code or writing test code:
Play with SysInternals tools.
Also, it may be a good idea to buy this book to familiarize yourself with Windows: http://www.amazon.com/Windows%C2%AE-Internals-Including-Windows-Developer/dp/0735625301/
There are also a few good books on debugging Windows applications, like this one: http://www.amazon.com/Advanced-Windows-Debugging-Mario-Hewardt/dp/0321374460/
Among other things, it explains how to automatically collect crash dumps from your applications (using Windows Error Reporting, AKA WER) and then inspect them in the debugger. I found that useful.
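One mechanism for collecting such dumps automatically is WER's LocalDumps registry key (Windows Vista SP1 / Server 2008 and later); it is not necessarily the approach the book describes. The sketch below builds the values and writes them with Python's winreg module. The helper names and the folder are hypothetical examples, and writing under HKEY_LOCAL_MACHINE requires administrator rights.

```python
import sys

# Registry key read by WER for local crash dumps.
LOCAL_DUMPS_KEY = r"SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def local_dump_settings(folder: str, count: int = 10, full: bool = True) -> dict:
    """Values for the LocalDumps key. DumpType: 1 = mini dump, 2 = full dump."""
    return {"DumpFolder": folder, "DumpCount": count, "DumpType": 2 if full else 1}


def apply_settings(settings: dict, exe_name: str = "") -> None:
    """Write the values under HKLM. A per-application subkey such as
    'MyApp.exe' (hypothetical name) limits them to one program."""
    if sys.platform != "win32":
        raise OSError("LocalDumps is a Windows-only feature")
    import winreg  # only available on Windows

    path = LOCAL_DUMPS_KEY + ("\\" + exe_name if exe_name else "")
    # KEY_WOW64_64KEY avoids registry redirection when running 32-bit Python.
    access = winreg.KEY_WRITE | winreg.KEY_WOW64_64KEY
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, path, 0, access) as key:
        winreg.SetValueEx(key, "DumpFolder", 0, winreg.REG_EXPAND_SZ, settings["DumpFolder"])
        winreg.SetValueEx(key, "DumpCount", 0, winreg.REG_DWORD, settings["DumpCount"])
        winreg.SetValueEx(key, "DumpType", 0, winreg.REG_DWORD, settings["DumpType"])


if __name__ == "__main__":
    print(local_dump_settings(r"C:\CrashDumps"))
```

Once the key is in place, a crashing process leaves a .dmp file in the folder that you can open in WinDbg.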
Hard disks are an interesting beast: sequential access (reading a big contiguous file, for example) is super zippy, figure 80 MB/s, but random access is very slow. This is what you're bumping into: recursing into the folders won't read much data (in terms of quantity), but it will require many random reads. The reason you're seeing zippy performance the second time around is that the MFT is still in RAM (you're correct about the caching).
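Rough numbers make the point. The 80 MB/s comes from the text above; the ~10 ms per random access (seek plus rotational latency on a typical desktop disk), the 100,000 scattered reads and the 256 MB MFT are assumptions chosen only for illustration.

```python
MB = 1_000_000


def sequential_seconds(total_bytes: float, bytes_per_second: float) -> float:
    """Time to stream a contiguous region at a fixed throughput."""
    return total_bytes / bytes_per_second


def random_seconds(reads: int, ms_per_access: float) -> float:
    """Time when every read pays a full seek plus rotational delay."""
    return reads * ms_per_access / 1000.0


if __name__ == "__main__":
    print(f"one linear MFT pass: {sequential_seconds(256 * MB, 80 * MB):.1f} s")
    print(f"recursive directory walk: {random_seconds(100_000, 10.0):.0f} s")
```

With these assumptions the linear pass takes about 3 seconds and the scattered reads about 1,000 seconds; the exact figures matter less than the difference of a few hundred times between them.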
The best mechanism I've seen to achieve this is to scan the MFT yourself. The idea is that you read and parse the MFT in one linear pass, building the information you need as you go. The end result will be something much closer to 15 seconds on a hard disk that is very full.
Some good reading:
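The core of such a scan is a single front-to-back walk over fixed-size file records. This sketch works on a raw $MFT image already in memory (obtaining one needs administrator access to the volume, not shown). It assumes 1024-byte records (typical, but the real size is stored in the volume boot sector) and reads only the "FILE" signature and the header flags word at offset 0x16 (0x01 in use, 0x02 directory), which lies outside the update-sequence fixup positions. Real code must also apply the fixups and parse attributes to get names and parent directories. `scan_mft` and `count_entries` are hypothetical names.

```python
import struct
from typing import Iterator, Tuple

RECORD_SIZE = 1024  # assumption: read the real value from the boot sector
FLAG_IN_USE = 0x0001
FLAG_DIRECTORY = 0x0002


def scan_mft(data: bytes, record_size: int = RECORD_SIZE) -> Iterator[Tuple[int, bool, bool]]:
    """Walk a raw $MFT image once, front to back, yielding
    (record_number, in_use, is_directory) for records with a 'FILE' signature."""
    for number, offset in enumerate(range(0, len(data) - record_size + 1, record_size)):
        record = data[offset:offset + record_size]
        if record[:4] != b"FILE":
            continue  # empty or damaged slot (e.g. zeros or 'BAAD')
        (flags,) = struct.unpack_from("<H", record, 0x16)  # header flags word
        yield number, bool(flags & FLAG_IN_USE), bool(flags & FLAG_DIRECTORY)


def count_entries(data: bytes) -> Tuple[int, int]:
    """Count in-use (files, directories) in one sequential pass."""
    files = dirs = 0
    for _, in_use, is_dir in scan_mft(data):
        if in_use:
            if is_dir:
                dirs += 1
            else:
                files += 1
    return files, dirs
```

Because each record is visited exactly once in disk order, the disk sees one long sequential read instead of a seek per directory.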
NTFSInfo.exe - http://technet.microsoft.com/en-us/sysinternals/bb897424.aspx
Windows Internals - http://www.amazon.com/Windows%C2%AE-Internals-Including-Windows-PRO-Developer/dp/0735625301/ref=sr_1_1?ie=UTF8&s=books&qid=1277085832&sr=8-1
FWIW: this method is very complicated, as there really isn't a great way to do this in Windows (or any OS I'm aware of). The problem is that figuring out which folders and files are needed requires a lot of head movement on the disk. It'd be very tough for Microsoft to build a general solution to the problem you describe.
You will need a good understanding of Windows internals, and yes, there is a book for that: Windows Internals.
Basically your questions are all answered in this book (and it even comes with samples and hands-on labs).
I don't know if you had a specific OS in mind, but one of the best books on how the Windows operating system works "under the hood" is called Windows Internals. It describes in detail how everything, from the kernel to device drivers and the file system, works.
If you're looking for a good book on how CPUs work in general, I recommend Computer Architecture: A Quantitative Approach. Very good info there!
Also, some good resources on how CPUs work, from a programmer's perspective, can be found in the Intel technical library. Everything there is free to download, and it makes for some good reading!
You can force your threads to run on specific cores, but in general you should let the OS take care of it. The operating system handles much of this automatically. If you have four threads running on a quad-core system, the OS will schedule them on all four cores unless you take action to prevent it. The OS will also do things like trying to keep an individual thread running on the same core rather than moving it around (which is better for performance), not scheduling two running threads on the same hyperthreaded core if there are idle cores available, and so on.
Also, rather than creating new threads for work you should use the thread pool. The system will scale this to the number of processors available on the system.
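On Windows the native pool is reached through APIs such as QueueUserWorkItem or CreateThreadpoolWork/SubmitThreadpoolWork. As a portable illustration of the same idea (Python's pool, not the Windows one), `concurrent.futures.ThreadPoolExecutor` picks its size from the processor count when `max_workers` is omitted (in recent versions, min(32, CPU count + 4)). `process_all` is a hypothetical helper.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all(items, work):
    """Run `work` on every item using a pool sized by the runtime, not by hand."""
    with ThreadPoolExecutor() as pool:  # no max_workers: let the pool decide
        return list(pool.map(work, items))  # results come back in input order


if __name__ == "__main__":
    print("logical processors:", os.cpu_count())
    print(process_all(range(8), lambda n: n * n))
```

The point is the same as with the Windows pool: you submit work items and the pool, not your code, decides how many threads to run them on.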
Windows Internals is a good book for covering how the Windows scheduler works.
The NT kernel uses event objects to allow signals to be transferred to entities that wait on the signal. A mutex and a semaphore are also waitable kernel objects (Kernel Dispatcher Objects), but with different semantics. The only time I ever came across events was when waiting for I/O to complete in drivers.
So my theory is that your problem is possibly a faulty driver. Or are you relying on specialised hardware?
Edit: More info (from Windows Internals, 5th Edition, Chapter 3: System Mechanisms)
Some Kernel Dispatcher Objects have the concept of ownership (a mutex is owned by the thread that acquires it), and waits on others, such as a semaphore, consume the signal (the count is decremented). So when one of these is signalled, a waiting thread is released and grabs the resource, while the others have to continue to wait. Events are not owned, hence any thread can set or reset them.
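Python's threading module has loose analogues of these semantics, which makes the difference easy to try out. This is an analogy, not the NT kernel API: `threading.Event` behaves roughly like a manual-reset event (one set releases every waiter, and any thread may set or clear it), while `threading.RLock` is owned like a mutex, so a release from a thread that does not own it fails. The function names are hypothetical.

```python
import threading


def event_wakes_all_waiters(n: int = 3) -> int:
    """One set() on an event releases every thread waiting on it."""
    ev = threading.Event()
    woke = []
    guard = threading.Lock()

    def waiter():
        ev.wait()  # returns at once if the event is already set
        with guard:
            woke.append(1)

    threads = [threading.Thread(target=waiter) for _ in range(n)]
    for t in threads:
        t.start()
    ev.set()  # not owned: the main thread signals on everyone's behalf
    for t in threads:
        t.join()
    return len(woke)


def non_owner_release_fails() -> bool:
    """An owned lock rejects release() from a thread that did not acquire it."""
    m = threading.RLock()
    m.acquire()  # the main thread becomes the owner
    failed = []

    def intruder():
        try:
            m.release()
        except RuntimeError:  # "cannot release un-acquired lock"
            failed.append(True)

    t = threading.Thread(target=intruder)
    t.start()
    t.join()
    m.release()  # the owner can release it
    return bool(failed)


if __name__ == "__main__":
    print(event_wakes_all_waiters(4), non_owner_release_fails())
```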
Also there are three types of events:
Another interesting thing that I've learned is that critical sections (the Win32 lock primitive, roughly what the lock statement gives you in C#) are actually not kernel objects; rather, they are implemented out of a keyed event, or a mutex or semaphore, as required.
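The "only fall back to a waitable object when needed" idea can be illustrated, loosely, by the spin-then-block pattern that Win32 critical sections expose through InitializeCriticalSectionAndSpinCount. Python cannot show the user-mode versus kernel-mode split, so this hypothetical `SpinThenBlockLock` class only mirrors the control flow: a few cheap non-blocking attempts, then a blocking wait. It is not how critical sections are actually implemented.

```python
import threading


class SpinThenBlockLock:
    """Hypothetical lock: retry a non-blocking acquire, then block."""

    def __init__(self, spin_count: int = 100) -> None:
        if spin_count < 0:
            raise ValueError("spin_count must be non-negative")
        self._lock = threading.Lock()
        self.spin_count = spin_count

    def acquire(self) -> None:
        # Cheap path: an uncontended lock is taken on the first attempt.
        for _ in range(self.spin_count):
            if self._lock.acquire(blocking=False):
                return
        # Contended path: stop spinning and wait until the holder releases.
        self._lock.acquire()

    def release(self) -> None:
        self._lock.release()

    def __enter__(self) -> "SpinThenBlockLock":
        self.acquire()
        return self

    def __exit__(self, *exc_info) -> None:
        self.release()


if __name__ == "__main__":
    lock = SpinThenBlockLock()
    total = [0]

    def bump():
        for _ in range(1000):
            with lock:
                total[0] += 1

    workers = [threading.Thread(target=bump) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(total[0])
```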