The Art of Multiprocessor Programming

Author: Maurice Herlihy, Nir Shavit
4.0
All Stack Overflow 18
This Year Stack Overflow 2
This Month Stack Overflow 1

Comments

by anonymous   2017-08-20

Lock-free skip lists are described in the book The Art of Multiprocessor Programming, and the technical report Practical lock-freedom, which is based on a PhD thesis on the subject. The skip list discussion begins on page 53. An example implementation, based on these sources, is included in this google code project.

There are related discussions, links to literature and implementations (not necessarily lock-free) in the SO questions Skip List vs. Binary Tree, and Skip Lists - ever used them?.

by anonymous   2017-08-20

my own question, with answer, on stackoverflow's sister site: https://softwareengineering.stackexchange.com/questions/126986/where-can-i-find-an-overview-of-known-multithreading-design-patterns/126993#126993

I will copy the answer to avoid the need for click-through:

Quote Boris:

Parallel Programming with Microsoft .NET: Design Patterns for Decomposition and Coordination on Multicore Architectures https://toptalkedbooks.com/amzn/0735651590

This is a book, I recommend wholeheartedly.

It is:

New - published last year. Means you are not reading somewhat outdated practices.

Short - about 200+ pages, dense with information. These days there is too much to read and too little time to read 1000+ pages books.

Easy to read - not only it is very well written but it introduces hard to grasps concepts in really simple to read way.

Intended to teach - each chapter gives exercises to do. I know it is always beneficial to do these, but rarely do. This book gives very compelling and interesting tasks. Surprisingly I did most of them and enjoyed doing them.

additionally, if you wish to learn more of the low-level details, this is the best resource i have found: "The Art of Multiprocessor Programming" It's written using java as their code samples, which plays nicely with my C# background.

PS: I have about 5 years "hard core" parallel programming experience, (abet using C#) so hope you can trust me when I say that "The Art of Multiprocessor Programming" rocks

by anonymous   2017-08-20

Reads and writes to a shared memory location take a finite period of time, so they may either overlap, or be completely distinct.

e.g.

Thread 1:      wwwww     wwwww
Thread 2:   rrrrr              rrrrr
Thread 3:   rrrrr rrrrr

The first read from thread 2 overlaps with the first write from thread 1, whilst the second read and second write do not overlap. In thread 3, both reads overlap the first write.

A safe register is only safe as far as reads that do not overlap writes. If a read does not overlap any writes then it must read the value written by the most recent write. Otherwise it may return any value that the register may hold. So, in thread 2, the second read must return the value written by the second write, but the first read can return any valid value.

A regular register adds the additional guarantee that if a read overlaps with a write then it will either read the old value or the new one, but multiple reads that overlap the write do not have to agree on which, and the value may appear to "flicker" back and forth. This means that two reads from the same thread (such as in thread 3 above) that both overlap the write may appear "out of order": the earlier read returning the new value, and the later returning the old value.

An atomic register guarantees that the reads and writes appears to happen at a single point in time. Readers that act at a point before that point will all read the old value and readers that act after that point will all read the new value. In particular, if two reads from the same thread overlap a write then the later read cannot return the old value if the earlier read returns the new one. Atomic registers are linearizable.

The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit gives a good description, along with examples and use cases.

by anonymous   2017-08-20

In general, you have two options: Thread-based parallelism or Task-based parallelism. The first is the most traditional approach and pthreads and OpenMP are a good examples of it. In the second alternative, you have one more abstraction level, where you see your parallel program as a set of tasks that are mapped to threads. A good reference to learn the model of computation is the chapter 27 of Introduction to Algorithms of Cormen (http://mitpress.mit.edu/sites/default/files/titles/content/9780262033848_sch_0001.pdf) and some tools to program are CilkPlus (http://cilkplus.org/), Threading Building Blocks (http://threadingbuildingblocks.org/), OpenMP Tasks(http://openmp.org/wp/), and Microsoft’s Task Parallel Library (http://msdn.microsoft.com/en-us/library/dd460717.aspx).

Finally, you can read The Art of Multiprocessor Programming (http://www.amazon.com/The-Multiprocessor-Programming-Maurice-Herlihy/dp/0123705916)

by anonymous   2017-08-20

The following is taken directly from The Art of Multiprocessor Programming which is a good book to learn about this stuff. There's actually 2 implementations presented: a simple version and a fair version. I'll go ahead and reproduce the fair version.

One of the requirements for this implementation is that you have a condition variable primitive. I'll try to figure out a way to remove it but that might take me a little while. Until then, this should still be better than nothing. Note that it's also possible to implement this primitive using only locks.

public class FifoReadWriteLock {
    int readAcquires = 0, readReleases = 0;
    boolean writer = false;
    ReentrantLock lock;
    Condition condition = lock.newCondition(); // This is the condition variable.

    void readLock () {
        lock.lock();
        try {
            while(writer)
                condition.await();
            readAcquires++;
        }
        finally {
            lock.unlock();
        }
    }

    void readUnlock () {
        lock.lock();
        try {
            readReleases++;
            if (readAcquires == readReleases)
                condition.signalAll();
        }
        finally {
            lock.unlock();
        }
    }

    void writeLock () {
        lock.lock();
        try {
            while (writer)
                condition.await();

            writer = true;

            while (readAcquires != readReleases)
                condition.await();
        }
        finally {
            lock.unlock();
        }
    }

    void writeUnlock() {
        writer = false;
        condition.signalAll();
    }
}

First off, I simplified the code a little but the algorithm remains the same. There also happens to be an error in the book for this algorithm which is corrected in the errata. If you plan on reading the book, keep the errata close by or you'll end up being very confused (like me a few minutes ago when I was trying to re-understand the algorithm). Note that on the bright side, this is a good thing since it keeps you on your toes and that's a requirement when you're dealing with concurrency.

Next, while this may be a Java implementation, only use it as pseudo code. When doing the actual implementation you'll have to be carefull about the memory model of the language or you'll definitely end up with a headache. As an example, I think that the readAcquires and readReleases and writer variable all have to be declared as volatile in Java or the compiler is free to optimize them out of the loops. This is because in a strictly sequential programs there's no point in continuously looping on a variable that is never changed inside the loop. Note that my Java is a little rusty so I might be wrong. There's also another issue with integer overflow of the readReleases and readAcquires variables which is ignored in the algorithm.

One last note before I explain the algorithm. The condition variable is initialized using the lock. That means that when a thread calls condition.await(), it gives up its ownership of the lock. Once it's woken up by a call to condition.signalAll() the thread will resume once it has reacquired the lock.

Finally, here's how and why it works. The readReleases and readAcquires variables keep track of the number threads that have acquired and released the read lock. When these are equal, no thread has the read lock. The writer variable indicates that a thread is trying to acquire the write lock or it already has it.

The read lock part of the algorithm is fairly simple. When trying to lock, it first checks to see if a writer is holding the lock or is trying to acquire it. If so, it waits until the writer is done and then claims the lock for the readers by incrementing the readAcquires variable. When unlocking, a thread increases the readReleases variable and if there's no more readers, it notifies any writers that may be waiting.

The write lock part of the algorithm isn't much more complicated. To lock, a thread must first check whether any other writer is active. If they are, it has to wait until the other writer is done. It then indicates that it wants the lock by setting writer to true (note that it doesn't hold it yet). It then waits until there's no more readers before continuing. To unlock, it simply sets the variable writer to false and notifies any other threads that might be waiting.

This algorithm is fair because the readers can't block a writer indefinitely. Once a writer indicates that it wants to acquire the lock, no more readers can acquire the lock. After that the writer simply waits for the last remaining readers to finish up before continuing. Note that there's still the possibility of a writer indefinitely blocking another writer. That's a fairly rare case but the algorithm could be improved to take that into account.


So I re-read your question and realised that I partly (badly) answered it with the algorithm presented below. So here's my second attempt.

The algorithm, you described is fairly similar to the simple version presented in the book I mentionned. The only problem is that A) it's not fair and B) I'm not sure how you would implement wait until recursive mutex has lock count zero. For A), see above and for B), the book uses a single int to keep track of the readers and a condition variable to do the signalling.

by kmavm   2017-08-19
If you are into wait-free and lock-free data structures, you owe it to yourself to get Herlihy and Shavit's "Art of Multiprocessor Programming."

http://www.amazon.com/Art-Multiprocessor-Programming-Maurice...

Not only does it provide a well-debugged, well-written collection of wait-free data structures; it teaches you to think about concurrency in a structured way. The mutual exclusion chapter doesn't teach you which pthread routines to use, but instead tells you how to implement mutual exclusion, and the trade-offs inherent in, e.g., spin locks vs. queueing locks. A significant bonus is Maurice Herlihy's distinctive prose voice: correct, concise, and funny in that order. It is the best computer book I've read recently, and the one I recommend to all colleagues who are trying to take their concurrency chops up to the next level.

(I took a course from Herlihy in '99, but am otherwise unaffiliated.)

by gtani   2017-08-19
Since then, Real world haskell (freely available), Hutton's book, maybe another came out

http://book.realworldhaskell.org/

http://www.amazon.com/Programming-in-Haskell-ebook/dp/B001FS...

also: Herlihy , Shavit , Multiprocessor Programming

http://www.amazon.com/Art-Multiprocessor-Programming-Maurice...

by strlen   2017-08-19
There are rather basic (producer/consume queues) and as some have pointed out, are buggy. I'd highly suggest looking at Boost's threading library (for an example of an object oriented approach to threading, taking advantage of RAII -- much of it is now standard in C++11), Intel's Thread Building Blocks, Java's built in Concurrency Utilities (java.util.concurrent package), Doug Lea's Fork Join Framework. The a great book on the subject is Maurice Herlihy's The Art of Multiprocessor Programming:

http://www.amazon.com/The-Multiprocessor-Programming-Maurice...

The book is in Java, but C++11 has the required primitives (cross platform compare and set for integral types, a defined memory model) so you could follow allong in C++11.