Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions
Some of the things you describe are not errors or exceptions, but alternative flows that you should consider in your distributed architecture.
For example, that an item is out of stock is a perfectly valid alternative flow in your business process. One that possibly requires human intervention. You could move the message to a separate queue and provide some UI where a human operator can deal with the problem, solve it and cause the flow of events to continue.
A similar thing could be said of the payment problems you describe. If an order cannot successfully be settled, a human operator will need to investigate the case and solve it. For that matter, your design must contemplate that alternative flow as part of it, and make it so a human can intervene somehow when the messages end up in a queue that requires a person to review them.
Those cases should be differentiated from errors or exceptions being thrown by the program. Those cases, depending on the circumstance, might in fact require to move the message to a dead letter queue (DLQ) for an engineer to take a look at them.
This is a very broad topic and entire books could written about this.
I believe you could probably benefit from gaining more understanding of concepts like:
The idea behind compensating transactions is that every ying has its yang: if you have one transaction that can place an order, then you could undo that with a transaction that cancels that order. This latter transaction is a compensating transaction. So, if you carry out a number of successful transactions and then one of them fails, you can trace back your steps and compensate every successful transaction you did and, as a result, revert their side effects.
I particularly liked a chapter in the book REST from Research to Practice. Its chapter 23 (Towards Distributed Atomic Transactions over RESTful Services) goes deep in explaining the Try/Cancel/Confirm pattern.
In general terms it implies that when you do a group of transactions, their side effects are not effective until a transaction coordinator gets a confirmation that they all were successful. For example, if you make a reservation in Expedia and your flight has two legs with different airlines, then one transaction would reserve a flight with American Airlines and another one would reserve a flight with United Airlines. If your second reservation fails, then you want to compensate the first one. But not only that, you want to avoid that the first reservation is effective until you have been able to confirm both. So, initial transaction makes the reservation but keeps its side effects pending to confirm. And the second reservation would do the same. Once the transaction coordinator knows everything is reserved, it can send a confirmation message to all parties such that they confirm their reservations. If reservations are not confirmed within a sensible time window, they are automatically reversed by the affected system.
The book Enterprise Integration Patterns has some basic ideas on how to implement this kind of event coordination (e.g. see process manager pattern and compare with routing slip pattern which are similar ideas to orchestration vs choreography in the Microservices world).
As you can see, being able to compensate transactions might be complicated depending on how complex is your distributed workflow. The process manager may need to keep track of the state of every step and know when the whole thing needs to be undone. This is pretty much that idea of Sagas in the Microservices world.
The book Microservices Patterns has an entire chapter called Managing Transactions with Sagas that delves in detail on how to implement this type of solution.
A few other aspects I also typically consider are the following:
I believe that a key to a successful implementation of your service transactions in a distributed system consists in making them idempotent. Once you can guarantee a given service is idempotent, then you can safely retry it without worrying about causing additional side effects. However, just retrying a failed transaction won't solve your problems.
Transient vs Persistent Errors
When it comes to retrying a service transaction, you shouldn't just retry because it failed. You must first know why it failed and depending on the error it might make sense to retry or not. Some types of errors are transient, for example, if one transaction fails due to a query timeout, that's probably fine to retry and most likely it will succeed the second time; but if you get a database constraint violation error (e.g. because a DBA added a check constraint to a field), then there is no point in retrying that transaction: no matter how many times you try it will fail.
Embrace Error as an Alternative Flow
As mentioned at the beginning of my answer, not everything is an error. Some things are just alternative flows.
In those cases of interservice communication (computer-to-computer interactions) , when a given step of your workflow fails, you don't necessarily need to undo everything you did in previous steps. You can just embrace error as part of you workflow. Catalog the possible causes of error and make them an alternative flow of events that simply requires human intervention. It is just another step in the full orchestration that requires a person to intervene to make a decision, resolve an inconsistency with the data or just approve which way to go.
For example, maybe when you're processing an order, the payment service fails because you don't have enough funds. So, there is no point in undoing everything else. All we need is to put the order in a state that some problem solver can address it in the system and, once fixed, you can continue with the rest of the workflow.
Transaction and Data Model State are Key
I have discovered that this type of transactional workflows require a good design of the different states your model has to go through. As in the case of Try/Cancel/Confirm pattern, this implies initially applying the side effects without necessarily making the data model available to the users.
For example, when you place an order, maybe you add it to the database in a "Pending" status that will not appear in the UI of the warehouse systems. Once payments have been confirmed the order will then appear in the UI such that a user can finally process its shipments.
The difficulty here is discovering how to design transaction granularity in way that even if one step of your transaction workflow fails, the system remains in a valid state from which you can resume once the cause of the failure is corrected.
Designing for Distributed Transactional Workflows
So, as you can see, designing a distributed system that works in this way is a bit more complicated than individually invoking distributed transactional services. Now every service invocation may fail for a number of reasons and leave your distributed workflow in a inconsistent state. And retrying the transaction may not always solve the problem. And your data needs to be modeled like a state machine, such that side effects are applied but not confirmed until the entire orchestration is successful.
That‘s why the whole thing may need to be designed in a different way than you would typically do in a monolithic client–server application. Your users may now be part of the designed solution when it comes to solving conflicts, and contemplate that transactional orchestrations could potentially take hours or even days to complete depending on how their conflicts are resolved.
As I was originally saying, the topic is way too broad and it would require a more specific question to discuss, perhaps, just one or two of these aspects in detail.
At any rate, I hope this somehow helped you with your investigation.
* Enterprise Integration Patterns (basically an entire book about messaging architectures) 
* Vaughn Vernon's books and online writing ,
* Domain Driven Design by Eric Evans ,
* and most of what Greg Young, Udi Dahan, and that constellation of folks has done online (lots of talks and blog articles.)
Depending on your platform of choice, there may be others worth reading. For my 2¢, the dragons are mostly in the design phase, not the implementation phase. The mechanics of ES are pretty straightforward—there are a few things to look out for, like detection of dropped messages, but they're primarily the risks you see with any distributed system, and you have a collection of tradeoffs to weigh against each other.
In design, however, your boundaries become very important, because you have to live with them for a long time and evolving them takes planning. If you create highly coupled bounded contexts, you're in for a lot of pain over the years you maintain a system. However, if you do a pretty good job with them, there's a lot of benefits completely aside from ES.
You can definitely just use pure RabbitMQ. You just have to keep a couple things in mind.
Warning: This answer will be a bit extremely tongue-in-cheek.
First you should read Enterprise Integration Patterns cover to cover and make sure you understand it well. It is 736 pages, and a bit dry, but extremely useful information. It also wouldn't hurt to become an expert in all the peculiarities of RabbitMQ.
Then you just have to decide how you'll define messages, how to define message handlers, how to send messages and publish events. Before you get too far you'll want a good logging infrastructure. You'll need to create a message serializer and infrastructure for message routing. You'll need to include a bunch of infrastructure-related metadata with the content of each business message. You'll want to build a message dequeuing strategy that performs well and uses broker connections efficiently, keeping concurrency needs in mind.
Next you'll need to figure out how to retry messages automatically when the handling logic fails, but not too many times. You have to have a strategy for dealing with poison messages, so you'll need to move them aside so your handling logic doesn't get jammed preventing valid messages from being processed. You'll need a way to show those messages that have failed and figure out why, so you can fix the problem. You'll want some sort of alerting options so you know when that happens. It would be nice if that poison message display also showed you where that message came from and what the exception was so you don't need to go digging through log files. After that you'll need to be able to reroute the poison messages back into the queue to try again. In the event of a bad deployment you might have a lot of failed messages, so it would be really nice if you didn't have to retry the messages one at a time.
Since you're using RabbitMQ, there are no transactions on the message broker, so ghost messages and duplicate entities are very real problems. You'll need to code all message handling logic with idempotency in mind or your RabbitMQ messages and database entities will begin to get inconsistent. Alternatively you could design infrastructure to mimic distributed transactions by storing outgoing messaging operations in your business database and then executing the message dispatch operations separately. That results in duplicate messages (by design) so you'll need to deduplicate messages as they come in, which means you need well a well-defined strategy for consistent message IDs across your system. Be careful, as anything dealing with transactions and concurrency can be extremely tricky.
You'll probably want to do some workflow type stuff, where an incoming message starts a process that's essentially a message-driven state machine. Then you can do things like trigger an action once 2 required messages have been received. You'll need to design a storage system for that data. You'll probably also need a way to have delayed messages, so you can do things like the buyer's remorse pattern. RabbitMQ has no way to have an arbitrary delay on a message, so you'll have to come up with a way to implement that.
You'll probably want some metrics and performance counters on this system to know how it's performing. You'll want some way to be able to have tests on your message handling logic, so if you need to swap out some dependencies to make that work you might want to integrate a dependency injection framework.
Because these systems are decentralized by nature it can get pretty difficult to accurately picture what your system looks like. If you send a copy of every message to a central location, you can write some code to stitch together all the message conversations, and then you can use that data to build message flow diagrams, sequence diagrams, etc. This kind of living documentation based on live data can be critical for explaining things to managers or figuring out why a process isn't working as expected.
Speaking of documentation, make sure you write a whole lot of it for your message queue wrapper, otherwise it will be pretty difficult for other developers to help you maintain it. Of if someone else on your team is writing it, you'll be totally screwed when they get a different job and leave the company. You're also going to want a ton of unit tests on the RabbitMQ wrapper you've built. Infrastructure code like this should be rock-solid. You don't want losing a message to result in lost sales or anything like that.
So if you keep those few things in mind, you can totally use pure RabbitMQ without NServiceBus.
Hopefully, when you're done, your boss won't decide that you need to switch from RabbitMQ to Azure Service Bus or Amazon SQS.
Actually "Message Router" is one of the "Basic Messaging Concepts". List of such basic messaging concepts is:
"Content Based Router" is one of the "Message Routers" and there are a lot of different other Message Routers available like "Message Filter", "Splitter", "Aggregator", "Recipient list" etc.
I suggest reading a book that used by camel so all such points will be more clear:
A couple of recommended books:
Adding to the great ones mentioned above:
Patterns of Enterprise Application Architecture
Enterprise Integration Patterns
Unfortunately I have no experience with what you are going to implement, and I didn't take this book, but when I read your question I just remembered its title and idea that it can be useful pop up: "Real-Time Design Patterns: Robust Scalable Architecture for Real-Time Systems" (check also book advice from amazon customers under that book)
Here is google book link, so you can check its content
Btw, you will have client-server systems, maybe another integration needed, so there is another interesting book in that area which I read: Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions and alternative, similar catalog of SOA patterns http://www.soapatterns.org/ - just in case you will need integration and its security.