Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions
All
Stack Overflow 24
This Year
Stack Overflow 2
This Month
Stack Overflow 1
I would consider a queueing system as an alternative to a local database. I'd expect such product to offer features, for automatically starting transmission of messages, as soon as a connection is available. But this is really just a technical issue.
A couple of things you should consider when building your architecture:
are there limits on how out of sync your application is allowed to get? E.g. when your app hasn't connected to the master for a month, is still supposed to run without a hitch? You must allow for appropriate storage capabilities on the client, and ways to escalate the issue, when a threshold is met.
You are bound to get conflicts. E.g. when you accept an order, it may sit for 4 hours on the client, and when it finally reaches the server, the product ordered might not even exist anymore. Get creative on what might get wrong and define how these cases are supposed to get resolved.
Make sure you have decent logging, especially for stuff that is crossing the connection. You should be able to easily find logging information on server and client which relates to the same business transaction.
Make sure you can test you components, without the others.
And of course in a distributed system you have to access how trustworthy a client is. I.e. how can you assert that a message coming from server or client is really coming from a client and is not forged by somebody or something else.
Clearly quantify how much bandwidth you have available, so you can make sure that all the required data actually fits in. Many small transactions might help, since they are less likely to get interrupted by a interruption of connectivity.
You also might find this book helpful
NServiceBus Saga is a variant of a Process Manager described in the Enterprise Integration Patterns book.
To understand when to use Saga, one has to need it. Let's assume you're using regular message handlers only to implement new user registration process. At some point in time, you discover that only 40% of the brand-new registrants confirm their email address and becoming active user accounts. There are two things you'd like to address.
Now how do you do that with a regular message handler? A handler would receive the initial request (first message, m1) to kick off registration by generating an email with a confirmation link and that's it. Once the handler is done, it's done for good. But your process is not finished. It's a long-running logical process that has to span 48 hours before completed. It's no longer just a single message processing, but a workflow at this point. A workflow with multiple checkpoints. Similar to a state machine. To move from one state to another, a certain condition has to be fulfilled. In case of NServiceBus, those would be messages. A message to send a reminder after 24 hours (let's call it m2) is not going to be triggered by any user action. It's a "system" message. A timed message that should be kicked off automatically. So is with the message to instruct the system to remove registrant information if validation link was not activated. The theme can be observed: need to schedule messages in the future to re-hydrate the workflow and continue from the state it was left last time.
That's what timeouts are. Those are requests to re-hydrate/continue saga/workflow from the point it was left last time at a certain point in time - minutes, hours, days, months, years.
This is what this kind of workflow would look like as a saga (oversimplified and doesn't take into consideration all the edge cases).
Since this is a state machine, the state is persisted between Saga invocations. Invocation would be caused either by a message a saga can handle (
RegisterUser
andActivationReceived
) or by timeouts that are due (NoResponseFor24Hours
andNoResponseFor48Hours
). For this specific saga, the state is defined by the following POCO:Timeouts are nothing but plain
IMessage
s that get deferred. The timeouts used in this samples would beHope this clarifies the idea of Sagas in general, what Timeouts are and how they are used. I did not go into Message Correlation, Saga Concurrency, and some other details as those can be found at the documentation site you've referred to. Which bring us to the next point.
The site has a feedback mechanism you should absolutely provide.
Hope to see you posting a blog (or a series of posts) on this topic. By doing so you'll have a positive contribution.
Full disclaimer: I work on NServiceBus
Lots of SQL users don't understand what to do where transactions are not available.
You should implement compensation logic your own or use frameworks like Windows Workflow Foundation. Compensation logic is related to Enterprise Integration Patterns. You also may use correlation ID pattern to check if big operation was done.
SQL people manage big operations in the same way when it is necessary to make long running transaction. https://www.amazon.com/Enterprise-Integration-Patterns-Designing-Deploying/dp/0321200683/ref=sr_1_1?ie=UTF8&qid=1480917322&sr=8-1&keywords=integration+patterns
The JMS server keeps track of every subscription made and makes a difference between Durable vs. non Durable subscriptions. Suppose you have clients A, B, C and a topic T.
There are administrative settings on the message server to control how the length of time a jms server will wait for a durable subscriber to come back and claim messages before sending the message to the dead letter queue or a maximum number of message on a topic that are waiting for subscriber to come back and claim. You really need to balance the never loose a message against the flow messages and running out of memory, or disk space.
Please note that the concept of persistent queue is different than the concept of durable subscriber. Persistent queues and topics protect you against crashes of the JMS server, by writing the content of the queues and messages to disk before acknowledging receipt of the messages. Durable subscriber is about what happens to a message when a message arrive while a client is not connected.
Old answer.
Think of the JMS server the way you think of a SQL database. From the point of view of the web container there should be a pool of connections to the JMS server, So what you do is to grab a connection to the JMS server and subscribe to a topic. For example.
So if one topic is being used, you must consume the message from that topic using a selective consumer, see http://www.eaipatterns.com/MessageSelector.html what the selective consumer will do is retrive message from the topic that match certain criteria. For example the message producer that published the JMS message to the topic should be including a header or a JMS property called targetUser or something like that, then the consumer can say give any messages that from the AddressChangeTopic where the custom property targetUser="Joe" see some example selectors examples Here
The key to this is to realize that you can query a queue or a topic the way that you can query a database table, but with much limited query syntax. From a conceptual view point I highly recommend the Enterprise Integration Patterns Book http://www.amazon.ca/Enterprise-Integration-Patterns-Designing-Deploying/dp/0321200683
Some of the things you describe are not errors or exceptions, but alternative flows that you should consider in your distributed architecture.
For example, that an item is out of stock is a perfectly valid alternative flow in your business process. One that possibly requires human intervention. You could move the message to a separate queue and provide some UI where a human operator can deal with the problem, solve it and cause the flow of events to continue.
A similar thing could be said of the payment problems you describe. If an order cannot successfully be settled, a human operator will need to investigate the case and solve it. For that matter, your design must contemplate that alternative flow as part of it, and make it so a human can intervene somehow when the messages end up in a queue that requires a person to review them.
Those cases should be differentiated from errors or exceptions being thrown by the program. Those cases, depending on the circumstance, might in fact require to move the message to a dead letter queue (DLQ) for an engineer to take a look at them.
This is a very broad topic and entire books could written about this.
I believe you could probably benefit from gaining more understanding of concepts like:
The idea behind compensating transactions is that every ying has its yang: if you have one transaction that can place an order, then you could undo that with a transaction that cancels that order. This latter transaction is a compensating transaction. So, if you carry out a number of successful transactions and then one of them fails, you can trace back your steps and compensate every successful transaction you did and, as a result, revert their side effects.
I particularly liked a chapter in the book REST from Research to Practice. Its chapter 23 (Towards Distributed Atomic Transactions over RESTful Services) goes deep in explaining the Try/Cancel/Confirm pattern.
In general terms it implies that when you do a group of transactions, their side effects are not effective until a transaction coordinator gets a confirmation that they all were successful. For example, if you make a reservation in Expedia and your flight has two legs with different airlines, then one transaction would reserve a flight with American Airlines and another one would reserve a flight with United Airlines. If your second reservation fails, then you want to compensate the first one. But not only that, you want to avoid that the first reservation is effective until you have been able to confirm both. So, initial transaction makes the reservation but keeps its side effects pending to confirm. And the second reservation would do the same. Once the transaction coordinator knows everything is reserved, it can send a confirmation message to all parties such that they confirm their reservations. If reservations are not confirmed within a sensible time window, they are automatically reversed by the affected system.
The book Enterprise Integration Patterns has some basic ideas on how to implement this kind of event coordination (e.g. see process manager pattern and compare with routing slip pattern which are similar ideas to orchestration vs choreography in the Microservices world).
As you can see, being able to compensate transactions might be complicated depending on how complex is your distributed workflow. The process manager may need to keep track of the state of every step and know when the whole thing needs to be undone. This is pretty much that idea of Sagas in the Microservices world.
The book Microservices Patterns has an entire chapter called Managing Transactions with Sagas that delves in detail on how to implement this type of solution.
A few other aspects I also typically consider are the following:
Idempotency
I believe that a key to a successful implementation of your service transactions in a distributed system consists in making them idempotent. Once you can guarantee a given service is idempotent, then you can safely retry it without worrying about causing additional side effects. However, just retrying a failed transaction won't solve your problems.
Transient vs Persistent Errors
When it comes to retrying a service transaction, you shouldn't just retry because it failed. You must first know why it failed and depending on the error it might make sense to retry or not. Some types of errors are transient, for example, if one transaction fails due to a query timeout, that's probably fine to retry and most likely it will succeed the second time; but if you get a database constraint violation error (e.g. because a DBA added a check constraint to a field), then there is no point in retrying that transaction: no matter how many times you try it will fail.
Embrace Error as an Alternative Flow
As mentioned at the beginning of my answer, not everything is an error. Some things are just alternative flows.
In those cases of interservice communication (computer-to-computer interactions) , when a given step of your workflow fails, you don't necessarily need to undo everything you did in previous steps. You can just embrace error as part of you workflow. Catalog the possible causes of error and make them an alternative flow of events that simply requires human intervention. It is just another step in the full orchestration that requires a person to intervene to make a decision, resolve an inconsistency with the data or just approve which way to go.
For example, maybe when you're processing an order, the payment service fails because you don't have enough funds. So, there is no point in undoing everything else. All we need is to put the order in a state that some problem solver can address it in the system and, once fixed, you can continue with the rest of the workflow.
Transaction and Data Model State are Key
I have discovered that this type of transactional workflows require a good design of the different states your model has to go through. As in the case of Try/Cancel/Confirm pattern, this implies initially applying the side effects without necessarily making the data model available to the users.
For example, when you place an order, maybe you add it to the database in a "Pending" status that will not appear in the UI of the warehouse systems. Once payments have been confirmed the order will then appear in the UI such that a user can finally process its shipments.
The difficulty here is discovering how to design transaction granularity in way that even if one step of your transaction workflow fails, the system remains in a valid state from which you can resume once the cause of the failure is corrected.
Designing for Distributed Transactional Workflows
So, as you can see, designing a distributed system that works in this way is a bit more complicated than individually invoking distributed transactional services. Now every service invocation may fail for a number of reasons and leave your distributed workflow in a inconsistent state. And retrying the transaction may not always solve the problem. And your data needs to be modeled like a state machine, such that side effects are applied but not confirmed until the entire orchestration is successful.
That‘s why the whole thing may need to be designed in a different way than you would typically do in a monolithic client–server application. Your users may now be part of the designed solution when it comes to solving conflicts, and contemplate that transactional orchestrations could potentially take hours or even days to complete depending on how their conflicts are resolved.
As I was originally saying, the topic is way too broad and it would require a more specific question to discuss, perhaps, just one or two of these aspects in detail.
At any rate, I hope this somehow helped you with your investigation.
* Enterprise Integration Patterns (basically an entire book about messaging architectures) [1] * Vaughn Vernon's books and online writing [2], * Domain Driven Design by Eric Evans [3], * and most of what Greg Young, Udi Dahan, and that constellation of folks has done online (lots of talks and blog articles.)
Depending on your platform of choice, there may be others worth reading. For my 2¢, the dragons are mostly in the design phase, not the implementation phase. The mechanics of ES are pretty straightforward—there are a few things to look out for, like detection of dropped messages, but they're primarily the risks you see with any distributed system, and you have a collection of tradeoffs to weigh against each other.
In design, however, your boundaries become very important, because you have to live with them for a long time and evolving them takes planning. If you create highly coupled bounded contexts, you're in for a lot of pain over the years you maintain a system. However, if you do a pretty good job with them, there's a lot of benefits completely aside from ES.
[1] https://www.amazon.com/Enterprise-Integration-Patterns-Desig...
[2] https://www.amazon.com/Domain-Driven-Design-Tackling-Complex...
You can definitely just use pure RabbitMQ. You just have to keep a couple things in mind.
Warning: This answer will be
a bitextremely tongue-in-cheek.First you should read Enterprise Integration Patterns cover to cover and make sure you understand it well. It is 736 pages, and a bit dry, but extremely useful information. It also wouldn't hurt to become an expert in all the peculiarities of RabbitMQ.
Then you just have to decide how you'll define messages, how to define message handlers, how to send messages and publish events. Before you get too far you'll want a good logging infrastructure. You'll need to create a message serializer and infrastructure for message routing. You'll need to include a bunch of infrastructure-related metadata with the content of each business message. You'll want to build a message dequeuing strategy that performs well and uses broker connections efficiently, keeping concurrency needs in mind.
Next you'll need to figure out how to retry messages automatically when the handling logic fails, but not too many times. You have to have a strategy for dealing with poison messages, so you'll need to move them aside so your handling logic doesn't get jammed preventing valid messages from being processed. You'll need a way to show those messages that have failed and figure out why, so you can fix the problem. You'll want some sort of alerting options so you know when that happens. It would be nice if that poison message display also showed you where that message came from and what the exception was so you don't need to go digging through log files. After that you'll need to be able to reroute the poison messages back into the queue to try again. In the event of a bad deployment you might have a lot of failed messages, so it would be really nice if you didn't have to retry the messages one at a time.
Since you're using RabbitMQ, there are no transactions on the message broker, so ghost messages and duplicate entities are very real problems. You'll need to code all message handling logic with idempotency in mind or your RabbitMQ messages and database entities will begin to get inconsistent. Alternatively you could design infrastructure to mimic distributed transactions by storing outgoing messaging operations in your business database and then executing the message dispatch operations separately. That results in duplicate messages (by design) so you'll need to deduplicate messages as they come in, which means you need well a well-defined strategy for consistent message IDs across your system. Be careful, as anything dealing with transactions and concurrency can be extremely tricky.
You'll probably want to do some workflow type stuff, where an incoming message starts a process that's essentially a message-driven state machine. Then you can do things like trigger an action once 2 required messages have been received. You'll need to design a storage system for that data. You'll probably also need a way to have delayed messages, so you can do things like the buyer's remorse pattern. RabbitMQ has no way to have an arbitrary delay on a message, so you'll have to come up with a way to implement that.
You'll probably want some metrics and performance counters on this system to know how it's performing. You'll want some way to be able to have tests on your message handling logic, so if you need to swap out some dependencies to make that work you might want to integrate a dependency injection framework.
Because these systems are decentralized by nature it can get pretty difficult to accurately picture what your system looks like. If you send a copy of every message to a central location, you can write some code to stitch together all the message conversations, and then you can use that data to build message flow diagrams, sequence diagrams, etc. This kind of living documentation based on live data can be critical for explaining things to managers or figuring out why a process isn't working as expected.
Speaking of documentation, make sure you write a whole lot of it for your message queue wrapper, otherwise it will be pretty difficult for other developers to help you maintain it. Of if someone else on your team is writing it, you'll be totally screwed when they get a different job and leave the company. You're also going to want a ton of unit tests on the RabbitMQ wrapper you've built. Infrastructure code like this should be rock-solid. You don't want losing a message to result in lost sales or anything like that.
So if you keep those few things in mind, you can totally use pure RabbitMQ without NServiceBus.
Hopefully, when you're done, your boss won't decide that you need to switch from RabbitMQ to Azure Service Bus or Amazon SQS.
Actually "Message Router" is one of the "Basic Messaging Concepts". List of such basic messaging concepts is:
"Content Based Router" is one of the "Message Routers" and there are a lot of different other Message Routers available like "Message Filter", "Splitter", "Aggregator", "Recipient list" etc.
I suggest reading a book that used by camel so all such points will be more clear: https://www.amazon.com/o/asin/0321200683/ref=nosim/enterpriseint-20
A couple of recommended books:
Adding to the great ones mentioned above:
Patterns of Enterprise Application Architecture
Enterprise Integration Patterns
Unfortunately I have no experience with what you are going to implement, and I didn't take this book, but when I read your question I just remembered its title and idea that it can be useful pop up: "Real-Time Design Patterns: Robust Scalable Architecture for Real-Time Systems" (check also book advice from amazon customers under that book)
Here is google book link, so you can check its content
Btw, you will have client-server systems, maybe another integration needed, so there is another interesting book in that area which I read: Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions and alternative, similar catalog of SOA patterns http://www.soapatterns.org/ - just in case you will need integration and its security.