Release It!: Design and Deploy Production-Ready Software (Pragmatic Programmers)
If the monolith is truly modularized, then a failed module should not bring the whole system down. This means that every module should have its own database, for example.
Another requirement for good modularization is that a module should not make a synchronous call to another module. If a module needs data from other modules, it should fetch that data in the background, outside a user's request.
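A minimal sketch of that background-fetch idea, in Python. Everything named here (`OrdersModule`, `fetch_customer_names`, `run_refresh_loop`) is hypothetical and only illustrates the shape: the request path reads a local copy, and only a background loop ever talks to the other module.

```python
import threading


class OrdersModule:
    """Hypothetical module that needs customer names owned by another module."""

    def __init__(self, fetch_customer_names):
        # fetch_customer_names is an assumed callable exposed by the other module.
        self._fetch = fetch_customer_names
        self._names = {}  # local copy, the only thing user requests read
        self._lock = threading.Lock()

    def refresh(self):
        """Runs in the background, never from inside a user request."""
        fresh = self._fetch()
        with self._lock:
            self._names = dict(fresh)

    def describe_order(self, order_id, customer_id):
        """Request path: reads the local copy, so the other module may be down."""
        with self._lock:
            name = self._names.get(customer_id, "unknown customer")
        return f"order {order_id} for {name}"


def run_refresh_loop(module, interval_seconds, stop_event):
    """Background loop: a failed refresh keeps the stale copy instead of crashing."""
    while not stop_event.is_set():
        try:
            module.refresh()
        except Exception as exc:  # broad on purpose: the loop must survive
            print(f"refresh failed, keeping stale data: {exc}")
        stop_event.wait(interval_seconds)
```

The loop would typically be started once with something like `threading.Thread(target=run_refresh_loop, args=(orders, 30, stop), daemon=True).start()`. The trade-off is that requests may see slightly stale data, which is the price of not depending on the other module at request time.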
The aggregation and orchestration of multi-module requests should be done in the Application layer. For example, if a query needs data from modules A and B, the Application sends a sub-query to A, then a sub-query to B, then combines the results and returns the response to the client. During this request, A may not query B or vice versa. In case of partial failure, the Application may return a partial response or an error.
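Here is a rough Python sketch of that orchestration, assuming each module exposes its sub-query as a plain callable. The names `handle_query` and `AggregatedResponse` are made up for illustration; the point is only that the Application calls A and B itself and decides what to do when one of them fails.

```python
from dataclasses import dataclass, field


@dataclass
class AggregatedResponse:
    data: dict = field(default_factory=dict)    # results from modules that answered
    errors: dict = field(default_factory=dict)  # error messages from modules that failed

    @property
    def partial(self):
        """True when some modules answered and some failed."""
        return bool(self.data) and bool(self.errors)


def handle_query(sub_queries):
    """Application-layer orchestration (hypothetical).

    sub_queries maps a module name to a zero-argument callable. Modules are
    queried one after another and never call each other.
    """
    response = AggregatedResponse()
    for name, query in sub_queries.items():
        try:
            response.data[name] = query()
        except Exception as exc:
            # One failed module must not take the whole request down.
            response.errors[name] = str(exc)
    return response
```

Whether the caller then returns the partial data or turns it into an error is a policy decision that stays in the Application layer, not in the modules.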
Also, you should have a monitoring solution for each module. This is required especially because modules have background tasks that may fail and you need to know when and how they fail.
I recommend the book Release It! for this matter.
P.S. You don't need to go microservices just for this; a well-designed monolith is better.
In high-level stuff, exceptions; in low-level stuff, error codes.
The default behaviour of an exception is to unwind the stack and stop the program. If I'm writing a script and I go for a key that's not in a dictionary, it's probably an error, and I want the program to halt and let me know all about that.
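For instance, in Python (used here only as an illustration; `port_for` and the `config` contents are made up), plain indexing gives exactly that halt-and-tell-me behaviour:

```python
def port_for(config, service):
    # Plain indexing: a missing key raises KeyError instead of returning a default.
    return config[service]


config = {"web": 8080}
print(port_for(config, "web"))

# port_for(config, "db") would raise KeyError: 'db'. Left uncaught in a
# script, that stops the program and prints a traceback naming the line.
```

Using `config.get(service)` instead would quietly hand back `None` and let the script carry on with bad data, which is usually worse in a throwaway script.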
If, however, I'm writing a piece of code whose behaviour I must know in every possible situation, then I want error codes. Otherwise I have to know every exception that can be thrown by every line in my function to know what it will do (read "The Exception That Grounded an Airline" to get an idea of how tricky this is). It's tedious and hard to write code that reacts appropriately to every situation (including the unhappy ones), but that's because writing error-free code is tedious and hard, not because you're passing error codes.
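A small sketch of the error-code style, again in Python for consistency. The `Status` enum and `read_port` are hypothetical, and the sketch assumes `config` is an ordinary dict with string keys. Every outcome shows up in the return value, so the caller can see all the cases by reading one function.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    MISSING_KEY = 1
    NOT_A_NUMBER = 2
    OUT_OF_RANGE = 3


def read_port(config, key):
    """Return (status, port); port is 0 whenever status is not OK."""
    raw = config.get(key)
    if raw is None:
        return Status.MISSING_KEY, 0
    # Check the input up front so that int() below cannot fail.
    if not (isinstance(raw, str) and raw.isascii() and raw.isdigit()):
        return Status.NOT_A_NUMBER, 0
    if len(raw) > 5:  # in this sketch, more than 5 digits is never a valid port
        return Status.OUT_OF_RANGE, 0
    port = int(raw)
    if not 1 <= port <= 65535:
        return Status.OUT_OF_RANGE, 0
    return Status.OK, port
```

The caller then has to branch on the status explicitly, for example `status, port = read_port(cfg, "port")` followed by a check of `status`. That is the tedious part, but nothing can escape the function by a path you did not write down.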
Both Raymond Chen and Joel have made some eloquent arguments against using exceptions for everything.
I like a couple of Pragmatic Programmers' books on this subject, Ship It! and Release It!. Together, they teach a lot of real-world, pragmatic stuff about such things as build systems and how to design programs that deploy well.
My "step-by-step" process would be this:
Book tip: Release It!