It's hard when done the wrong way. As usual I will suggest the book that taught me the best, How to design programs. This is one of the best book to learn how to program, period. The only two book I (and a lot of people more knowledgable than me) will put above it are SICP and CTMCP.
This then this.
This, this and the corresponding book. No in the language you want but they are just the best in business.
A 2004 textbook by the OP:
Concepts, Techniques, and Models of Computer Programming
And his 6 week edX course:
Paradigms of Computer Programming – Fundamentals
Anyway, just to practice what I've learned so far I will try to answer some of your questions from the top of my head; apologies in advance for my verbosity:
First of all, let's define functional (in fact, to be strict, declarative; more on this below):
An operation (i.e. a code fragment with a clearly defined input and output) is functional if for a given input it always gives the same output, regardless of all other execution state. It behaves just like a mathematical function, hence the name.
This gives a declarative operation the following properties:
1) Independence: nothing going on in the rest of the world will ever affect it.
2) Statelessness (same as immutability): there is no observable internal state; the output is the same every single time it is invoked with the same input.
3) Determinism: the output depends exclusively on the input and is always the same for a given input.
So what is the difference between functional and declarative? Functional is just a subset of declarative = declarative - dataflow variables
These properties give a functional program the following key benefits:
1) It is easier to design, implement and test. This is because of the above properties. For instance, because the output will never vary between different invocations, each input only needs to be tested once.
2) Easier to reason about (to prove correct). Algebraic reasoning (applying referential transparency for instance: if f(a)=a^2 then all occurences of f(a) can be replaced with a^2) and logical reasoning can be applied.
To further explore the practical implications of all this lets say that, given that all functional programs consist of a hierarchy of components (clearly defined program fragments connected exclusively to other components through their inputs and outputs) to understand a functional program it suffices to understand each of its components in isolation.
Basically, despite other programming models having more mindshare (but, as far as I can tell, aren't really better known, and this includes me ;), because of the above properties functional programming is fundamentally simpler than more expressive models, like OO and other models with explicit state.
Another very important point is that it is perfectly acceptable and feasible to write functional programs in non strictly functional languages like Java of C++ (although not in C, I won't explain why, it's complicated but basically the core reason has to do with how memory management is done in C).
This is because functional programming is not restricted to functional languages (where the program will be functional by definition no matter how much you mess up).
A program is functional if it is observably functional, if it behaves in the way specified above.
This can be achieved in, say, Java, with some discipline and if you know what you are doing; the Interpreter and Visitor design patterns are exactly for this, and one of the key operations to implement higher order programming, procedural abstraction, can easily be done using objects (see the excellent MIT OCW course https://ocw.mit.edu/courses/electrical-engineering-and-compu... for more on this).
Because of its limitations, it is often impossible to write a purely functional program. This is because the real world is statefull and concurrent. For instance, it is impossible to write a purely functional client-server application. How about IO or a GUI? Nope. I don't know Haskell yet, it seems they somehow pull it off with monads, but this approach, although impressive, is certainly not natural.
Garbage collection is a good thing. It's main benefit to functional languages is that it totally avoids dangling references by design. This is key to making determinism possible. Of course, automatically managing inactive memory to avoid most leaks is nice too (but not all leaks, like, say, references to unused variables inside a data structure, or any external resources).
However, functional programs can indeed result in higher memory consumption (bytes allocated per second, as opposed to memory usage, which is the minimum amount of memory for the program to run), which can be an issue in simulators, in which case a good garbage collector is required.
Certain specialised domains, like hard real time where lives are at stake, require specialised hardware and software anyway, never mind whether the language is functional or not.
So, for me, for the reasons above, the take home lesson so far is:
Program in the functional style wherever possible, it is in fact easier to get right due to its higher simplicity, and restrict and encapsulate statefulness (and concurrency) in an abstraction wherever possible (this common technique is called impedance matching).
Each programming problem, or component, etc, involves some degree of design first, or modelling, or a description, whichever word you prefer, it is all the same. There are some decisions you must make before coding, TDD or no TDD.
What paradigm you choose should depend first on the nature of the problem, not on the language. Certain problems are more easily (same as naturally) described in a functional way, as recursive functions on data structures. That part of the program should be implemented in a functional way if your language of choice allows that .
Other programs are more easily modelled as an object graph, or as a state diagram (awesome for IO among other things), and this is the way they should be designed and implemented if possible. But even in this case, some components can be designed in a functional way, and they should be wherever possible.
There is no one superior way, no silver bullet, it all depends on the context. It is better to know multiple programming paradigms without preferring one over the other, and apply the right one to the right problem.