Writing Solid Code (Microsoft Programming Series)
All
Stack Overflow 13
This Year
Stack Overflow 3
This Month
Stack Overflow 1
No, the code should not (necessarily) give a segfault. A segfault occurs when you attempt to access a virtual memory page that is not allocated to your process.
The "heap" or "free store" is a region of virtual memory pages owned by your process. The
malloc()
API sub-divides this region into blocks and returns a pointer to the block.If you address beyond the end of the block to which you have a pointer, you will usually access memory that is part of the heap, but not part of your allocated block. In this way, you can corrupt other heap blocks or even the data structures which
malloc()
uses to define the heap.For more information on heap corruption, and methods to detect it in the debug version of your code, this is a great book:
Writing Solid Code: Microsoft's Techniques for Developing Bug-Free C Programs by Steve Maguire
An addendum for the pedantic: In rare cases, by accessing memory beyond the end of a heap block, you may access memory that is not part of the heap. In these cases, you may get the segmentation fault you expected. You might also corrupt some other data structure than the heap. It's really a matter of chance. However, the heap itself is very large compared to typical heap blocks, so 99% of the time code such as your example will corrupt the heap. The example you provide falls into that 99% case.
Application Verifier combined with Debugging Tools for Windows is an amazing setup. You can get both as a part of the Windows Driver Kit or the lighter Windows SDK. (Found out about Application Verifier when researching an earlier question about a heap corruption issue.) I've used BoundsChecker and Insure++ (mentioned in other answers) in the past too, although I was surprised how much functionality was in Application Verifier.
Electric Fence (aka "efence"), dmalloc, valgrind, and so forth are all worth mentioning, but most of these are much easier to get running under *nix than Windows. Valgrind is ridiculously flexible: I've debugged large server software with many heap issues using it.
When all else fails, you can provide your own global operator new/delete and malloc/calloc/realloc overloads -- how to do so will vary a bit depending on compiler and platform -- and this will be a bit of an investment -- but it may pay off over the long run. The desirable feature list should look familiar from dmalloc and electricfence, and the surprisingly excellent book Writing Solid Code:
Note that in our local homebrew system (for an embedded target) we keep the tracking separate from most of the other stuff, because the run-time overhead is much higher.
If you're interested in more reasons to overload these allocation functions/operators, take a look at my answer to "Any reason to overload global operator new and delete?"; shameless self-promotion aside, it lists other techniques that are helpful in tracking heap corruption errors, as well as other applicable tools.
Look up the definition of POSIX
getline()
.Remember that you need to capture the return value from
realloc()
; it is not guaranteed that the new memory block starts at the same position as the old one.Know that
malloc(0)
may return a null pointer, or it may return a non-null pointer that is unusable (because it points to zero bytes of memory).You may not write '
*list = '\0';
when list points to zero bytes of allocated memory; you don't have permission to write there. If you get a NULL back, you are likely to get a core dump. In any case, you are invoking undefined behaviour, which is 'A Bad Idea™'. (Thanks)The
palabra = newChar();
inmain()
leaks memory - assuming that you fix the other problems already discussed.The code in
readLine()
doesn't consider the possibility of getting EOF before getting a newline; that is bad and will result in a core dump when memory allocation (finally) fails.Your code will exhibit poor performance because it allocates one character at a time. Typically, you should allocate considerably more than one extra character at a time; starting with an initial allocation of perhaps 4 bytes and doubling the allocation each time you need more space might be better. Keep the initial allocation small so that the reallocation code is properly tested.
The return value from
getchar()
is anint
, not achar
. On most machines, it can return 256 different positive character values (even ifchar
is a signed type) and a separate value, EOF, that is distinct from all thechar
values. (The standard allows it to return more than 256 different characters if the machine has bytes that are bigger than 8 bits each.) (Thanks) The C99 standard §7.19.7.1 says offgetc()
:(Emphasis added.) It defines
getchar()
in terms ofgetc()
, and it definesgetc()
in terms offgetc()
.(Borrowed: Thanks). The first argument to
realloc()
is the pointer to the start of the currently allocated memory, not a pointer to the pointer to the start of the currently allocated memory. If you didn't get a compilation warning from it, you are not compiling with enough warnings set on your compiler. You should turn up the warnings to the maximum. You should heed the compiler warnings - they are normally indicative of bugs in your code, especially while you are still learning the language.It is often easier to keep the string without a null terminator until you know you have reached the end of the line (or end of input). When there are no more characters to be read (for the time being), then append the null so that the string is properly terminated before it is returned. These functions do not need the string properly terminate while they are reading, as long as you keep track of where you are in the string. Do make sure you have enough room at all times to add the NUL
'\0'
to the end of the string, though.See Kernighan & Pike 'The Practice of Programming' for a lot of relevant discussions. I also think Maguire 'Writing Solid Code' has relevant advice to offer, for all it is somewhat dated. However, you should be aware that there are those who excoriate the book. Consequently, I recommend TPOP over WSC (but Amazon has WSC available from $0.01 + p&p, whereas TPOP starts at $20.00 + p&p -- this may be the market speaking).
TPOP was previously at http://plan9.bell-labs.com/cm/cs/tpop and http://cm.bell-labs.com/cm/cs/tpop but both are now (2015-08-10) broken. See also Wikipedia on TPOP.
Actually, the last of those is equivalent to a call to
free()
. Read the specification ofrealloc()
very carefully, and you will find it can allocate data anew, or change the size of an allocation (which, especially if the new size is larger than the old, might move the data around), and it can release memory too. In fact, you don't need the other functions; they can all be written in terms ofrealloc()
. Not that anyone in their right mind would do so...but it could be done.See Steve Maguire's "Writing Solid Code" for a complete dissection of the perils of the
malloc()
family of functions. See the ACCU web site for a complete dissection of the perils of reading "Writing Solid Code". I'm not convinced it is as bad as the reviews make it out to be - though its complete lack of a treatment ofconst
does date it (back to the early 90s, when C89 was still new and not widely implemented in full).D McKee's notes about MacOS X 10.5 (BSD) are interesting...
The C99 standard says:
7.20.3.3 The malloc function
7.20.3.4 The realloc function
Apart from editorial changes because of extra headers and functions, the ISO/IEC 9899:2011 standard says the same as C99, but in section 7.22.3 instead of 7.20.3.
The Solaris 10 (SPARC) man page for realloc says:
That's a pretty explicit 'it works like free()' statement.
However, that MacOS X 10.5 or BSD says anything different reaffirms the "No-one in their right mind" part of my first paragraph.
There is, of course, the C99 Rationale...It says:
7.20.3 Memory management functions
[...]
7.20.3.4 The realloc function
Thomas Padron-McCarthy observed:
Yes, they have removed that sentence because it is subsumed by the opening sentence:
There's no wriggle room there; the old object is deallocated. If the requested size is zero, then you get back whatever
malloc(0)
might return, which is often (usually) a null pointer but might be a non-null pointer that can also be returned tofree()
but which cannot legitimately be dereferenced.For a platform-specific solution, you may be interested in the Win32 function IsBadReadPtr (and others like it). This function will be able to (almost) predict whether you will get a segmentation fault when reading from a particular chunk of memory.
Note: IsBadReadPtr has been deprecated by Microsoft.
However, this does not protect you in the general case, because the operating system knows nothing of the C runtime heap manager, and if a caller passes in a buffer that isn't as large as you expect, then the rest of the heap block will continue to be readable from an OS perspective.
Pointers have no information with them other than where they point. The best you can do is say "I know how this particular compiler version allocates memory, so I'll dereference memory, move the pointer back 4 bytes, check the size, makes sure it matches..." and so on. You cannot do it in a standard fashion, since memory allocation is implementation defined. Not to mention they might have not dynamically allocated it at all.
On a side note, I recommend reading 'Writing Solid Code' by Steve McGuire. Excellent sections on memory management.
General Answer
In Writing Solid Code, Steve Macguire's advice is to prefer distinct functions (methods) for specific situations. The reason is that you can assert conditions that are relevant to the specific case, and you can more easily debug because you have more context.
An interesting example is the standard C run-time's functions for dynamic memory allocation. Most of it is redundant, as
realloc
can actually do (almost) everything you need. If you haverealloc
, you don't needmalloc
orfree
. But when you have such a general function, used for several different types of operations, it's hard to add useful assertions and it's harder to write unit tests, and it's harder to see what's happening when debugging. Macquire takes it a step farther and suggests that, not only shouldrealloc
just do _re_allocation, but it should probably be two distinct functions: one for growing a block and one for shrinking a block.While I generally agree with his logic, sometimes there are practical advantages to having one general purpose method (often when operations is highly data-driven). So I usually decide on a case by case basis, with a bias toward creating very specific methods rather than overly general purpose ones.
Specific Answer
In your case, I think you need to find a way to factor out the common code from the specifics. The
switch
is often a signal that you should be using a small class hierarchy with virtual functions.If you like the single method approach, then it probably should be just a dispatcher to the more specific methods. In other words, each of those cases in the switch statement simply call the appropriate
Method1
,Method2
, etc. If you want the user to see only the general purpose method, then you can make the specific implementations private methods.Two good source books for this sort of stuff are The Practice of Programming and Writing Solid Code. One of them (I don't remember which) says: Prefer enum to #define where you can, because enum gets checked by the compiler.