Writing Solid Code (Microsoft Programming Series)

Category: Programming
Author: Steve Maguire
3.8
All Stack Overflow 13
This Year Stack Overflow 3
This Month Stack Overflow 1

Comments

by anonymous   2019-01-13

No, the code should not (necessarily) give a segfault. A segfault occurs when you attempt to access a virtual memory page that is not allocated to your process.

The "heap" or "free store" is a region of virtual memory pages owned by your process. The malloc() API sub-divides this region into blocks and returns a pointer to the block.

If you address beyond the end of the block to which you have a pointer, you will usually access memory that is part of the heap, but not part of your allocated block. In this way, you can corrupt other heap blocks or even the data structures which malloc() uses to define the heap.

For more information on heap corruption, and methods to detect it in the debug version of your code, this is a great book:

Writing Solid Code: Microsoft's Techniques for Developing Bug-Free C Programs by Steve Maguire alt text

An addendum for the pedantic: In rare cases, by accessing memory beyond the end of a heap block, you may access memory that is not part of the heap. In these cases, you may get the segmentation fault you expected. You might also corrupt some other data structure than the heap. It's really a matter of chance. However, the heap itself is very large compared to typical heap blocks, so 99% of the time code such as your example will corrupt the heap. The example you provide falls into that 99% case.

by anonymous   2018-03-12
There is a book, [Writing Solid Code](https://smile.amazon.com/dp/1556155514/), that covers this sort of stuff in detail. There are those who excoriate the book; I think it is quite a useful read, though it has to be recognized as being written soon after the original standard was released (1993), so it has some slightly archaic attitudes to what we now take for granted. If you can find a copy in a library, it might be worth a read. The appendices include some memory checking allocation code not a million miles removed from what's under discussion here.
by anonymous   2017-08-20

Application Verifier combined with Debugging Tools for Windows is an amazing setup. You can get both as a part of the Windows Driver Kit or the lighter Windows SDK. (Found out about Application Verifier when researching an earlier question about a heap corruption issue.) I've used BoundsChecker and Insure++ (mentioned in other answers) in the past too, although I was surprised how much functionality was in Application Verifier.

Electric Fence (aka "efence"), dmalloc, valgrind, and so forth are all worth mentioning, but most of these are much easier to get running under *nix than Windows. Valgrind is ridiculously flexible: I've debugged large server software with many heap issues using it.

When all else fails, you can provide your own global operator new/delete and malloc/calloc/realloc overloads -- how to do so will vary a bit depending on compiler and platform -- and this will be a bit of an investment -- but it may pay off over the long run. The desirable feature list should look familiar from dmalloc and electricfence, and the surprisingly excellent book Writing Solid Code:

  • sentry values: allow a little more space before and after each alloc, respecting maximum alignment requirement; fill with magic numbers (helps catch buffer overflows and underflows, and the occasional "wild" pointer)
  • alloc fill: fill new allocations with a magic non-0 value -- Visual C++ will already do this for you in Debug builds (helps catch use of uninitialized vars)
  • free fill: fill in freed memory with a magic non-0 value, designed to trigger a segfault if it's dereferenced in most cases (helps catch dangling pointers)
  • delayed free: don't return freed memory to the heap for a while, keep it free filled but not available (helps catch more dangling pointers, catches proximate double-frees)
  • tracking: being able to record where an allocation was made can sometimes be useful

Note that in our local homebrew system (for an embedded target) we keep the tracking separate from most of the other stuff, because the run-time overhead is much higher.


If you're interested in more reasons to overload these allocation functions/operators, take a look at my answer to "Any reason to overload global operator new and delete?"; shameless self-promotion aside, it lists other techniques that are helpful in tracking heap corruption errors, as well as other applicable tools.

by anonymous   2017-08-20
  1. Look up the definition of POSIX getline().

  2. Remember that you need to capture the return value from realloc(); it is not guaranteed that the new memory block starts at the same position as the old one.

  3. Know that malloc(0) may return a null pointer, or it may return a non-null pointer that is unusable (because it points to zero bytes of memory).

  4. You may not write '*list = '\0'; when list points to zero bytes of allocated memory; you don't have permission to write there. If you get a NULL back, you are likely to get a core dump. In any case, you are invoking undefined behaviour, which is 'A Bad Idea™'. (Thanks)

  5. The palabra = newChar(); in main() leaks memory - assuming that you fix the other problems already discussed.

  6. The code in readLine() doesn't consider the possibility of getting EOF before getting a newline; that is bad and will result in a core dump when memory allocation (finally) fails.

  7. Your code will exhibit poor performance because it allocates one character at a time. Typically, you should allocate considerably more than one extra character at a time; starting with an initial allocation of perhaps 4 bytes and doubling the allocation each time you need more space might be better. Keep the initial allocation small so that the reallocation code is properly tested.

  8. The return value from getchar() is an int, not a char. On most machines, it can return 256 different positive character values (even if char is a signed type) and a separate value, EOF, that is distinct from all the char values. (The standard allows it to return more than 256 different characters if the machine has bytes that are bigger than 8 bits each.) (Thanks) The C99 standard §7.19.7.1 says of fgetc():

    If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

    (Emphasis added.) It defines getchar() in terms of getc(), and it defines getc() in terms of fgetc().

  9. (Borrowed: Thanks). The first argument to realloc() is the pointer to the start of the currently allocated memory, not a pointer to the pointer to the start of the currently allocated memory. If you didn't get a compilation warning from it, you are not compiling with enough warnings set on your compiler. You should turn up the warnings to the maximum. You should heed the compiler warnings - they are normally indicative of bugs in your code, especially while you are still learning the language.

  10. It is often easier to keep the string without a null terminator until you know you have reached the end of the line (or end of input). When there are no more characters to be read (for the time being), then append the null so that the string is properly terminated before it is returned. These functions do not need the string properly terminate while they are reading, as long as you keep track of where you are in the string. Do make sure you have enough room at all times to add the NUL '\0' to the end of the string, though.

See Kernighan & Pike 'The Practice of Programming' for a lot of relevant discussions. I also think Maguire 'Writing Solid Code' has relevant advice to offer, for all it is somewhat dated. However, you should be aware that there are those who excoriate the book. Consequently, I recommend TPOP over WSC (but Amazon has WSC available from $0.01 + p&p, whereas TPOP starts at $20.00 + p&p -- this may be the market speaking).


TPOP was previously at http://plan9.bell-labs.com/cm/cs/tpop and http://cm.bell-labs.com/cm/cs/tpop but both are now (2015-08-10) broken. See also Wikipedia on TPOP.

by anonymous   2017-08-20

Actually, the last of those is equivalent to a call to free(). Read the specification of realloc() very carefully, and you will find it can allocate data anew, or change the size of an allocation (which, especially if the new size is larger than the old, might move the data around), and it can release memory too. In fact, you don't need the other functions; they can all be written in terms of realloc(). Not that anyone in their right mind would do so...but it could be done.

See Steve Maguire's "Writing Solid Code" for a complete dissection of the perils of the malloc() family of functions. See the ACCU web site for a complete dissection of the perils of reading "Writing Solid Code". I'm not convinced it is as bad as the reviews make it out to be - though its complete lack of a treatment of const does date it (back to the early 90s, when C89 was still new and not widely implemented in full).


D McKee's notes about MacOS X 10.5 (BSD) are interesting...

The C99 standard says:

7.20.3.3 The malloc function

Synopsis

#include <stdlib.h>
void *malloc(size_t size);

Description

The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

Returns

The malloc function returns either a null pointer or a pointer to the allocated space.

7.20.3.4 The realloc function

Synopsis

#include <stdlib.h>
void *realloc(void *ptr, size_t size);

Description

The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.

If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size. Otherwise, if ptr does not match a pointer earlier returned by the calloc, malloc, or realloc function, or if the space has been deallocated by a call to the free or realloc function, the behavior is undefined. If memory for the new object cannot be allocated, the old object is not deallocated and its value is unchanged.

Returns

The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.


Apart from editorial changes because of extra headers and functions, the ISO/IEC 9899:2011 standard says the same as C99, but in section 7.22.3 instead of 7.20.3.


The Solaris 10 (SPARC) man page for realloc says:

The realloc() function changes the size of the block pointer to by ptr to size bytes and returns a pointer to the (possibly moved) block. The contents will be unchanged up to the lesser of the new and old sizes. If the new size of the block requires movement of the block, the space for the previous instantiation of the block is freed. If the new size is larger, the contents of the newly allocated portion of the block are unspecified. If ptr is NULL, realloc() behaves like malloc() for the specified size. If size is 0 and ptr is not a null pointer, the space pointed to is freed.

That's a pretty explicit 'it works like free()' statement.

However, that MacOS X 10.5 or BSD says anything different reaffirms the "No-one in their right mind" part of my first paragraph.


There is, of course, the C99 Rationale...It says:

7.20.3 Memory management functions

The treatment of null pointers and zero-length allocation requests in the definition of these functions was in part guided by a desire to support this paradigm:

OBJ * p; // pointer to a variable list of OBJs
    /* initial allocation */
p = (OBJ *) calloc(0, sizeof(OBJ));
     /* ... */
     /* reallocations until size settles */
 while(1) {
    p = (OBJ *) realloc((void *)p, c * sizeof(OBJ));
         /* change value of c or break out of loop */
 }

This coding style, not necessarily endorsed by the Committee, is reported to be in widespread use.

Some implementations have returned non-null values for allocation requests of zero bytes. Although this strategy has the theoretical advantage of distinguishing between “nothing” and “zero” (an unallocated pointer vs. a pointer to zero-length space), it has the more compelling theoretical disadvantage of requiring the concept of a zero-length object. Since such objects cannot be declared, the only way they could come into existence would be through such allocation requests.

The C89 Committee decided not to accept the idea of zero-length objects. The allocation functions may therefore return a null pointer for an allocation request of zero bytes. Note that this treatment does not preclude the paradigm outlined above.

QUIET CHANGE IN C89

A program which relies on size-zero allocation requests returning a non-null pointer will behave differently.

[...]

7.20.3.4 The realloc function

A null first argument is permissible. If the first argument is not null, and the second argument is 0, then the call frees the memory pointed to by the first argument, and a null argument may be returned; C99 is consistent with the policy of not allowing zero-sized objects.

A new feature of C99: the realloc function was changed to make it clear that the pointed-to object is deallocated, a new object is allocated, and the content of the new object is the same as that of the old object up to the lesser of the two sizes. C89 attempted to specify that the new object was the same object as the old object but might have a different address. This conflicts with other parts of the Standard that assume that the address of an object is constant during its lifetime. Also, implementations that support an actual allocation when the size is zero do not necessarily return a null pointer for this case. C89 appeared to require a null return value, and the Committee felt that this was too restrictive.


Thomas Padron-McCarthy observed:

C89 explicitly says: "If size is zero and ptr is not a null pointer, the object it points to is freed." So they seem to have removed that sentence in C99?

Yes, they have removed that sentence because it is subsumed by the opening sentence:

The realloc function deallocates the old object pointed to by ptr

There's no wriggle room there; the old object is deallocated. If the requested size is zero, then you get back whatever malloc(0) might return, which is often (usually) a null pointer but might be a non-null pointer that can also be returned to free() but which cannot legitimately be dereferenced.

by user195488   2017-08-20

For a platform-specific solution, you may be interested in the Win32 function IsBadReadPtr (and others like it). This function will be able to (almost) predict whether you will get a segmentation fault when reading from a particular chunk of memory.

Note: IsBadReadPtr has been deprecated by Microsoft.

However, this does not protect you in the general case, because the operating system knows nothing of the C runtime heap manager, and if a caller passes in a buffer that isn't as large as you expect, then the rest of the heap block will continue to be readable from an OS perspective.

Pointers have no information with them other than where they point. The best you can do is say "I know how this particular compiler version allocates memory, so I'll dereference memory, move the pointer back 4 bytes, check the size, makes sure it matches..." and so on. You cannot do it in a standard fashion, since memory allocation is implementation defined. Not to mention they might have not dynamically allocated it at all.

On a side note, I recommend reading 'Writing Solid Code' by Steve McGuire. Excellent sections on memory management.

by anonymous   2017-08-20

General Answer

In Writing Solid Code, Steve Macguire's advice is to prefer distinct functions (methods) for specific situations. The reason is that you can assert conditions that are relevant to the specific case, and you can more easily debug because you have more context.

An interesting example is the standard C run-time's functions for dynamic memory allocation. Most of it is redundant, as realloc can actually do (almost) everything you need. If you have realloc, you don't need malloc or free. But when you have such a general function, used for several different types of operations, it's hard to add useful assertions and it's harder to write unit tests, and it's harder to see what's happening when debugging. Macquire takes it a step farther and suggests that, not only should realloc just do _re_allocation, but it should probably be two distinct functions: one for growing a block and one for shrinking a block.

While I generally agree with his logic, sometimes there are practical advantages to having one general purpose method (often when operations is highly data-driven). So I usually decide on a case by case basis, with a bias toward creating very specific methods rather than overly general purpose ones.

Specific Answer

In your case, I think you need to find a way to factor out the common code from the specifics. The switch is often a signal that you should be using a small class hierarchy with virtual functions.

If you like the single method approach, then it probably should be just a dispatcher to the more specific methods. In other words, each of those cases in the switch statement simply call the appropriate Method1, Method2, etc. If you want the user to see only the general purpose method, then you can make the specific implementations private methods.

by Yuval F   2017-08-20

Two good source books for this sort of stuff are The Practice of Programming and Writing Solid Code. One of them (I don't remember which) says: Prefer enum to #define where you can, because enum gets checked by the compiler.