Programming Pearls (2nd Edition)

Category: Programming
Author: Jon Louis Bentley
4.3
All Stack Overflow 24
This Year Stack Overflow 3
This Month Stack Overflow 3

Comments

by tjr   2022-03-01
It might be a bit too archaic now, as a lot of the terms and technologies discussed were on the way out even when the last printed version was published, but I did find The New Hacker's Dictionary very enjoyable in this sense.

Programming Pearls might be another good book that promotes enjoyment of computing; certainly a bit more technical, but I think plenty readable for someone interested.

https://www.amazon.com/New-Hackers-Dictionary-3rd/dp/0262680...

https://www.amazon.com/Programming-Pearls-2nd-Jon-Bentley/dp...

by anonymous   2019-07-21

Binary search is notoriously tricky to get exactly right. There is a very thorough analysis on the various problems and edge cases, along with a correct implementation, in Programming Pearls, a book that every programmer should probably have read at least once.

by anonymous   2019-07-21

After the hashing, I got 1000 small files, each of which contains 1 million numbers at most, right

Not true, in extreme case it's possible that one file contains all the numbers.

Create the files based on the first or last x digits of the numbers (ignore the starting 1). When creating those files you can actually chop those digits because they are equal within a file. This is a lot better than hashing because although all the numbers can still end up in one file, now the range of those numbers is limited, so you can fit it into 10MB.

Each number can be represeted by a simple bit because the only information you need is whether the number occured previously. You don't have to store the actual numbers, the address of the bit is the number. In 10MB you can store 80M bits, so you will need 1G/80M = 12.5 files, but remember, those digits must differ so actually you will need 100 files (x=2).

Finally, you don't have to create those files, you can also scan the whole file multiple times. In this case you can have multiple bit-maps in memory as one doesn't occupy 10MB.

I strongly suggest reading this book, it starts with an almost identical example: http://www.amazon.co.uk/Programming-Pearls-ACM-Press-Bentley/dp/0201657880

by anonymous   2017-09-05
Repeated reallocation for a change of one in the array size is not particularly good. You'd likely do better with allocating as much space as you'll need all at once, or keep track of what's allocated and what's in use and only allocate more when necessary. Jon Bentley's [Programming Pearls, 2nd Edn](https://www.amazon.com/dp/0201657880/) from 1999 and [More Programming Pearls](https://www.amazon.com/dp/0201118890/) from 1988 include code for handling array-based heaps as you are using. You can find an outline of that code in [this answer](https://stackoverflow.com/a/21742125) of mine.
by Nigel Campbell   2017-08-20

I've wiki'd this post - could those with sufficient rep add in items to it.

System administration, general usage books

Programming:

Specific tools (e.g. Sendmail)

Various of the books from O'Reilly and other publishers cover specific topics. Some of the key ones are:

Some of these books have been in print for quite a while and are still relevant. Consequently they are also often available secondhand at much less than list price. Amazon marketplace is a good place to look for such items. It's quite a good way to do a shotgun approach to topics like this for not much money.

As an example, in New Zealand technical books are usurously expensive due to a weak kiwi peso (as the $NZ is affectionately known in expat circles) and a tortuously long supply chain. You could spend 20% of a week's after-tax pay for a starting graduate on a single book. When I was living there just out of university I used this type of market a lot, often buying books for 1/4 of their list price - including the cost of shipping to New Zealand. If you're not living in a location with tier-1 incomes I recommend this.

E-Books and on-line resources (thanks to israkir for reminding me):

  • The Linux Documentation project (www.tldp.org), has many specific topic guides known as HowTos that also often concern third party OSS tools and will be relevant to other Unix variants. It also has a series of FAQ's and guides.

  • Unix Guru's Universe is a collection of unix resources with a somewhat more old-school flavour.

  • Google. There are many, many unix and linux resources on the web. Search strings like unix commands or learn unix will turn up any amount of online resources.

  • Safari. This is a subscription service, but you can search the texts of quite a large number of books. I can recommend this as I've used it. They also do site licences for corporate customers.

Some of the philosophy of Unix:

  • The Art of UNIX Programming by E S Raymond (available online and in print).

  • The Practice of Programming by B W Kernighan and R Pike.

by anonymous   2017-08-20

Quoth cb3k

That seemed to make it work...what might the other problems be?

Here's your code with the minimal (necessary, but not sufficient) fix diagnosed by templatetypedef and a test harness.

#include <stdio.h>

static
int search(int value, int values[], int n, int high, int low)
{
    if (n <= 0)
    {
        return 1;
    }

    int middle = (high + low) / 2;

    ///condition #1
    if (value == values[middle])
    {
        return 0;
    }

    // conditions #2 and #3 (account for the maxes and mins of the array because the operation to find middle truncates)
    else if (values[middle] == values[low] || values[middle] == values[high])
    {
        return 0;
    }

    else if (value > values[middle])
    {
        low = middle;
        return search(value, values, n, high, low);
    }

    else if (value < values[middle])
    {
        high = middle;
        return search(value, values, n, high, low);
    }

    return 2;
}

int main(void)
{
    int data[15];
    for (int i = 0; i < 15; i++)
        data[i] = 2 * i + 1;

    printf("Data:");
    for (int i = 0; i < 15; i++)
        printf("%3d", data[i]);
    putchar('\n');

    for (int i = -1; i < 2 * 15 + 3; i++)
        printf("Search for %2d - result %d\n", i, search(i, data, 15, 14, 0));
    return 0;
}

Here's the output:

Data:  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29
Search for -1 - result 0
Search for  0 - result 0
Search for  1 - result 0
Search for  2 - result 0
Search for  3 - result 0
Search for  4 - result 0
Search for  5 - result 0
Search for  6 - result 0
Search for  7 - result 0
Search for  8 - result 0
Search for  9 - result 0
Search for 10 - result 0
Search for 11 - result 0
Search for 12 - result 0
Search for 13 - result 0
Search for 14 - result 0
Search for 15 - result 0
Search for 16 - result 0
Search for 17 - result 0
Search for 18 - result 0
Search for 19 - result 0
Search for 20 - result 0
Search for 21 - result 0
Search for 22 - result 0
Search for 23 - result 0
Search for 24 - result 0
Search for 25 - result 0
Search for 26 - result 0
Search for 27 - result 0
Search for 28 - result 0
Search for 29 - result 0
Search for 30 - result 0
Search for 31 - result 0
Search for 32 - result 0

It is returning 0 regardless of whether the value sought is present in the array or not. This is incorrect behaviour.

You should take time out to study Programming Pearls by Jon Bentley. It covers a lot the basics of the testing of binary searches in a variety of forms — the test harness shown is a variant on what he describes. Also take the time to read Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken. Maybe you should take reassurance that lots of other people have got binary search wrong over time. (IIRC, the first versions of binary search were published in the 1950s, but it wasn't until the early 1960s that a correct version was published — and then there's the Extra information from 2006, too.)

When I added a printf() in the block after else if (values[middle] == values[low] || values[middle] == values[high]), it printed on every search that should have failed. Note that the interface makes it hard to spot what's happening — it doesn't report where the element is found, just whether it is found. You can add the debugging and code changes necessary to deal with the residual problems. (Hint: that condition is probably not part of the solution. However, when you do remove it, the code goes into a permanent loop because you don't eliminate the value known not to be in the range from the range that you check recursively.)

This seems to work — note that return 2; is never executed (because the final else if is never false.

#include <stdio.h>

static
int search(int value, int values[], int n, int high, int low)
{
    //if (n <= 0)
    if (n <= 0 || high < low)
    {
        return 1;
    }

    int middle = (high + low) / 2;

    ///condition #1
    if (value == values[middle])
    {
        return 0;
    }

#if 0
    // conditions #2 and #3 (account for the maxes and mins of the array because the operation to find middle truncates)
    else if (values[middle] == values[low] || values[middle] == values[high])
    {
        //printf(" (#2 || #3) ");
        return 0;
    }
#endif

    else if (value > values[middle])
    {
        //low = middle;
        low = middle + 1;
        return search(value, values, n, high, low);
    }

    else if (value < values[middle])
    {
        //high = middle;
        high = middle - 1;
        return search(value, values, n, high, low);
    }

    return 2;
}

int main(void)
{
    int data[15];
    for (int i = 0; i < 15; i++)
        data[i] = 2 * i + 1;

    printf("Data:");
    for (int i = 0; i < 15; i++)
        printf("%3d", data[i]);
    putchar('\n');

    for (int i = -1; i < 2 * 15 + 3; i++)
        printf("Search for %2d - result %d\n", i, search(i, data, 15, 14, 0));
    return 0;
}

Output:

Data:  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29
Search for -1 - result 1
Search for  0 - result 1
Search for  1 - result 0
Search for  2 - result 1
Search for  3 - result 0
Search for  4 - result 1
Search for  5 - result 0
Search for  6 - result 1
Search for  7 - result 0
Search for  8 - result 1
Search for  9 - result 0
Search for 10 - result 1
Search for 11 - result 0
Search for 12 - result 1
Search for 13 - result 0
Search for 14 - result 1
Search for 15 - result 0
Search for 16 - result 1
Search for 17 - result 0
Search for 18 - result 1
Search for 19 - result 0
Search for 20 - result 1
Search for 21 - result 0
Search for 22 - result 1
Search for 23 - result 0
Search for 24 - result 1
Search for 25 - result 0
Search for 26 - result 1
Search for 27 - result 0
Search for 28 - result 1
Search for 29 - result 0
Search for 30 - result 1
Search for 31 - result 1
Search for 32 - result 1
by anonymous   2017-08-20

I will give you general solution with some Python pseudo code. What you are trying to solve here is the classical problem from the book "Programming Pearls" by Jon Bentley.

This is solved very efficiently with just a simple bit array, hence my comment, how long is (how many digits does have) the phone number.

Let's say the phone number is at most 10 digits long, than the max phone number you can have is: 9 999 999 999 (spaces are used for better readability). Here we can use 1bit per number to identify if the number is in set or not (bit is set or not set respectively), thus we are going to use 9 999 999 999 bits to identify each number, i.e.:

  • bits[0] identifies the number 0 000 000 000
  • bits[193] identifies the number 0 000 000 193
  • having a number 659 234-4567 would be addressed by the bits[6592344567]

Doing so we'd need to pre-allocate 9 999 999 999 bits initially set to 0, which is: 9 999 999 999 / 8 / 1024 / 1024 = around 1.2 GB of memory.

I think that holding the intersection of numbers at the end will use more space than the bits representation => at most 600k ints will be stored => 64bit * 600k = around 4.6 GB (actually int is not stored that efficiently and might use much more), if these are string you'll probably end with even more memory requirements.

Parsing a phone number string from CSV file (line by line or buffered file reader), converting it to a number and than doing a constant time memory lookup will be IMO faster than dealing with strings and merging them. Unfortunately, I don't have these phone number files to test, but would be interested to hear your findings.

from bitstring import BitArray


max_number = 9999999999

found_phone_numbers = BitArray(length=max_number+1)


# replace this function with the file open function and retrieving
#  the next found phone number
def number_from_file_iteator(dummy_data):
    for number in dummy_data:
        yield number


def calculate_intersect():
    # should be open a file1 and getting the generator with numbers from it
    #  we use dummy data here
    for number in number_from_file_iteator([1, 25, 77, 224322323, 8292, 1232422]):
        found_phone_numbers[number] = True

    # open second file and check if the number is there
    for number in number_from_file_iteator([4, 24, 224322323, 1232422, max_number]):
        if found_phone_numbers[number]:
            yield number


number_intersection = set(calculate_intersect())

print number_intersection

I used BitArray from bitstring pip package and it needed around 2 secs to initialize the entire bitstring. Afterwards, scanning the file will use constant memory. At the end I used a set to store the items.

Note 1: This algorithm can be modified to just use the list. In that case a second loop as soon as but number matches must reset the bit, so that duplicates do not match.

Note 2: Storing in the set/list occurs lazy, because we use the generator in the second for loop. Runtime complexity is linear, i.e. O(N).

by Bill the Lizard   2017-08-20

I would add the following two books if you haven't read them already:

Programming Pearls by Jon Bentley

Code by Charles Petzold

Even as it stands, you're going to have a very busy, hopefully productive break. Good luck!

Response to question edit: If you're interested in learning about databases, then I recommend Database in Depth by Chris Date. I hope by "create a GUI-based database" you mean implementing a front-end application for an existing database back end. There are plenty of database solutions out there, and it will be well worth it for your future career to learn a few of them.

by anonymous   2017-08-20

Your solution is O(n^2). The optimal solution is linear. It works so that you scan the array from left to right, taking note of the best sum and the current sum:

def get_max_sum_subset(x):
    bestSoFar = 0
    bestNow = 0
    bestStartIndexSoFar = -1
    bestStopIndexSoFar = -1
    bestStartIndexNow = -1
    for i in xrange(len(x)):
        value = bestNow + x[i]
        if value > 0:
            if bestNow == 0:
                bestStartIndexNow = i
            bestNow = value
        else:
            bestNow = 0

        if bestNow > bestSoFar:
            bestSoFar = bestNow
            bestStopIndexSoFar = i
            bestStartIndexSoFar = bestStartIndexNow

    return bestSoFar, bestStartIndexSoFar, bestStopIndexSoFar

This problem was also discussed thourougly in Programming Pearls: Algorithm Design Techniques (highly recommended). There you can also find a recursive solution, which is not optimal (O(n log n)), but better than O(n^2).

by anonymous   2017-08-20

I suggest Programming Pearls, 2nd edition, by Jon Bentley. He talks a lot about algorithm design techniques and provides examples of real world problems, how they were solved, and how different algorithms affected the runtime.

Throughout the book, you learn algorithm design techniques, program verification methods to ensure your algorithms are correct, and you also learn a little bit about data structures. It's a very good book and I recommend it to anyone who wants to master algorithms. Go read the reviews in amazon: http://www.amazon.com/Programming-Pearls-2nd-Edition-Bentley/dp/0201657880

You can have a look at some of the book's contents here: http://netlib.bell-labs.com/cm/cs/pearls/

Enjoy!

by kqr2   2017-08-19
The bibliography link in the original article was unfortunately broken.

Bentley is also well known for his book "Programming Pearls." The 2nd edition is still in print.

http://www.amazon.com/Programming-Pearls-2nd-ACM-press/dp/02...

by brg   2017-08-19
There are a number of programming books that I use to prepare for technical interviews. These are

1. Programming pearls, http://www.amazon.com/Programming-Pearls-2nd-Jon-Bentley/dp/...

2. Effective C++, http://www.amazon.com/Effective-Specific-Improve-Programs-De...

3. Programming Problems, http://www.amazon.com/Programming-Problems-Primer-Technical-...

The reason for these texts is not because they are overtly insightful or well written, it is because they have a large number of problems with completely coded solutions. After working through these basics, programming interviews are much more enjoyable.

by Revisor   2017-08-19
Thank you for your answers. To add something myself, the book Programming Pearls contains exactly this - short, elegant solutions, plus some extra food for thought and exercises.

http://www.amazon.com/Programming-Pearls-2nd-Jon-Bentley/dp/...