Hacker's Delight (2nd Edition)

Category: Programming
Author: Henry S. Warren
4.6
All Hacker News 8
This Year Stack Overflow 2
This Month Stack Overflow 1

Comments

by mindcrime   2022-08-20
What is the proper way to achieve that kind of skill?

I don't know that there's any one specific "proper" way, and as the old saying goes "many roads lead to Rome." But I do think the spirit of this old saw applies:

A fellow goes to New York to attend a concert, but gets lost. He spots another fellow who’s carrying a violin case. “Sir, can you tell me how to get to Carnegie Hall?” The musician smiles and says, “Practice, practice, practice.”

Are there any must read topics/subjects?

I think many people approach this from different directions, but you can pretty much always figure that it helps to get "close to the hardware" and understand things from first principles as much as possible, and then build your knowledge up from that base. So if you start from literally understanding how you make a logic gate from a transistor, and then up to how AND,OR,NAND,NOR,NOT etc. gates are used to implement digital logic, and up through some basics of how a CPU executes code, yadda yadda, you're probably on a good path. Then from the code level, understanding assembly language for at least one architecture and having at least some notion of how the assembly mnemonics map to the CPU ISA and what's going on at the hardware level. What are registers and how are they used, how is data shuffled around between different parts of memory, etc. From there you can build up to understanding different parts of the computing "stack" - the operating system kernel, standard library, memory models, etc.

All of that said, I don't claim that the above is the way, just a way. I'm sure there are people who use the title "hacker" who didn't do any of that. Again - "many roads".

The other thing I'll throw out there is that math comes into play at some levels depending on exactly where your interests take you. It can't hurt to pick up some basic number theory, boolean algebra, computability theory, etc. Some of the kinds of things that come up in the book Hackers Delight[1] could be of interest.

Another thought - if you haven't read Steven Levy's book Hackers: Heroes of the Computer Revolution[2] give that a read. It's not a super technical book, being more about the "spirit" or essence of what "hacking" and "hackerdom" are. But I would say that might be as valuable as the technical stuff in many ways.

Also, reading the various pg essays[3] and/or Paul's book Hackers and Painters[4] probably can't hurt either.

Beyond that, there's a whole laundry list of books and resources one might mention as seminal or defining works of "hackerdom". Things like the TCP/IP Illustrated books by Stephens, The C Programming Language by Kernighan and Ritchie, The Cathedral and the Bazaar by esr, the Internetworking with TCP/IP books by Comer, Advanced Programming in the UNIX Environment by Stephens, a lot of early LISP material by McCarthy and others, a lot of the "AI Series" papers from MIT's CSAIL lab, etc., etc. And that's not close to a comprehensive list, if such a thing could be said to exist. Just some examples of the kinds of things people who associate with hackerdom tend to get into.

Last thought - I would cite "curiosity" as the most important defining trait for becoming a "hacker". If you're never satisfied with your current level of knowledge, always want to probe and dig deeper, and understand more and more and more and more of how things work and why things are the way they are, then in my book you're pretty much a hacker. I'd say try to cultivate that insatiable desire to learn and just dive in and don't worry too much about the "proper" road.

[1]: https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...

[2]: https://www.amazon.com/Hackers-Computer-Revolution-Steven-Le...

[3]: https://www.amazon.com/Hackers-Painters-Big-Ideas-Computer/d...

by drallison   2022-01-03
Understanding Software Dynamics (Addison-Wesley Professional Computing Series) 1st Edition by Richard L. Sites (https://www.amazon.com/Understanding-Software-Addison-Wesley...)

This is a recent book and I have not completed reading it but I know already is it destined to become a classic. If you are serious about learning how to understand and improve code, this is the book for you. (Full disclosure: Dick is a long-standing friend and one of the brightest and insightful guys around.)

Writing Efficient Programs (Prentice-Hall Software Series) 1st Edition by Jon Louis Bentley (https://www.amazon.com/Writing-Efficient-Programs-Prentice-H...)

This book is out of print but I believe it's a better read that Jon's Programming Pearls books. C is a great (and permissive) programming language. Jon shows how the language can be exploited for gain performance. Evil but effective.

Hacker's Delight 2nd Edition by Henry Warren (https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...)

The Hacker's Delight is exactly what Hank Warren intended--a collection of tricks and facts that programs can exploit. It's informative on many levels. It uses deep knowledge of how numbers (and other mathematical objects) work to compute useful information. And it shows how seemingly arcane information can be useful. For a programmer excited by the fabric of programs, it is a continual delight.

The first edition has a clarity and compactness that I find appealing. The second edition has additional material. When you get through all of the Delights you can read Knuth Volume 4 which embeds more arcane and useful knowledge.

by jhallenworld   2020-08-27
Well I have an important resource for this, the book "Hacker's Delight":

https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...

This book will tell you how to efficiently generate condition flag values such as carry and overflow. It's easy when when you have extra bits available (such as emulating an 8 or 16 bit processor on a 32-bit processor), but less obvious when you are emulating a 32-bit processor on a 32-bit processor from a high level language.

by jmalicki   2020-07-26
For some of the "clever tricks" at higher levels, Hacker's Delight is a great treasure trove - it's not x86-64 specific, as a lot of the tricks are applicable to many CPUs, but it helps understand some of the seemingly bizarre tricks.

https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...

by anonymous   2019-07-21

It didn't "pop-up" in a genius' head. Right shifting binary numbers would divide a number by 2 and left shifting the numbers would multiply it by 2. This is because 10 is 2 in binary. Multiplying a number by 10(be it binary or decimal or hexadecimal) appends a 0 to the number(which is effectively left shifting). Similarly, dividing by 10(or 2) removes a binary digit from the number(effectively right shifting). This is how the logic really works.

There are plenty of such bit-twiddlery(a word I invented a minute ago) in computer world.

http://graphics.stanford.edu/~seander/bithacks.html Here is for the starters.

This is my favorite book: http://www.amazon.com/Hackers-Delight-Edition-Henry-Warren/dp/0321842685/ref=dp_ob_image_bk on bit-twiddlery.

by anonymous   2017-12-18
If bit manipulation is something you enjoy, there is an excellent book called Hacker's Delight (2e) by Henry Warren that you will find to be delightful. https://www.amazon.com/dp/0321842685
by jdblair   2017-11-22
If you enjoy these bit twiddling hacks, I recommend "Hacker's Delight," by Henry Warren.

Author's website: https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...

by anonymous   2017-09-24
You can just use any of the standard bit twiddling hacks to reverse bytes and then shuffle the bytes (see e.g. [Hacker's Delight](https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685) and [this list](http://graphics.stanford.edu/%7Eseander/bithacks.html#BitReverseObvious)).
by anonymous   2017-08-20

This is mostly to address another answer here, but here's a faster way to get the bit count:

#define BC_OffBitCount(in, out)                                \
  {                                                            \
    out = ~in;                                                 \
    switch (sizeof(in)) {                                      \
      case 4:                                                  \
        out = (out & 0x55555555) + ((out >> 1) & 0x55555555);  \
        out = (out & 0x33333333) + ((out >> 2) & 0x33333333);  \
        out = (out & 0x0F0F0F0F) + ((out >> 4) & 0x0F0F0F0F);  \
        out = (out & 0x00FF00FF) + ((out >> 8) & 0x00FF00FF);  \
        out = (out & 0x0000FFFF) + ((out >> 16) & 0x0000FFFF); \
        break;                                                 \
      case 2:                                                  \
        out = (out & 0x5555) + ((out >> 1) & 0x5555);          \
        out = (out & 0x3333) + ((out >> 2) & 0x3333);          \
        out = (out & 0x0F0F) + ((out >> 4) & 0x0F0F);          \
        out = (out & 0x00FF) + ((out >> 8) & 0x00FF);          \
        break;                                                 \
      case 1:                                                  \
        out = (out & 0x55) + ((out >> 1) & 0x55);              \
        out = (out & 0x33) + ((out >> 2) & 0x33);              \
        out = (out & 0x0F) + ((out >> 4) & 0x0F);              \
        break;                                                 \
    }                                                          \
  }

#define BC_OffCompareGreater(a, b, out)  // Left as an exercise to the reader

And the tests:

void test_uint8_t_of_0x00_should_return_8(void) {
  uint8_t value = 0;
  uint8_t result;
  BC_OffBitCount(value, result);
  TEST_ASSERT_EQUAL(8, result);
}

void test_uint8_t_of_0x55_should_return_4(void) {
  uint8_t value = 0x55;
  uint8_t result;
  BC_OffBitCount(value, result);
  TEST_ASSERT_EQUAL(4, result);
}

void test_uint16_t_of_0xAA_should_return_12(void) {
  uint16_t value = 0xAA;
  uint16_t result;
  BC_OffBitCount(value, result);
  TEST_ASSERT_EQUAL(12, result);
}

void test_uint32_t_of_0xDEADBEEF_should_return_8(void){
    uint32_t value = 0xDEADBEEF;
    uint32_t result;
    BC_OffBitCount(value, result);
    TEST_ASSERT_EQUAL(8, result);
}

I'm sure there's a clever way to remove the repetition, but that can also be left as an exercise to the reader.

Edit: I copied and modified the code from hacker's delight.

by anonymous   2017-08-20

This seems a generalization of the question Bitwise transpose of 8 bytes. That question was just about 8x8 transposition, so what you are asking is a bit different. But your question is answered just as well in section 7.3 of the book Hacker's Delight (you might be able to see the relevant pages on Google books). The code that is presented there apparently originates with Guy Steele.

The Hacker's Delight website only contains the source code from the book for the 8x8 and 32x32 cases, but the latter generalizes trivially to your 64x64 case:

#include <stdint.h>

void
transpose64(uint64_t a[64]) {
  int j, k;
  uint64_t m, t;

  for (j = 32, m = 0x00000000FFFFFFFF; j; j >>= 1, m ^= m << j) {
    for (k = 0; k < 64; k = ((k | j) + 1) & ~j) {
      t = (a[k] ^ (a[k | j] >> j)) & m;
      a[k] ^= t;
      a[k | j] ^= (t << j);
    }
  }
}

The way that this works is that the function swaps successively smaller blocks of bits, starting with 32x32 blocks (without transposing the bit within those blocks), after that within those 32x32 blocks it swaps the appropriate 16x16 blocks, etc. The variable that holds the block size is j. Therefore, the outer loop has j succcessively take the values 32, 16, 8, 4, 2 and 1, which means that the outer loop runs six times. The inner loop runs over half the lines of your of bits, the lines where a given bit in the variable k is equal to zero. When j is 32 those are the lines 0-31, when j is 16 those are the lines 0-15 and 32-47, etc. Together the inner part of the loop runs 6*32 = 192 times. What happens inside this inner part is that the mask m determines what are the bits that should be swapped, in t the xor or those bits are calculated, and that xor-ed lists of bits is used to update the bits in both places appropriately.

The book (and the website) also has a version of this code in which these loops have both been unrolled, and where the mask m is not calculated, but just assigned. I guess it depends on things like the number of registers and the size of your instruction cache whether that is an improvement?

To test that this works, suppose we define some bit pattern, say:

uint64_t logo[] = {
0b0000000000000000000000000000000000000000000100000000000000000000,
0b0000000000000000000000000000000000000000011100000000000000000000,
0b0000000000000000000000000000000000000000111110000000000000000000,
0b0000000000000000000000000000000000000001111111000000000000000000,
0b0000000000000000000000000000000000000000111111100000000000000000,
0b0000000000000000000000000000000000000000111111100000000000000000,
0b0000000000000000000000000000000000000000011111110000000000000000,
0b0000000000000000000000000000000000000000001111111000000000000000,
0b0000000000000000000000000000000000000000001111111100000000000000,
0b0000000000000000000000000000000010000000000111111100000000000000,
0b0000000000000000000000000000000011100000000011111110000000000000,
0b0000000000000000000000000000000111110000000001111111000000000000,
0b0000000000000000000000000000001111111000000001111111100000000000,
0b0000000000000000000000000000011111111100000000111111100000000000,
0b0000000000000000000000000000001111111110000000011111110000000000,
0b0000000000000000000000000000000011111111100000001111111000000000,
0b0000000000000000000000000000000001111111110000001111111100000000,
0b0000000000000000000000000000000000111111111000000111111100000000,
0b0000000000000000000000000000000000011111111100000011111110000000,
0b0000000000000000000000000000000000001111111110000001111111000000,
0b0000000000000000000000000000000000000011111111100001111111100000,
0b0000000000000000000000001100000000000001111111110000111111100000,
0b0000000000000000000000001111000000000000111111111000011111110000,
0b0000000000000000000000011111110000000000011111111100001111100000,
0b0000000000000000000000011111111100000000001111111110001111000000,
0b0000000000000000000000111111111111000000000011111111100110000000,
0b0000000000000000000000011111111111110000000001111111110000000000,
0b0000000000000000000000000111111111111100000000111111111000000000,
0b0000000000000000000000000001111111111111100000011111110000000000,
0b0000000000000000000000000000011111111111111000001111100000000000,
0b0000000000000000000000000000000111111111111110000011000000000000,
0b0000000000000000000000000000000001111111111111100000000000000000,
0b0000000000000000000000000000000000001111111111111000000000000000,
0b0000000000000000000000000000000000000011111111111100000000000000,
0b0000000000000000000111000000000000000000111111111100000000000000,
0b0000000000000000000111111110000000000000001111111000000000000000,
0b0000000000000000000111111111111100000000000011111000000000000000,
0b0000000000000000000111111111111111110000000000110000000000000000,
0b0000000000000000001111111111111111111111100000000000000000000000,
0b0000000000000000001111111111111111111111111111000000000000000000,
0b0000000000000000000000011111111111111111111111100000000000000000,
0b0000001111110000000000000001111111111111111111100000111111000000,
0b0000001111110000000000000000000011111111111111100000111111000000,
0b0000001111110000000000000000000000000111111111100000111111000000,
0b0000001111110000000000000000000000000000001111000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
};

We then call the transpose32 function and print the resulting bit pattern:

#include <stdio.h>

void
printbits(uint64_t a[64]) {
  int i, j;

  for (i = 0; i < 64; i++) {
    for (j = 63; j >= 0; j--)
      printf("%c", (a[i] >> j) & 1 ? '1' : '0');
    printf("\n");
  }
}

int
main() {
  transpose64(logo);
  printbits(logo);
  return 0;
}

And this then gives as output:

0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000011000000011111100000111111
0000000000000000000000000000000000111111000000011111100000111111
0000000000000000000000000000000000111111000000011111100000111111
0000000000000000000000000000000000111111000000011111100000111111
0000000000000000000000000100000000011111000000011111100000111111
0000000000000000000000011110000000011111100000011111100000111111
0000000000000000000001111110000000011111100000011111100000111111
0000000000000000000001111111000000011111100000011111100000111111
0000000000000000000000111111000000011111100000011111100000111111
0000000000000000000000111111100000001111110000011111100000111111
0000000000000000000000011111100000001111110000011111100000111111
0000000000000100000000011111110000001111110000011111100000111111
0000000000001110000000001111110000001111110000011111100000111111
0000000000011110000000001111111000001111110000011111100000111111
0000000001111111000000000111111000000111111000011111100000111111
0000000000111111100000000111111100000111111000011111100000111111
0000000000111111110000000011111100000111111000011111100000111111
0000000000011111111000000011111100000111111000011111100000111111
0000000000001111111100000001111110000011111000011111100000111111
0000000000000111111100000001111110000011111100011111100000111111
0000000000000011111110000000111111000011111100011111100000111111
0001000000000001111111000000111111000011111100011111100000111111
0011110000000001111111100000111111100011111100011111100000111111
0111111000000000111111110000011111100001111100011111100000111111
0111111110000000011111111000011111110001111110011111100000111111
1111111111000000001111111000001111110001111110011111100000111111
0011111111100000000111111100001111111001111110011111100000111111
0001111111111000000011111110000111111001111110011111100000111111
0000111111111100000011111111000111111100111100000000000000111111
0000001111111110000001111111100011111100000000000000000000111111
0000000111111111100000111111110011111000000000000000000000111111
0000000011111111110000011111110001100000000000000000000000111111
0000000000111111111000001111111000000000000000000000000000111111
0000000000011111111110000111111000000000000000000000000000111111
0000000000001111111111000111110000000000011111111111111111111111
0000000000000011111111100011100000000000011111111111111111111111
0000000000000001111111111001000000000000011111111111111111111111
0000000000000000111111111100000000000000011111111111111111111111
0000000000000000001111111100000000000000011111111111111111111111
0000000000000000000111111000000000000000011111111111111111111111
0000000000000000000011110000000000000000000000000000000000000000
0000000000000000000000100000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000

Which is nicely flipped, as we hoped for.

Edit:

This is actually not really what you asked for, as you asked for a non-destructive version of this code. You can get this by having the first swap of the 32x32 blocks go from x to y. For instance, you might do something like:

void
non_destructive_transpose64(uint64_t x[64], uint64_t y[64]) {
  int j, k;
  uint64_t m, t;

  for (k = 0; k < 64; k += 2) {
    ((uint32_t *) y)[k] = ((uint32_t *) x)[k ^ 64 + 1];
    ((uint32_t *) y)[k + 1] = ((uint32_t *) x)[k + 1];
  }
  for (; k < 128; k += 2) {
    ((uint32_t *) y)[k] = ((uint32_t *) x)[k];
    ((uint32_t *) y)[k + 1] = ((uint32_t *) x)[k ^ 64];
  }
  for (j = 16, m = 0x0000FFFF0000FFFF; j; j >>= 1, m ^= m << j) {
    for (k = 0; k < 64; k = ((k | j) + 1) & ~j) {
      t = (y[k] ^ (y[k | j] >> j)) & m;
      y[k] ^= t;
      y[k | j] ^= (t << j);
    }
  }
}

Unlike the other version of the code this does not work regardless of endianness of the architecture. Also, I know that the C standard does not allow you to access an array of uint64_t as an array of uint32_t. However, I like it that no shifts or xors are needed for the first iteration of the move-the-blocks-around loop when you do it like this.

by arh68   2017-08-19
Hacker's Delight - http://www.amazon.com/Hackers-Delight-Edition-Henry-Warren/d...

EDIT: .. and a good coffeemaker. :)

by microtherion   2017-08-19
This seems to have some overlap with Henry Warren's Hacker's Delight: https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...
by kyteland   2017-08-19
There is, and I can't recommend it highly enough.

http://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/03...