What is the proper way to achieve that kind of skill?
I don't know that there's any one specific "proper" way, and as the old saying goes "many roads lead to Rome." But I do think the spirit of this old saw applies:
A fellow goes to New York to attend a concert, but gets lost. He spots another fellow who’s carrying a violin case. “Sir, can you tell me how to get to Carnegie Hall?” The musician smiles and says, “Practice, practice, practice.”
Are there any must read topics/subjects?
I think many people approach this from different directions, but you can pretty much always figure that it helps to get "close to the hardware" and understand things from first principles as much as possible, and then build your knowledge up from that base. So if you start from literally understanding how you make a logic gate from a transistor, and then up to how AND,OR,NAND,NOR,NOT etc. gates are used to implement digital logic, and up through some basics of how a CPU executes code, yadda yadda, you're probably on a good path. Then from the code level, understanding assembly language for at least one architecture and having at least some notion of how the assembly mnemonics map to the CPU ISA and what's going on at the hardware level. What are registers and how are they used, how is data shuffled around between different parts of memory, etc. From there you can build up to understanding different parts of the computing "stack" - the operating system kernel, standard library, memory models, etc.
All of that said, I don't claim that the above is the way, just a way. I'm sure there are people who use the title "hacker" who didn't do any of that. Again - "many roads".
The other thing I'll throw out there is that math comes into play at some levels depending on exactly where your interests take you. It can't hurt to pick up some basic number theory, boolean algebra, computability theory, etc. Some of the kinds of things that come up in the book Hackers Delight[1] could be of interest.
Another thought - if you haven't read Steven Levy's book Hackers: Heroes of the Computer Revolution[2] give that a read. It's not a super technical book, being more about the "spirit" or essence of what "hacking" and "hackerdom" are. But I would say that might be as valuable as the technical stuff in many ways.
Also, reading the various pg essays[3] and/or Paul's book Hackers and Painters[4] probably can't hurt either.
Beyond that, there's a whole laundry list of books and resources one might mention as seminal or defining works of "hackerdom". Things like the TCP/IP Illustrated books by Stephens, The C Programming Language by Kernighan and Ritchie, The Cathedral and the Bazaar by esr, the Internetworking with TCP/IP books by Comer, Advanced Programming in the UNIX Environment by Stephens, a lot of early LISP material by McCarthy and others, a lot of the "AI Series" papers from MIT's CSAIL lab, etc., etc. And that's not close to a comprehensive list, if such a thing could be said to exist. Just some examples of the kinds of things people who associate with hackerdom tend to get into.
Last thought - I would cite "curiosity" as the most important defining trait for becoming a "hacker". If you're never satisfied with your current level of knowledge, always want to probe and dig deeper, and understand more and more and more and more of how things work and why things are the way they are, then in my book you're pretty much a hacker. I'd say try to cultivate that insatiable desire to learn and just dive in and don't worry too much about the "proper" road.
Understanding Software Dynamics (Addison-Wesley Professional Computing Series) 1st Edition
by Richard L. Sites (https://www.amazon.com/Understanding-Software-Addison-Wesley...)
This is a recent book and I have not completed reading it but I know already is it destined to become a classic. If you are serious about learning how to understand and improve code, this is the book for you. (Full disclosure: Dick is a long-standing friend and one of the brightest and insightful guys around.)
Writing Efficient Programs (Prentice-Hall Software Series) 1st Edition by Jon Louis Bentley (https://www.amazon.com/Writing-Efficient-Programs-Prentice-H...)
This book is out of print but I believe it's a better read that Jon's Programming Pearls books. C is a great (and permissive) programming language. Jon shows how the language can be exploited for gain performance. Evil but effective.
Hacker's Delight 2nd Edition by Henry Warren (https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...)
The Hacker's Delight is exactly what Hank Warren intended--a collection of tricks and facts that programs can exploit. It's informative on many levels. It uses deep knowledge of how numbers (and other mathematical objects) work to compute useful information. And it shows how seemingly arcane information can be useful. For a programmer excited by the fabric of programs, it is a continual delight.
The first edition has a clarity and compactness that I find appealing. The second edition has additional material. When you get through all of the Delights you can read Knuth Volume 4 which embeds more arcane and useful knowledge.
This book will tell you how to efficiently generate condition flag values such as carry and overflow. It's easy when when you have extra bits available (such as emulating an 8 or 16 bit processor on a 32-bit processor), but less obvious when you are emulating a 32-bit processor on a 32-bit processor from a high level language.
For some of the "clever tricks" at higher levels, Hacker's Delight is a great treasure trove - it's not x86-64 specific, as a lot of the tricks are applicable to many CPUs, but it helps understand some of the seemingly bizarre tricks.
It didn't "pop-up" in a genius' head. Right shifting binary numbers would divide a number by 2 and left shifting the numbers would multiply it by 2. This is because 10 is 2 in binary. Multiplying a number by 10(be it binary or decimal or hexadecimal) appends a 0 to the number(which is effectively left shifting). Similarly, dividing by 10(or 2) removes a binary digit from the number(effectively right shifting). This is how the logic really works.
There are plenty of such bit-twiddlery(a word I invented a minute ago) in computer world.
http://graphics.stanford.edu/~seander/bithacks.html Here is for the starters.
This is my favorite book: http://www.amazon.com/Hackers-Delight-Edition-Henry-Warren/dp/0321842685/ref=dp_ob_image_bk on bit-twiddlery.
If bit manipulation is something you enjoy, there is an excellent book called Hacker's Delight (2e) by Henry Warren that you will find to be delightful. https://www.amazon.com/dp/0321842685
You can just use any of the standard bit twiddling hacks to reverse bytes and then shuffle the bytes (see e.g. [Hacker's Delight](https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685) and [this list](http://graphics.stanford.edu/%7Eseander/bithacks.html#BitReverseObvious)).
The Hacker's Delight website only contains the source code from the book for the 8x8 and 32x32 cases, but the latter generalizes trivially to your 64x64 case:
#include <stdint.h>
void
transpose64(uint64_t a[64]) {
int j, k;
uint64_t m, t;
for (j = 32, m = 0x00000000FFFFFFFF; j; j >>= 1, m ^= m << j) {
for (k = 0; k < 64; k = ((k | j) + 1) & ~j) {
t = (a[k] ^ (a[k | j] >> j)) & m;
a[k] ^= t;
a[k | j] ^= (t << j);
}
}
}
The way that this works is that the function swaps successively smaller blocks of bits, starting with 32x32 blocks (without transposing the bit within those blocks), after that within those 32x32 blocks it swaps the appropriate 16x16 blocks, etc. The variable that holds the block size is j. Therefore, the outer loop has j succcessively take the values 32, 16, 8, 4, 2 and 1, which means that the outer loop runs six times. The inner loop runs over half the lines of your of bits, the lines where a given bit in the variable k is equal to zero. When j is 32 those are the lines 0-31, when j is 16 those are the lines 0-15 and 32-47, etc. Together the inner part of the loop runs 6*32 = 192 times. What happens inside this inner part is that the mask m determines what are the bits that should be swapped, in t the xor or those bits are calculated, and that xor-ed lists of bits is used to update the bits in both places appropriately.
The book (and the website) also has a version of this code in which these loops have both been unrolled, and where the mask m is not calculated, but just assigned. I guess it depends on things like the number of registers and the size of your instruction cache whether that is an improvement?
To test that this works, suppose we define some bit pattern, say:
This is actually not really what you asked for, as you asked for a non-destructive version of this code. You can get this by having the first swap of the 32x32 blocks go from x to y. For instance, you might do something like:
void
non_destructive_transpose64(uint64_t x[64], uint64_t y[64]) {
int j, k;
uint64_t m, t;
for (k = 0; k < 64; k += 2) {
((uint32_t *) y)[k] = ((uint32_t *) x)[k ^ 64 + 1];
((uint32_t *) y)[k + 1] = ((uint32_t *) x)[k + 1];
}
for (; k < 128; k += 2) {
((uint32_t *) y)[k] = ((uint32_t *) x)[k];
((uint32_t *) y)[k + 1] = ((uint32_t *) x)[k ^ 64];
}
for (j = 16, m = 0x0000FFFF0000FFFF; j; j >>= 1, m ^= m << j) {
for (k = 0; k < 64; k = ((k | j) + 1) & ~j) {
t = (y[k] ^ (y[k | j] >> j)) & m;
y[k] ^= t;
y[k | j] ^= (t << j);
}
}
}
Unlike the other version of the code this does not work regardless of endianness of the architecture. Also, I know that the C standard does not allow you to access an array of uint64_t as an array of uint32_t. However, I like it that no shifts or xors are needed for the first iteration of the move-the-blocks-around loop when you do it like this.
I don't know that there's any one specific "proper" way, and as the old saying goes "many roads lead to Rome." But I do think the spirit of this old saw applies:
A fellow goes to New York to attend a concert, but gets lost. He spots another fellow who’s carrying a violin case. “Sir, can you tell me how to get to Carnegie Hall?” The musician smiles and says, “Practice, practice, practice.”
Are there any must read topics/subjects?
I think many people approach this from different directions, but you can pretty much always figure that it helps to get "close to the hardware" and understand things from first principles as much as possible, and then build your knowledge up from that base. So if you start from literally understanding how you make a logic gate from a transistor, and then up to how AND,OR,NAND,NOR,NOT etc. gates are used to implement digital logic, and up through some basics of how a CPU executes code, yadda yadda, you're probably on a good path. Then from the code level, understanding assembly language for at least one architecture and having at least some notion of how the assembly mnemonics map to the CPU ISA and what's going on at the hardware level. What are registers and how are they used, how is data shuffled around between different parts of memory, etc. From there you can build up to understanding different parts of the computing "stack" - the operating system kernel, standard library, memory models, etc.
All of that said, I don't claim that the above is the way, just a way. I'm sure there are people who use the title "hacker" who didn't do any of that. Again - "many roads".
The other thing I'll throw out there is that math comes into play at some levels depending on exactly where your interests take you. It can't hurt to pick up some basic number theory, boolean algebra, computability theory, etc. Some of the kinds of things that come up in the book Hackers Delight[1] could be of interest.
Another thought - if you haven't read Steven Levy's book Hackers: Heroes of the Computer Revolution[2] give that a read. It's not a super technical book, being more about the "spirit" or essence of what "hacking" and "hackerdom" are. But I would say that might be as valuable as the technical stuff in many ways.
Also, reading the various pg essays[3] and/or Paul's book Hackers and Painters[4] probably can't hurt either.
Beyond that, there's a whole laundry list of books and resources one might mention as seminal or defining works of "hackerdom". Things like the TCP/IP Illustrated books by Stephens, The C Programming Language by Kernighan and Ritchie, The Cathedral and the Bazaar by esr, the Internetworking with TCP/IP books by Comer, Advanced Programming in the UNIX Environment by Stephens, a lot of early LISP material by McCarthy and others, a lot of the "AI Series" papers from MIT's CSAIL lab, etc., etc. And that's not close to a comprehensive list, if such a thing could be said to exist. Just some examples of the kinds of things people who associate with hackerdom tend to get into.
Last thought - I would cite "curiosity" as the most important defining trait for becoming a "hacker". If you're never satisfied with your current level of knowledge, always want to probe and dig deeper, and understand more and more and more and more of how things work and why things are the way they are, then in my book you're pretty much a hacker. I'd say try to cultivate that insatiable desire to learn and just dive in and don't worry too much about the "proper" road.
[1]: https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...
[2]: https://www.amazon.com/Hackers-Computer-Revolution-Steven-Le...
[3]: https://www.amazon.com/Hackers-Painters-Big-Ideas-Computer/d...
This is a recent book and I have not completed reading it but I know already is it destined to become a classic. If you are serious about learning how to understand and improve code, this is the book for you. (Full disclosure: Dick is a long-standing friend and one of the brightest and insightful guys around.)
Writing Efficient Programs (Prentice-Hall Software Series) 1st Edition by Jon Louis Bentley (https://www.amazon.com/Writing-Efficient-Programs-Prentice-H...)
This book is out of print but I believe it's a better read that Jon's Programming Pearls books. C is a great (and permissive) programming language. Jon shows how the language can be exploited for gain performance. Evil but effective.
Hacker's Delight 2nd Edition by Henry Warren (https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...)
The Hacker's Delight is exactly what Hank Warren intended--a collection of tricks and facts that programs can exploit. It's informative on many levels. It uses deep knowledge of how numbers (and other mathematical objects) work to compute useful information. And it shows how seemingly arcane information can be useful. For a programmer excited by the fabric of programs, it is a continual delight.
The first edition has a clarity and compactness that I find appealing. The second edition has additional material. When you get through all of the Delights you can read Knuth Volume 4 which embeds more arcane and useful knowledge.
https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...
This book will tell you how to efficiently generate condition flag values such as carry and overflow. It's easy when when you have extra bits available (such as emulating an 8 or 16 bit processor on a 32-bit processor), but less obvious when you are emulating a 32-bit processor on a 32-bit processor from a high level language.
https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...
It didn't "pop-up" in a genius' head. Right shifting binary numbers would divide a number by 2 and left shifting the numbers would multiply it by 2. This is because
10
is 2 in binary. Multiplying a number by10
(be it binary or decimal or hexadecimal) appends a0
to the number(which is effectively left shifting). Similarly, dividing by10
(or 2) removes a binary digit from the number(effectively right shifting). This is how the logic really works.There are plenty of such
bit-twiddlery
(a word I invented a minute ago) in computer world.http://graphics.stanford.edu/~seander/bithacks.html Here is for the starters.
This is my favorite book: http://www.amazon.com/Hackers-Delight-Edition-Henry-Warren/dp/0321842685/ref=dp_ob_image_bk on bit-twiddlery.
Author's website: https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...
This is mostly to address another answer here, but here's a faster way to get the bit count:
And the tests:
I'm sure there's a clever way to remove the repetition, but that can also be left as an exercise to the reader.
Edit: I copied and modified the code from hacker's delight.
This seems a generalization of the question Bitwise transpose of 8 bytes. That question was just about 8x8 transposition, so what you are asking is a bit different. But your question is answered just as well in section 7.3 of the book Hacker's Delight (you might be able to see the relevant pages on Google books). The code that is presented there apparently originates with Guy Steele.
The Hacker's Delight website only contains the source code from the book for the 8x8 and 32x32 cases, but the latter generalizes trivially to your 64x64 case:
The way that this works is that the function swaps successively smaller blocks of bits, starting with 32x32 blocks (without transposing the bit within those blocks), after that within those 32x32 blocks it swaps the appropriate 16x16 blocks, etc. The variable that holds the block size is
j
. Therefore, the outer loop hasj
succcessively take the values 32, 16, 8, 4, 2 and 1, which means that the outer loop runs six times. The inner loop runs over half the lines of your of bits, the lines where a given bit in the variablek
is equal to zero. Whenj
is 32 those are the lines 0-31, whenj
is 16 those are the lines 0-15 and 32-47, etc. Together the inner part of the loop runs 6*32 = 192 times. What happens inside this inner part is that the maskm
determines what are the bits that should be swapped, int
the xor or those bits are calculated, and that xor-ed lists of bits is used to update the bits in both places appropriately.The book (and the website) also has a version of this code in which these loops have both been unrolled, and where the mask
m
is not calculated, but just assigned. I guess it depends on things like the number of registers and the size of your instruction cache whether that is an improvement?To test that this works, suppose we define some bit pattern, say:
We then call the
transpose32
function and print the resulting bit pattern:And this then gives as output:
Which is nicely flipped, as we hoped for.
Edit:
This is actually not really what you asked for, as you asked for a non-destructive version of this code. You can get this by having the first swap of the 32x32 blocks go from
x
toy
. For instance, you might do something like:Unlike the other version of the code this does not work regardless of endianness of the architecture. Also, I know that the C standard does not allow you to access an array of
uint64_t
as an array ofuint32_t
. However, I like it that no shifts or xors are needed for the first iteration of the move-the-blocks-around loop when you do it like this.EDIT: .. and a good coffeemaker. :)
http://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/03...