Reversing: Secrets of Reverse Engineering
"This workshop is a 1-2 hour introduction to what reverse engineering is. It assumes no knowledge of assembly and is done on paper worksheets rather than a computer setup for accessibility and to make the most efficient use of time."
It's by Maddie Stone, a security researcher at Google Project Zero.
She also has one on Android app reverse engineering.
https://www.amazon.com/Reversing-Secrets-Engineering-Eldad-E...
You're getting into "anti-reversing techniques", and it's basically an art. Worse, even if you stomp newbies, there are "anti-anti-reversing" plugins for OllyDbg and IDA Pro that they can download to bypass many of your countermeasures.
Countermeasures include detecting a debugger by trapping debugger APIs, or detecting single-stepping. You can insert code that, after detecting a debugger break-in, continues to function but starts acting up at random times much later in the program. It's really a cat-and-mouse game, and the crackers have a significant upper hand.
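As a rough illustration of debugger detection, here is a minimal sketch for Linux, where the kernel reports an attached tracer as the TracerPid field of /proc/self/status. It is an assumed stand-in for the Windows debugger API checks mentioned above, not code from the book, and the function names parse_tracer_pid and debugger_attached are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the TracerPid value from the text of a
 * /proc/<pid>/status file. Returns -1 if the field is not found. */
long parse_tracer_pid(const char *status_text)
{
    assert(status_text != NULL);
    const char *key = "TracerPid:";
    const char *line = status_text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, strlen(key)) == 0)
            return strtol(line + strlen(key), NULL, 10);
        line = strchr(line, '\n');   /* move to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Hypothetical check: nonzero if some process is tracing us (Linux only).
 * Returns 0 when the status file cannot be read. */
int debugger_attached(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_tracer_pid(buf) > 0;
}
```

A check like this is exactly the kind of thing the anti-anti-reversing plugins are built to defeat, so treat it as a speed bump rather than protection.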
Check out... http://www.openrce.org/reference_library/anti_reversing - Some of what's out there.
http://www.amazon.com/Reversing-Secrets-Engineering-Eldad-Eilam/dp/0764574817/ - This book has really good anti-reversing info and steps through the techniques. Great place to start if you're getting into reversing in general.
You can't get variable names back. Reverse engineering is very difficult to do; I recommend this book - http://www.amazon.co.uk/Reversing-Secrets-Engineering-Eldad-Eilam/dp/0764574817/ref=sr_1_1?ie=UTF8&qid=1353436851&sr=8-1
If it's as foreign to you as it seems, I don't think a debugger or disassembler is going to help - you need to learn assembler programming first; study the architecture of the processor (plenty of documentation downloadable from Intel). And then, since most machine code is generated by compilers, you'll need to understand how compilers generate code - the simplest way is to write lots of small programs and then disassemble them to see what your C/C++ is turned into.
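To make the "write small programs and disassemble them" advice concrete, here is the kind of tiny function that works well for this. The function name sum_to is just an example, and the commands in the note below assume a GCC toolchain.

```c
#include <assert.h>

/* A deliberately small function: short enough that its disassembly
 * can be read line by line and matched back to the source. */
int sum_to(int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Assuming the file is saved as sum.c, gcc -O0 -S sum.c writes an assembly listing to sum.s, and gcc -c sum.c followed by objdump -d sum.o disassembles the object file. Repeating the exercise with -O2 shows how much the optimizer changes the shape of the loop.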
A couple of books that'll help you understand:-
This is most likely C++, not plain C. That is exactly what Visual C++-generated virtual method calls look like. I would say that dword_10087418 is a pointer to an object. It is either a global or static variable, not a local one. The original source probably looked something like SomeClass::Instance->func(arg).
If you are not familiar with the C++ object layout, you should read C++ Under the Hood, Reversing: Secrets of Reverse Engineering or Inside the C++ Object Model. The IDA Pro Book also has a brief section on the topic.
If you want a brief summary, read on. Keep in mind that all of this is an implementation detail of MSVC; other compilers do it differently, and they can freely change it as they please. (Also, I'm simplifying things by not mentioning virtual/multiple inheritance.) When a class uses virtual functions, the compiler builds a table of all such functions in that class and puts a pointer to that table as the first hidden member of each object of that class. Then, calling the virtual function from the table is as simple as:
And that is basically what that line of code is doing:
Note that since the member function needs access to this, the compiler has to pass it as well. It's a detail that's not visible in C++, but this is treated like an additional argument for the function - hence you see two arguments but the function only takes one (MSVC's thiscall convention mandates that the this pointer is passed in ecx). IDA doesn't bother hiding that pointer because that would get confusing.
More advice: Get the ms_rtti script for IDA and run it to find the virtual method tables in the DLL, then search for other references to dword_10087418 to see what values are written to it - you should be able to determine which vtable is associated with that object and then figure out which function is being called.
You can make Hex-Rays show the code in a more readable fashion if you define stub struct types for the class and the vtable in question, then tell IDA that the pointer uses that struct type.
You might also want to have a look at OllyDbg, which is a 32-bit assembler-level analysing debugger. It is used to analyze binary code in scenarios where you do not have the source code. It is a lightweight debugger. OllyDbg is shareware, so you can download and use it for free.
Visit OllyDbg's home page here
PS: Back in the day, crackers used SoftICE from NuMega to debug an executable and grab a snapshot of the register values. SoftICE was an advanced debugger. It was definitely the favorite tool for the crackers. I don't know about the present status of the product. NuMega's site had no information about it. I may have overlooked it but I could not find it. I recommend that you get your hands on a legacy version (4.0x) of SoftICE and apply the Windows XP patch for SoftICE. Working with SoftICE is something of an "experience".
Further Read: Reversing: Secrets of Reverse Engineering by Eldad Eilam
Reference (All Levels)
The C Programming Language (2nd Edition) - Brian W. Kernighan and Dennis M. Ritchie (1988). Still a good, short but complete introduction to C, written by the inventor of C. However, the language has changed and good C style has developed in the last 25 years, and there are parts of the book that show its age.
C: A Reference Manual (5th Edition) - Samuel P. Harbison and Guy R. Steele (2002). An excellent reference book on C, up to and including C99. It is not a tutorial, and probably unfit for beginners. It's great if you need to write a compiler for C, as the authors had to do when they started.
C Pocket Reference (O'Reilly) - Peter Prinz and Ulla Kirch-Prinz (2002).
The comp.lang.c FAQ - Steve Summit. Web site with answers to many questions about C.
Various versions of the C language standards can be found here.
The new C standard - an annotated reference (Free PDF) - Derek M. Jones (2009). The "new standard" referred to is the old C99 standard rather than C11.
Rationale for C99 Standard.
Beginner
Programming in C (4th Edition) - Stephen Kochan (2014). A good general introduction and tutorial.
C Primer Plus (5th Edition) - Stephen Prata (2004)
C Programming: A Modern Approach (2nd Edition) - K. N. King (2008). A good book for learning C.
A Book on C - Al Kelley/Ira Pohl (1998).
The C Book (Free Online) - Mike Banahan, Declan Brady, and Mark Doran (1991).
C: How to Program (8th Edition) - Paul Deitel and Harvey M. Deitel (2015). Lots of good tips and best practices for beginners. The index is very good and serves as a decent reference (just not fully comprehensive, and very shallow).
Head First C - David Griffiths and Dawn Griffiths (2012).
Beginning C (5th Edition) - Ivor Horton (2013). Very good explanation of pointers, using lots of small but complete programs.
Sams Teach Yourself C in 21 Days - Bradley L. Jones and Peter Aitken (2002). Very good introductory stuff.
Applications Programming in ANSI C - Richard Johnsonbaugh and Martin Kalin (1996).
Intermediate
Object-oriented Programming with ANSI-C (Free PDF) - Axel-Tobias Schreiner (1993). The code gets a bit convoluted. If you want C++, use C++.
C Interfaces and Implementations - David R. Hanson (1997). Provides information on how to define a boundary between an interface and implementation in C in a generic and reusable fashion. It also demonstrates this principle by applying it to the implementation of common mechanisms and data structures in C, such as lists, sets, exceptions, string manipulation, memory allocators, and more. Basically, Hanson took all the code he'd written as part of building Icon and lcc and pulled out the best bits in a form that other people could reuse for their own projects. It's a model of good C programming using modern design techniques (including Liskov's data abstraction), showing how to organize a big C project as a bunch of useful libraries.
The C Puzzle Book - Alan R. Feuer (1998)
The Standard C Library - P.J. Plauger (1992). It contains the complete source code to an implementation of the C89 standard library, along with extensive discussion about the design and why the code is designed as shown.
21st Century C: C Tips from the New School - Ben Klemens (2012). In addition to the C language, the book explains gdb, valgrind, autotools, and git. The comments on style are found in the last part (Chapter 6 and beyond).
Algorithms in C - Robert Sedgewick (1997). Gives you a real grasp of implementing algorithms in C. Very lucid and clear; will probably make you want to throw away all of your other algorithms books and keep this one.
Pointers on C - Kenneth Reek (1997).
Pointers in C - Naveen Toppo and Hrishikesh Dewan (2013).
Problem Solving and Program Design in C (6th Edition) - Jeri R. Hanly and Elliot B. Koffman (2009).
Data Structures - An Advanced Approach Using C - Jeffrey Esakov and Tom Weiss (1989).
C Unleashed - Richard Heathfield, Lawrence Kirby, et al. (2000). Not ideal, but it is worth an intermediate programmer's time to practice the problems in this book. This is a good cookbook-like approach suggested by comp.lang.c contributors.
Expert
Expert C Programming: Deep C Secrets - Peter van der Linden (1994). Lots of interesting information and war stories from the Sun compiler team, but a little dated in places.
Advanced C Programming by Example - John W. Perry (1998).
Advanced Programming in the UNIX Environment - Richard W. Stevens and Stephen A. Rago (2013). Comprehensive description of how to use the Unix APIs from C code, but not so much about the mechanics of C coding.
Advanced C: Food for the Educated Palate - Narain Gehani (1985). Great on pointers, pointers to functions, and a variety of advanced topics, such as how stuff is stored in memory, dynamic memory, stack usage, function calling, parameter passing, etc. Assumes you have a good grasp of C to start with. Warning: pre-dates the ANSI standard and a lot of modern programming design.
Computer Programming: An Introduction for the Scientifically Inclined - Sander Stoks (2008). Great book about scientific use of programming languages.
Reversing: Secrets of Reverse Engineering - Eldad Eilam (2005). For those who want to test the limits of their ethics.
Uncategorized
Essential C (Free PDF) - Nick Parlante (2003). Note that this describes the C90 language at several points (e.g., in discussing // comments and placement of variable declarations at arbitrary points in the code), so it should be treated with some caution.
C Programming FAQs: Frequently Asked Questions - Steve Summit (1995).
C in a Nutshell - Peter Prinz and Tony Crawford (2005). Excellent book if you need a reference for C99.
Functional C - Pieter Hartel and Henk Muller (1997). Teaches modern practices that are invaluable for low-level programming, with concurrency and modularity in mind.
The Practice of Programming - Brian W. Kernighan and Rob Pike (1999). A very good book to accompany K&R.
C Traps and Pitfalls by A. Koenig (1989). Very good, but the C style pre-dates standard C, which makes it less recommendable these days.
Some have argued for the removal of 'Traps and Pitfalls' from this list because it has trapped some people into making mistakes; others continue to argue for its inclusion. Perhaps it should be regarded as an 'expert' book because it requires a moderately extensive knowledge of C to understand what's changed since it was published.
Computer Systems: A Programmer's Perspective (3rd Edition) - Randal E. Bryant and David R. O'Hallaron (2015). Explains the C language in a disjointed narrative style, like Pulp Fiction.
Abstraction and Specification in Program Development - Barbara Liskov and John V. Guttag (1986) (not the newer Java-based version by Liskov alone). This is an undergraduate text, with some ideas worth thinking about.
Composite/Structured Design - Glenford J. Myers (1978). This and other books from the late 1970s and early 1980s by Yourdon and Myers provide excellent insights on structured design.
Build Your Own Lisp — Daniel Holden (2014). An enjoyable way to learn C.
MISRA-C - industry standard published and maintained by the Motor Industry Software Reliability Association. Covers C89 and C99.
Although this isn't a book as such, every experienced C programmer should read and implement as much of it as possible. MISRA-C was originally intended as guidelines for safety-critical applications in particular, but it applies to any area of application where stable, bug-free C code is desired (who doesn't want fewer bugs?). MISRA-C is becoming the de facto standard in the whole embedded industry and is getting increasingly popular even in other programming branches. There are (at least) three publications of the standard, one from 1998, one from 2004, and one from 2012, where the last is the currently active, relevant one. There is also a MISRA Compliance Guidelines document from 2016, and MISRA C:2012 Amendment 1 — Additional Security Guidelines for MISRA C:2012 (published in April 2016).
Note that some of the strictures in the MISRA rules are not appropriate to every context. For example, directive 4.12 states "Dynamic memory allocation shall not be used". This may well be appropriate in the embedded systems for which the MISRA rules are designed; it is not appropriate everywhere. (Compilers, for instance, generally use dynamic memory allocation for things like symbol tables, and to do without dynamic memory allocation would be difficult, if not preposterous.)
Archived lists of ACCU-reviewed books on Beginner's C (116 titles) from 2007 and Advanced C (76 titles) from 2008. Most of these don't look to be on the main site anymore, and you can't browse that by subject anyway.
Warnings
Be wary of books written by Herbert Schildt. In particular, you should stay away from C: The Complete Reference, known in some circles as C: The Complete Nonsense.
Also be wary of the book "Let Us C" by Yashwant Kanetkar. It is a horribly outdated book that teaches Turbo C and has a lot of obsolete, misleading and downright incorrect material.
Learn C The Hard Way - Zed Shaw. A critique of this book by Tim Hentenaar:
"Learn C The Hard Way" is not a book that I could recommend to someone who is both learning to program and learning C. If you're already a competent programmer in some other related language, then it represents an interesting and unusual exposition on C, though I have reservations about parts of the book. Jonathan Leffler
Outdated
Other contributors, not credited in the revision history:
Alex Lockwood, Ben Jackson, Bubbles, claws, coledot, Dana Robinson, Daniel Holden, Dervin Thunk, dwc, Erci Hou, Garen, haziz, Johan Bezem, Jonathan Leffler, Joshua Partogi, Lucas, Lundin, Matt K., mossplix, Matthieu M., midor, Nietzche-jou, Norman Ramsey, r3st0r3, ridthyself, Robert S. Barnes, Tim Ring, Tony Bai, VMAtm
(I don't know about you but I was excited with assembly)
A simple tool for experimenting with assembly is already installed on your PC.
Go to Start menu->Run, and type debug
debug (command)
Tutorials:
If you want to understand the code you see in IDA Pro (or OllyDbg), you'll need to learn how compiled code is structured. I recommend the book Reversing: Secrets of Reverse Engineering
I experimented for a couple of weeks with debug when I started learning assembly (15 years ago).
Note that debug works at the base machine level; there are no high-level assembly commands.
And now a simple example:
Give the a command to start writing assembly code - type the below program - and finally give the g command to run it.
(INT 21 displays on screen the ASCII char stored in the DL register if the AH register is set to 2 -- INT 20 terminates the program)
Exactly!
Well, you can look for routines like random() that will be called during the construction of the mines table. This book helped me a lot when I was experimenting with reverse engineering. :)
In general, good places for setting break points are calls to message boxes, calls to play a sound, timers and other win32 API routines.
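To see why a call to the random routine is such a useful landmark, consider how a mine table might be filled. This is a hypothetical sketch, not the actual Minesweeper code: the board size, the names place_mines and count_mines, and the retry loop are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ROWS 9
#define COLS 9

/* Hypothetical board setup: place `mines` mines at random free cells.
 * Every placement attempt calls rand(), which is why a breakpoint on the
 * random routine tends to land inside this kind of loop. */
void place_mines(unsigned char board[ROWS][COLS], int mines, unsigned seed)
{
    assert(mines >= 0 && mines <= ROWS * COLS);
    memset(board, 0, ROWS * COLS);
    srand(seed);
    int placed = 0;
    while (placed < mines) {
        int r = rand() % ROWS;
        int c = rand() % COLS;
        if (board[r][c] == 0) {   /* skip cells that already hold a mine */
            board[r][c] = 1;
            placed++;
        }
    }
}

/* Count the cells marked as mines. */
int count_mines(unsigned char board[ROWS][COLS])
{
    int n = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            n += board[r][c];
    return n;
}
```

In a debugger, breaking on the random call and looking at which memory is written right after it returns usually leads straight to the table itself.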
BTW, I am scanning minesweeper right now with OllyDbg.
Update: nemo reminded me of a great tool, Cheat Engine by Eric "Dark Byte" Heijnen.
Cheat Engine (CE) is a great tool for watching and modifying other processes' memory space. Beyond that basic facility, CE has more special features like viewing the disassembled memory of a process and injecting code into other processes.
(the real value of that project is that you can download the source code -Delphi- and see how those mechanisms were implemented - I did that many years ago :o)
I've discussed why I don't think obfuscation is an effective means of protection against cracking here:
Protect .NET Code from reverse engineering
However, your question is specifically about source theft, which is an interesting topic. In Eldad Eilam's book, "Reversing: Secrets of Reverse Engineering", the author discusses source theft as one reason behind reverse engineering in the first two chapters.
Basically, what it comes down to is the only chance you have of being targeted for source theft is if you have some very specific, hard to engineer, algorithm related to your domain that gives you a leg up on your competition. This is just about the only time it would be cost-effective to attempt to reverse engineer a small portion of your application.
So, unless you have some top-secret algorithm you don't want your competition to have, you don't need to worry about source theft. The cost involved with reversing any significant amount of source-code out of your application quickly exceeds the cost of re-writing it from scratch.
Even if you do have some algorithm you don't want them to have, there isn't much you can do to stop determined and skilled individuals from getting it anyway (if the application is executing on their machine).
Some common anti-reversing measures are:
However, packers can be unpacked, and obfuscation doesn't really hinder those who want to see what your application is doing. If the program is run on the user's machine, then it is vulnerable.
Eventually its code must be executed as machine code, and it is normally a matter of firing up a debugger, setting a few breakpoints, monitoring the instructions being executed during the relevant action, and spending some time poring over this data.
You mentioned that it took you several months to write ~20kLOC for your application. It would take almost an order of magnitude longer to reverse those equivalent 20kLOC from your application into workable source if you took the bare minimum precautions.
This is why it is only cost-effective to reverse small, industry specific algorithms from your application. Anything else and it isn't worth it.
Take the following fictionalized example: Let's say you just developed a brand new competing application for iTunes that had a ton of bells and whistles. Let's say it took several 100k LOC and 2 years to develop. One key feature you have is a new way of serving up music based on the user's music-listening taste.
Apple (being the pirates they are) gets wind of this and decides they really like your music suggest feature, so they decide to reverse it. They will then home in on only that algorithm, and the reverse engineers will eventually come up with a workable algorithm that serves up the equivalent suggestions given the same data. Then they implement said algorithm in their own application, call it "Genius" and make their next 10 trillion dollars.
That is how source theft goes down.
No one would sit there and reverse all 100k LOC to steal significant chunks of your compiled application. It would simply be too costly and too time-consuming. About 90% of the time they would be reversing boring, non-industry-secretive code that simply handled button presses or handled user input. Instead, they could hire developers of their own to rewrite most of it from scratch for less money and simply reverse the important algorithms that are difficult to engineer and that give you an edge (i.e., the music suggest feature).
This is not an easy task and might require tools other than gdb. I read a couple of interesting RE tutorials, and even though they are not specific to OS X they still provide interesting insight and examples on deciphering function parameters:
Reversing (Undocumented) Windows API Functions
Matt's Cracking Guide
Secrets of Reverse Engineering: Appendix C - Deciphering Program Data
Reverse Engineering and Function Calling by Address
I like OllyDbg. (with a good companion :)
That instruction simply pushes a 32-bit constant (0x804a254) onto the stack.
That instruction alone is not enough for us to tell how it is later used. Could you provide more disassembly of the code? In particular, I would like to see where this value is popped and how it is used afterwards.
Before starting any reverse engineering, I would recommend reading this book (Reversing: Secrets of Reverse Engineering) and then the x86 instruction set manual (Intel or AMD). I am assuming that you are reverse engineering for an x86 CPU.
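What the push does to memory can be modelled in a few lines of C. This is a toy model under stated assumptions: the 64-byte array, the ToyStack struct and the function names are invented for illustration, and esp here is an offset into the array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>

/* A toy model of the x86 stack: a byte array plus a stack pointer that
 * starts at the top and grows downward. */
struct ToyStack {
    uint8_t mem[64];
    uint32_t esp;   /* offset into mem, not a real address */
};

void toy_init(struct ToyStack *s)
{
    s->esp = sizeof s->mem;
}

/* Model of `push imm32`: decrement esp by 4, then store the value
 * in little-endian byte order at the new top of stack. */
void toy_push32(struct ToyStack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    for (int i = 0; i < 4; i++)
        s->mem[s->esp + i] = (uint8_t)(value >> (8 * i));
}

/* Model of `pop r32`: read the value back and increment esp by 4. */
uint32_t toy_pop32(struct ToyStack *s)
{
    assert(s->esp + 4 <= sizeof s->mem);
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)s->mem[s->esp + i] << (8 * i);
    s->esp += 4;
    return value;
}
```

Pushing 0x804a254 onto a fresh toy stack moves esp from 64 to 60 and stores the bytes 54 a2 04 08 there. Whether that constant is a pointer to a string, a function argument or something else depends entirely on the code that consumes it.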