Compilers: Principles, Techniques, and Tools (2nd Edition)

Category: Hardware & DIY
Author: Alfred V. Aho
All Stack Overflow 42
This Year Stack Overflow 5
This Month Stack Overflow 1


by anonymous   2019-07-21

I've head Compilers: Principles, Techniques, and Tools is solid.

by lordyod   2019-07-21

What sort of stuff are you interested in?

I haven't taken CS109 but I took a very similar course at my community college. The way C++ implements object-oriented programming is different (and much more detailed) than some other languages so if you're only familiar with, say, Java, and you want to get into video game programming, or something like that, it might be worth taking it. I don't know a whole lot about the programming assignments or lectures or anything but it seems like the sort of thing someone should know if they want to get into that field.

CS104A will take you through (roughly) the first half of the dragon book, which is considered by many to be the standard textbook on compilers. You will construct a compiler for a C-like language which takes source code as input and outputs code that looks like assembly and compiles with GCC. The programming projects are significant - many students do not complete all of them because they involve a substantial amount of work. The lecture topics build on material taught in CS130, and if you haven't taken that course (I hadn't) you may find yourself on shaky ground when it comes to the material dealing with finite automata (this is a big part of the course). The class is in some ways about what make programming languages work, how computers take the code that we write and turn it into code a machine can run, and how doing that differently may create big differences in the languages used. For myself I found the class very rewarding because I am quite interested in this sort of machine-level view of computer science and engineering, and I find comparisons between different languages at this very low-level perspective to be very interesting. I imagine that many students would be bored to tears at this however. I assume you are aware of Mackey's reputation as an instructor, and I won't try to sugar coat it, it's difficult to sit through. That said, he is extremely knowledgeable on the subject and I feel like ignoring the opportunity to learn this sort of thing from him while you are here is a mistake.

If you want to dive head-first into what makes computers, you know, compute, I'd suggest you take 104a.

by nemec   2019-07-21

The good news is that .NET already performs Tokenization for you, so what you put on the command line (program.exe arg1 arg2) becomes a string array in your Main method - some of the work is already done for you.

The standard book for compilers (parsing is integral to compilers, so you will find a lot of overlap in concepts with CLI parsing) is The Dragon Book but it can be difficult for people who haven't spent time learning academic CS theory.

This series of blog posts is a bit of a more gentle introduction to parsing. It's in Java but the code is very, very similar to C# (the biggest differences are foreach loops, implements instead of : to implement an interface, and naming conventions for interfaces (Java doesn't add an I) and methods (Java camelCases method names instead of PascalCase).

A lot of parsing involves creating a kind of "state machine" that allows you to loop through Tokens (items in your args array) and decide, "what do I do next?"

by ezweave   2019-07-21

Before you make a radical change, I would consider what you find challenging or, perhaps boring about other Engineering disciplines. I can't speak to your school's CS program (at some University's it's under the college of Engineering, other's math, and some are ABET accredited and others are not), but "there be dragons" (especially if you use the classic text on Compiler Design in your Compiler class). It isn't really any easier, but it might be easier for you.

I've been working in Software (including my internship) for 20 years. I have a BS in CS, an MS in CS, and whatever credibility getting 1/3 of the way through a PhD gets me in this discussion (very little).

I disagree with the assertion that a CS minor is as useful as a major... there's just a great deal of interesting, useful topics you will not approach and, yeah, they're probably the more "mathy" ones. Most undergrads find the following courses/subjects difficult and often drop out of a CS program (sometimes to an IS degree, which still involves programming but is more applicable to most software jobs... you just don't have as deep of a background or understanding regarding the "meat" of CS) due to their rigor: * Automata * Algorithms * Principles of Computer Programming (this is where most people learn FP via LISP/Scheme or maybe even Haskell... I have been out of school for a while) * Statistics * Machine Learning * Compiler Design

If those terms are alien to you, do some digging and learn a little about them, because most ABET programs will require them. A minor won't expose you to too much of that... anecdotally, I minored in Math, Physics, and Economics, but wouldn't consider myself an expert in any. E.g. I wouldn't think I could jump into a career in any of those solely (unless it was teaching, not denigrating teachers, just being honest).

However, I've also worked with some talented developers who dropped out of school and some young kids who, while lacking depth or breadth, aren't half bad in certain tech stacks who only went to one of those "Coding Boot Camps." If you're smart enough, and a self-starter (like someone who contributes to OS projects, works their way through "The Art of Computer Programming" and the like) you don't necessarily need a piece of paper and the debt that comes with it.

Now that I've said that, Computer Science may still be your bag.

Even if you never use the knowledge you acquire in many of the courses, you will be at least aware of what's actually going on when you, say, invoke a given function. While the "Code Camp" kids are often eager learners, there's a great deal they don't know and... they're not always quick with abstraction and composition of logic (some of the most creative aspects of programming). You'll also (sorry if it comes off as cheesy) expand your mind. Pushing yourself to understand things like Automata and Discrete Mathematics, to name a few topics, will just make you smarter. Even if you end up working as a "full stack dev" at a Ruby/React/whatever shop, it will be useful and most of the devs that have some formal education, can more easily pick up new languages because they better understand the underlying concepts. In a world where Node has made JS an actual, viable language, for many projects (and the boost by FP-lite and full FP libraries like lodash, ramda, monet and such) being able to also debug and rewrite a block of C++ or Scala is a good thing.

What's great about CS is that you can do so many things with it. You can get a job as a web dev, or work more in data science. You can work on natural language processing or video games. You're basically learning "how and why" to paint, using art as an analogy for CS and programming. That's one of the biggest differences between other Engineering disciplines. You're not limited by materials or thermodynamics, you're only limited by your willingness to work and learn and your own imagination.

To be fair, each of the disciplines that have been discussed have their own hurdles. None of them are "easier" than the other. ME is not greater than CS. CE is not less than. They're all just different things. They will all have points where they "become difficult" and you have to have the fortitude to push on. I know lots of folks who jumped around from major to major only to end up with a degree in Communications from the Arts College and no career path. The only real ding in CS is that a lot of the academic knowledge is not directly applicable to a job. They don't always teach you about version control systems (like Git) or Software Engineering processes like Agile, Less, Rational, et cetera. (My program did a bit of all of that, but experience has taught me that not all do.) I would strongly encourage you to get aquatinted with the Node ecosystem and find some open source projects to contribute to, if you ever have down time in your studies. The practical knowledge you will gain will help you get a job.

by anonymous   2019-07-21

Your question is a little vague, however it sounds like you need to develop some kind of parser to read in the cfg files and translate it into some form of intermediate language or object graph, optimize it, and then output it to c++. Sounds to me like a job for a home-grown compiler.

If you aren't familar with the different phases of a compiler I would highly recommend you check out the infamous dragon book

Then again, if this is for an important project with a deadline you probably don't have a lot of time to spend in the world of compiler theory. Instead you might want to check out antlr. It is really useful for creating a lexar and parser for you based on grammar rules that you define from the syntax of the cfg files. You can use the antlr parser to translate the cfg files into an AST or some other form of object graph. At that point you are going to be responsible for manipulating, optimizing and outputting the c++ syntax to a new file.

I haven't read it yet but this is supposed to be an excellent book for novice and experienced antlr users

plus there a plenty of antlr tutorials and examples online that I've used to help learn it. Hope that helps put you in the right direction.

by anonymous   2019-07-21

When you have to access local variables from other function calls, I think you had better redesign your code.

Theoretically, you can direct to modify the i of main() in fun() if you can completely understand how the compilers deal with the activation records of function calls on the run-time stack. You can read "Compilers: Principles, Techniques, and Tools" for details( )

enter image description here

The value of j depends on the run-time stack address between the int i; in the fun() and int i = 10; in main() . In this case, when fun() is called, their relative distance on stack is just 12. That is why the j is 12. So the *(p + j) = 20; actually changed the i of the main(). If you change your code by adding int a = 14; as following, you will find the value of j changed for the activation record on the run-time stack has been changed.

#include <stdio.h>

void fun()
  int i;
  int *p=&i;
  int j;
  /* Stack Frame size is j int pointers. */

  int i=10;
  int a=14;  /* add this for example to change the relative address i in the main and i in the fun*/
  printf("\n %d \n",i);
by anonymous   2019-07-21

There's no big difference between writing an ordinary C compiler and writing a shader compiler. The standard book on writing compilers is the so called "Dragon Book":

by anonymous   2019-01-13

Damn, whenever you get to studying compilers this will be crazy to look back to.

Anyway you are probably not in the level of building a C compiler from scratch.

So think of it this way, you need to have a set of instructions and yes you will have to go through the program string and transform the string into MIPS instructions.

The most naive way to do it and it might be what your teacher expects unless it is a compiler course, would be to parse the program text line by line as it is expected to be a simple program and as you said have a lot of conditions for each type of expression that you will evaluate.

A tip: Save for / while loops as a label as soon as you read them and from there you need to check where is the endpoint of it for the jump when necessary (completed for/while conditions etc)

Now if it is a compiler project. I'd strongly recommend you read into this book:

Compilers: Principles, Techniques, and Tools

Because to build a real compiler you need to understand all the stages of a compiler and how does it work together... You would also need to know some Language Theory.

by Michael Stum   2019-01-13

Big List of Resources:


  • ¶ Link to a PDF file
  • $ Link to a printed book
by anonymous   2019-01-13

The Abstract Syntax Tree of different languages are usually not compatible. This is the case because the AST represents the code that has been written in the respective language. That means you could traverse the tree and format it to code with identical syntax again (i.e. the whitespaces will differ, the rest is the same). However they cannot be compatible, because the languages have different constructs. For example Groovy has closures which is not the case for Java. You can usually find a mapping to different concepts that will be equivalent, but that's not the point of an AST.

The reason AST transformations are a part of Groovy whereas they are not part of Java is the same closures are part of Groovy but not of Java: different design decisions. Java was designed to be simple. Easy to get into and easy to read albeit often verbose. Groovy had a different focus. The syntax is more concise and things like domain specific languages are desired.

If you are more interested in the internals of compilers I recommend the "Dragon Book". It's as far as I know the standard you read in academics (I read it when I was studying).

by anonymous   2018-03-26
[Compilers: Principles, Techniques, and Tools]( by Aho, et al.
by anonymous   2018-03-19

Yes, you can certainly use lex and yacc to build a compiler/translator for a programming or scripting language. There are GNU variants of these tools, called flex and bison. John Levine's lex & yacc was for many years the gold standard for books about these tools. That book may be out of print, but I expect that the successor book, Flex & Bison, is just as good. To dig deeper into building a compiler, start with Aho et al., Compilers: Principles, Techniques, and Tools, 2/e. (Again, my recommendation is based on the first edition of this book.)

by Bluegorilla101   2018-02-16

Should have gotten her this dragon book so she can get a headstart on writing compilers.

by correctsbadalts   2018-02-16

Was expecting this dragon book

by OhYourFuckingGod   2018-02-16

She's trying to hide her disappointment. I'm pretty sure this is the one she wanted .

by dvt   2017-10-05
I strongly suggest you build something esoteric and fun first. This should be a "bare minimum" VM or interpreter. Here's one of mine that I wrote like a 8 years ago[1]:
by anonymous   2017-09-05
If you want to show it as a text, then VZ's answer is the way to go; however, if you want to show the `result` of an arithmetic operation, then you need to write a small parser, a good book would be:
by anonymous   2017-08-20

How many contemporary algorithms are there to build a parser?

To start with one can look at the list of the common parser generators.
See: Comparison of parser generators and look under the heading Parsing algorithm.

Backtracking Bottom-up  
Backtracking LALR(1)  
Backtracking LALR(k)  
LL(1), Backtracking, Shunting yard
LL(k) + syntactic and semantic predicates  
LL, Backtracking  
Recursive descent  
Recursive descent, Backtracking  
PEG parser interpreter, Packrat  
Packrat (modified)  
Packrat + Cut + Left Recursion  
Packrat (modified), mutating interpreter  
2-phase scannerless top-down backtracking + runtime support  
Packrat (modified to support left-recursion and resolve grammar ambiguity)  
Parsing Machine  
Recursive descent + Pratt  
Packrat (modified, partial memoization)  
Hybrid recursive descent / operator precedence  
Scannerless GLR  
runtime-extensible GLR  
Scannerless, two phase  
Earley/combinators, infinitary CFGs  
Scannerless GLR  
delta chain  

Besides parser generators, there are also other algorithms/means to parse. In particular Prolog has DCG and most people who have written their first parser from scratch without formal training typically start with recursive descent. Also Chart parser and Left corner parser.

In writing parsers the first question that I always ask myself is how can I make a grammar for the language at the highest type in the Chomsky hierarchy. Here lowest is Type-0 and highest is Type-3.

Almost 90% of the time it is a Type-2 grammar (context-free grammars), then for the easer task it is a Type-3 grammar (regular grammars). I have experimented with Type-1 grammars (context-sensitive grammars) and even Type-0 grammars (unrestricted grammars).

And what's the advantage/limitation of ANTLR's Adaptive LL(*) algorithm?

See the paper written by Terrence Parr the creator of Adaptive LL(*): Adaptive LL(*) Parsing: The Power of Dynamic Analysis

In practical terms Adaptive LL(*) lets you get from a grammar to a working parser faster because you do not have to understand as much parsing theory because Adaptive LL(*) is, shall I say, nimble enough to side step the mines you unknowingly place in the grammar. The price for this is that some of the mines you unknowingly place in the grammar can lead to inefficiencies in the runtime of the parser.

For most practical programming language purposes Adaptive LL(*) is enough. IIRC Adaptive LL(*) can NOT do Type-0 grammars (unrestricted grammars) which Prolog DCG can, but as I said, most people and most common programming task only need either type 2 or type 3.

Also most parser generators are for type 2, but that does not mean they can't do type 1 or possibly type 0. I cannot be more specific as I do not have practical experience with all of them.

Anytime you use a parsing tool or library there is a learning curve to learning how to use it and what it can and can not do.

If you are new to lexing/parsing and really want to understand it more then take a course and/or read Compilers: Principles, Techniques, and Tools (2nd Edition)