The AWK Programming Language

Category: Computer Science
Author: Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger


by charlesdaniels   2020-06-12
* AWK - I read "The Awk Programming Language"[0] cover to cover. It's a very well written text, and it's not all that long. The examples are very impressive. In terms of bang-for-the-buck of learning tools, AWK has definitely given me the most mileage. I probably use it at least a dozen times a day just for little things, but you can also wield it to write some very powerful scripts in a very short amount of time.

* Make - I use Make for almost every project I write, even if the rules are just .PHONY shortcuts. Make solves a huge set of problems relating to the order in which things need to be run/built very well. There are arguably better tools, but Make is widely deployed, widely used, widely understood, and solves the problem well enough for a lot of small to medium projects (and some large ones too!). I got asked about it enough that I wrote an introductory guide for it[1] (disclaimer: self promotion of my own site, but I don't have any ads or make any money). If you feel like Make has a lot of legacy crust built up over the years, you should read[2].

* Graphviz[3] - a huge number of the ad-hoc data structures that you will build for your projects can be hard to visualize; Graphviz makes it easier. One tactic I've found useful is to loop over nested structs using their memory address as the identifier in Graphviz, struct fields as text annotations, and nested struct pointers as outgoing links. This might sound fancy, but you can probably write an export_to_graphviz() function for your project in under 50 lines of C. Because the syntax is simple, it's very easy to generate Graphviz from pretty much any language out there.

* XPath - if you've ever wanted to do even simple web scraping or XML parsing, do yourself the favor of learning XPath. It's a very powerful way of querying XML-like documents. I learned it by writing bots in Selenium for an internship, but nowadays I mostly do very simple web scraping for personal projects. To that end, I wrote a little tool[4] to grab the contents of a page, run a query, and print the results out on the console.

* Not a tool per se, but pick some kind of "personal knowledge management" type of solution and use it religiously. I like Joplin[5], but there are a million out there (Evernote, OneNote, ZimWiki, TiddlyWiki, VimWiki, Emacs Org-Mode, and many, many more). Being able to refer to earlier notes is invaluable for long-running projects.

* Also not very specific - learn the scripting language for your platform. In UNIX-land that's sh (or Bash), and in Windows that's PowerShell. Bash and PowerShell both have benefits and drawbacks, and you probably shouldn't write "real programs" in either. But knowing how to script whatever platform you're using buys you a lot.

* One more, also non-specific - learn an interpreted language. Nowadays people like Python, but Perl, TCL, Lua, JS, and others could all be valid choices. These are great for prototyping ideas that you will later port to the language you really use (if it isn't on that list already), or for writing little tools or utilities for yourself to use. Which one you choose will depend on what library ecosystem is most relevant to your work.
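
As a rough illustration of the Graphviz tactic above, here is a minimal sketch. It is written in Python rather than C, so `id()` stands in for the struct's memory address (in CPython it is the object's address) and a list of child references stands in for nested struct pointers. The `Node` class, its fields, and the body of `export_to_graphviz()` are all hypothetical, made up for this example; only the function name comes from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical example structure; stands in for a project's own structs."""
    name: str
    value: int
    children: List["Node"] = field(default_factory=list)


def export_to_graphviz(root: Node) -> str:
    """Return DOT text describing every Node reachable from root."""
    lines = ["digraph G {", "  node [shape=box];"]
    seen = set()      # identities already emitted, so shared or cyclic links are safe
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        # Identity-based identifier (the memory address in CPython).
        ident = f"n{id(node):x}"
        # Fields become the text annotation; assumes they contain no double quotes.
        lines.append(f'  {ident} [label="name={node.name}\\nvalue={node.value}"];')
        # Each reference to another Node becomes an outgoing link.
        for child in node.children:
            lines.append(f"  {ident} -> n{id(child):x};")
            stack.append(child)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    leaf = Node("leaf", 3)
    root = Node("root", 1, [Node("mid", 2, [leaf]), leaf])  # leaf is shared
    print(export_to_graphviz(root))
```

Saving the printed text to a file (say `out.dot`) and running `dot -Tpng out.dot -o out.png` should render it. Because nodes are keyed by identity, the shared `leaf` shows up once with two incoming edges, which is exactly the kind of thing that is hard to see in a debugger.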

0 -

1 -

2 -

3 -

4 -

5 -

by anonymous   2019-07-21

It is kinda 'whooaoaa man, how can that work???' - but I think you are describing the phenomenon known as 'self-hosting':

Languages (or toolchains/platforms) don't start out as self-hosting - they start off life having been built on an existing platform: at a certain point they become functional enough that you can write programs in them which understand the very syntax those programs are themselves written in.

There is a great example in the classic AWK book, which introduces an AWK program which can parse other AWK programs (a cut-down version of the language, as it happens): see link below.

There is another example in the book "Beautiful Code" which has a JavaScript program which can parse JavaScript.

I think the thing to remember is this - if you have (say) a JVM written in Java, which can therefore run Java bytecode, then the JVM which runs that Java-written JVM itself has to be hosted natively (perhaps this JVM was written in C and then compiled to machine code). This is eventually true of any self-hosting program - somewhere along the line.

So the mystery is removed - because at some point, there is a native machine-code program running below everything.
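
A small sketch of the same idea, in Python rather than AWK or JavaScript: a Python program that parses Python source using the standard `ast` module. The sample source and the `function_names` helper are made up for illustration. In the usual CPython implementation the parser and interpreter underneath are themselves written in C, which is the "native program below everything" from the point above.

```python
import ast

# A tiny Python program held as text; any Python source would do.
SOURCE = '''
def double(x):
    return 2 * x
'''


def function_names(source: str) -> list:
    """Parse Python source with Python's own parser and list the functions it defines."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]


if __name__ == "__main__":
    print(function_names(SOURCE))
```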

It's kind of the equivalent of being able to describe the English (etc.) language using the English language itself... maybe...

by anon   2019-07-21

In the original (and still the best) book, The AWK Programming Language, the following are implemented (among many other things):

  • a simple assembler
  • a recursive descent compiler
  • a text indexing program

Try doing that with sed.

by anon   2019-07-21

The AWK Programming Language, by Aho, Kernighan and Weinberger, is the best. The initials of the authors' names should tell you why...

by fooblitzky   2019-04-19
_The AWK Programming Language_ is one of the best programming books, on any language, in my opinion. Worth reading even if you don't use awk. In less than 200 pages it goes from an introduction to the language through to implementing a relational database, recursive-descent parsing, and graph-based algorithms.

For gawk, the manual covers the gaps between the language introduced in the book and the latest implementation.