Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)

Category: Programming
Author: Terence Parr


by throwaway_pdp09   2020-11-28
Well, this is one of my areas, so here goes. DSLs are a concept, not an implementation. As implemented, they can vary from chained procedure calls to actual sublanguages with lexers and parsers (I tend to consider the latter to be 'proper' DSLs, but that's just my view).
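
To make the "chained procedure calls" end of that range concrete, here is a minimal sketch of an internal DSL in Go. The `Query` type and its `From`, `Where`, and `Limit` methods are hypothetical names invented for this illustration; they are not from the post or from any real library.

```go
package main

import (
	"strconv"
	"strings"
)

// Query is a hypothetical builder. Each method returns the receiver so
// that calls can be chained into something that reads like a small language.
type Query struct {
	table string
	conds []string
	limit int
}

// From starts a chain for the given table.
func From(table string) *Query { return &Query{table: table} }

// Where adds one condition; multiple conditions are joined with AND.
func (q *Query) Where(cond string) *Query {
	q.conds = append(q.conds, cond)
	return q
}

// Limit caps the number of rows; zero means no limit.
func (q *Query) Limit(n int) *Query {
	q.limit = n
	return q
}

// String renders the chained calls as SQL-like text.
func (q *Query) String() string {
	var b strings.Builder
	b.WriteString("SELECT * FROM " + q.table)
	if len(q.conds) > 0 {
		b.WriteString(" WHERE " + strings.Join(q.conds, " AND "))
	}
	if q.limit > 0 {
		b.WriteString(" LIMIT " + strconv.Itoa(q.limit))
	}
	return b.String()
}
```

A call such as `From("users").Where("age > 30").Limit(5).String()` builds the query text without any lexer or parser, which is what separates this style from a sublanguage with its own syntax.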

To have a 'proper' DSL I reckon you need two things: an understanding that a thing can and should be broken out into its own sublanguage, and the ability to do so. The first takes a certain kind of nous, or common sense. The second requires knowing how to construct a parser properly and some knowledge of language design.

Knowing how to write a parser is not particularly complex, but the industry is driven more by requirements for knowing 'big data' frameworks than for stuff that is often more useful, so that's what you get, and that includes people who try to parse XML with regular expressions (check out this classic answer <

They're all worth investing the time in.

by anonymous   2019-07-21

"From the ground up" is quite a relative term, especially if you consider Python as the implementation language. I think what you are looking for is the implementation of a domain-specific language (DSL). Good starting points might be this book or this one. DSLs are a wide topic, so if you provide more details, we might be able to give better tips.

by anonymous   2019-07-21

This is a great book to help you get started.

The stages of building a language are:

  1. Lexing. Lexing means reading the input as tokens of certain categories. A token can be a series of digits such as 12376 or a text string such as 'Hello'. The lexer looks at the first character (and may also look ahead to the second character) to determine what kind of token it is. When it sees a digit, it proceeds to read the series of digits (by calling a subroutine); when it sees a quote, it proceeds to read a string. The result of the lexer is a token, which consists of a type (a number or string in this example) and the text of the token. This is normally stored in a struct as Kind int and Text string, with constants declared to represent the kinds.

  2. The next building block is the parser. The parser sees the series of tokens, so it might see an Identifier and then, looking ahead, see an =. It will then branch off into parsing an assignment. The parser builds a tree. In the case of an assignment, it will build a "node" of type "assign", storing the identifier in the first child and the expression in the second child. All tree nodes are "operations", meaning that they do something. You will not have just a string or integer as a node; you will have "Add" or "Append" etc. as nodes (unless it is an expression, but expressions are contained by operations).

  3. The last part is execution. This is done by walking the tree and executing the nodes.
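
The first two stages above can be sketched in Go roughly as follows. This is only an illustration under stated assumptions: the `Token` struct follows the `Kind int` / `Text string` shape described in step 1, but the `Node` fields, the operation names ("assign", "add", "num", "var"), and the tiny grammar (a name, `=`, then operands joined by `+`) are invented for the example and are not taken from the book.

```go
package main

import "fmt"

// Token kinds, declared as constants as the post describes.
const (
	Number = iota
	Ident
	Symbol // any other single character, such as = or +
)

// Token is the lexer's output: a kind plus the text that was read.
type Token struct {
	Kind int
	Text string
}

// Node is a tree node: an operation ("assign", "add") or a leaf operand.
type Node struct {
	Op       string
	Text     string
	Children []*Node
}

// leafOp maps an operand token's kind to the Op of its leaf node.
var leafOp = map[int]string{Number: "num", Ident: "var"}

func isDigit(c byte) bool  { return c >= '0' && c <= '9' }
func isLetter(c byte) bool { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' }

// Lex decides each token's kind from its first character, then reads
// the rest of the run of digits or letters.
func Lex(src string) []Token {
	var toks []Token
	for i := 0; i < len(src); {
		c := src[i]
		start := i
		switch {
		case c == ' ' || c == '\t':
			i++ // skip whitespace
		case isDigit(c):
			for i < len(src) && isDigit(src[i]) {
				i++
			}
			toks = append(toks, Token{Number, src[start:i]})
		case isLetter(c):
			for i < len(src) && isLetter(src[i]) {
				i++
			}
			toks = append(toks, Token{Ident, src[start:i]})
		default:
			i++
			toks = append(toks, Token{Symbol, src[start:i]})
		}
	}
	return toks
}

// Parse accepts only: name "=" operand { "+" operand }.
func Parse(toks []Token) (*Node, error) {
	if len(toks) < 3 || toks[0].Kind != Ident || toks[1].Text != "=" {
		return nil, fmt.Errorf("expected: name = expression")
	}
	pos := 2
	operand := func() (*Node, error) {
		if pos >= len(toks) || toks[pos].Kind == Symbol {
			return nil, fmt.Errorf("expected a number or name at token %d", pos)
		}
		pos++
		return &Node{Op: leafOp[toks[pos-1].Kind], Text: toks[pos-1].Text}, nil
	}
	expr, err := operand()
	if err != nil {
		return nil, err
	}
	for pos < len(toks) {
		if toks[pos].Text != "+" {
			return nil, fmt.Errorf("unexpected token %q", toks[pos].Text)
		}
		pos++
		right, err := operand()
		if err != nil {
			return nil, err
		}
		expr = &Node{Op: "add", Children: []*Node{expr, right}}
	}
	target := &Node{Op: "var", Text: toks[0].Text}
	return &Node{Op: "assign", Children: []*Node{target, expr}}, nil
}
```

Calling `Parse(Lex("total = 12 + x"))` yields an "assign" node whose first child is the identifier `total` and whose second child is an "add" node, matching the tree shape described in step 2. A real lexer would also handle quoted strings and report positions; those are left out to keep the sketch short.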

There is a lot of other machinery involved, such as Memory, Scope, and the lookahead machinery. This is explained in the link above.
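
As a rough sketch of the execution stage together with the memory and scope machinery mentioned above, the following Go code walks a tree and executes its nodes. Everything here is an assumption made for illustration: the `Node` fields, the operation names, and the `Scope` type with its parent-chain lookup are hypothetical and may differ from how the book presents them.

```go
package main

import (
	"fmt"
	"strconv"
)

// Node is an assumed tree shape: an operation name, optional text,
// and child nodes.
type Node struct {
	Op       string
	Text     string
	Children []*Node
}

// Scope is one level of variable storage; lookups fall back to Parent.
type Scope struct {
	Vars   map[string]int
	Parent *Scope
}

// NewScope creates an empty scope nested inside parent (which may be nil).
func NewScope(parent *Scope) *Scope {
	return &Scope{Vars: map[string]int{}, Parent: parent}
}

// Lookup searches this scope, then each enclosing scope in turn.
func (s *Scope) Lookup(name string) (int, bool) {
	for sc := s; sc != nil; sc = sc.Parent {
		if v, ok := sc.Vars[name]; ok {
			return v, true
		}
	}
	return 0, false
}

// Exec walks the tree and executes each node, returning its value.
func Exec(n *Node, s *Scope) (int, error) {
	switch n.Op {
	case "num":
		return strconv.Atoi(n.Text)
	case "var":
		v, ok := s.Lookup(n.Text)
		if !ok {
			return 0, fmt.Errorf("undefined variable %q", n.Text)
		}
		return v, nil
	case "add":
		l, err := Exec(n.Children[0], s)
		if err != nil {
			return 0, err
		}
		r, err := Exec(n.Children[1], s)
		if err != nil {
			return 0, err
		}
		return l + r, nil
	case "assign":
		v, err := Exec(n.Children[1], s)
		if err != nil {
			return 0, err
		}
		s.Vars[n.Children[0].Text] = v // assignment writes to the innermost scope
		return v, nil
	}
	return 0, fmt.Errorf("unknown operation %q", n.Op)
}
```

With `x` set to 30 in an outer scope, executing the tree for `total = 12 + x` in a nested scope evaluates to 42 and stores `total` in the nested scope only. Writing assignments to the innermost scope is one possible design choice; a language could just as well update the nearest scope that already defines the name.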