Flattening ASTs: A Simple and Effective Technique for Compilers

2023/06/08

This article was written by an AI 🤖. The original article can be found here. If you want to learn more about how this works, check out our repo.

Modern language implementations rely heavily on arenas, also known as regions: allocators that hand out objects from one large block of memory and free them all at once. One flavor of arena that is surprisingly effective for compilers and compiler-like tools is data structure flattening. Here the arena holds values of only one type, so it is essentially a plain array, and code uses array indices wherever it would otherwise need pointers.
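To make the idea concrete, here is a minimal sketch in Rust of a single-type arena for a tiny expression language. The names `ExprRef`, `ExprPool`, `BinOp`, and `build_example` are hypothetical and chosen for illustration; they are not taken from the original article or any particular library.

```rust
/// Index of an expression inside the pool; it stands in for a pointer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExprRef(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinOp {
    Add,
    Mul,
}

/// Children are `ExprRef` indices rather than `Box<Expr>` pointers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Expr {
    Literal(i64),
    Binary(BinOp, ExprRef, ExprRef),
}

/// An arena that holds only `Expr` values, so it is just a growable array.
#[derive(Debug, Default)]
pub struct ExprPool(Vec<Expr>);

impl ExprPool {
    /// Append an expression and return its index as a reference.
    pub fn add(&mut self, expr: Expr) -> ExprRef {
        let idx = self.0.len();
        assert!(idx < u32::MAX as usize, "expression pool is full");
        self.0.push(expr);
        ExprRef(idx as u32)
    }

    /// Look up an expression by its reference.
    pub fn get(&self, r: ExprRef) -> &Expr {
        &self.0[r.0 as usize]
    }
}

/// Build the tree for `(1 + 2) * 3` and return the pool with its root.
pub fn build_example() -> (ExprPool, ExprRef) {
    let mut pool = ExprPool::default();
    let one = pool.add(Expr::Literal(1));
    let two = pool.add(Expr::Literal(2));
    let sum = pool.add(Expr::Binary(BinOp::Add, one, two));
    let three = pool.add(Expr::Literal(3));
    let root = pool.add(Expr::Binary(BinOp::Mul, sum, three));
    (pool, root)
}
```

Because children are always added before their parents in this sketch, every node ends up contiguous in one `Vec`, and the whole tree is freed in one step when the pool is dropped.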

In the original post, Adrian Sampson introduces the idea of flattening and its many virtues. The terms "arena" and "region" mean different things to different people, but flattening itself is a simple technique that can be applied to any pointer-laden data structure, including abstract syntax trees (ASTs).

To illustrate flattening, Sampson builds a basic interpreter twice: first the normal way and then the flat way. The code for both versions can be found in this repository, where developers can compare the two branches. The key thing to notice is that the changes are small, yet the flat version runs a microbenchmark 2.4× faster.
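The following Rust sketch puts a pointer-based interpreter and a flat one side by side to show how small the difference can be. It is an illustration written under assumptions, not the code from the repository: all type and function names (`BoxExpr`, `FlatExpr`, `ExprPool`, `interp_boxed`, `interp_flat`, `compare`) are hypothetical, and it makes no attempt to reproduce the reported speedup.

```rust
#[derive(Clone, Copy)]
pub enum BinOp {
    Add,
    Sub,
    Mul,
}

fn apply(op: BinOp, l: i64, r: i64) -> i64 {
    match op {
        BinOp::Add => l.wrapping_add(r),
        BinOp::Sub => l.wrapping_sub(r),
        BinOp::Mul => l.wrapping_mul(r),
    }
}

// Normal way: every child is a separate heap allocation.
pub enum BoxExpr {
    Literal(i64),
    Binary(BinOp, Box<BoxExpr>, Box<BoxExpr>),
}

pub fn interp_boxed(expr: &BoxExpr) -> i64 {
    match expr {
        BoxExpr::Literal(n) => *n,
        BoxExpr::Binary(op, l, r) => apply(*op, interp_boxed(l), interp_boxed(r)),
    }
}

// Flat way: children are indices into one shared vector.
#[derive(Clone, Copy)]
pub struct ExprRef(u32);

pub enum FlatExpr {
    Literal(i64),
    Binary(BinOp, ExprRef, ExprRef),
}

#[derive(Default)]
pub struct ExprPool(Vec<FlatExpr>);

impl ExprPool {
    pub fn add(&mut self, expr: FlatExpr) -> ExprRef {
        let idx = self.0.len();
        assert!(idx < u32::MAX as usize, "expression pool is full");
        self.0.push(expr);
        ExprRef(idx as u32)
    }

    pub fn get(&self, r: ExprRef) -> &FlatExpr {
        &self.0[r.0 as usize]
    }
}

// Same shape as `interp_boxed`; the pool is threaded through as context.
pub fn interp_flat(pool: &ExprPool, expr: ExprRef) -> i64 {
    match pool.get(expr) {
        FlatExpr::Literal(n) => *n,
        FlatExpr::Binary(op, l, r) => {
            apply(*op, interp_flat(pool, *l), interp_flat(pool, *r))
        }
    }
}

/// Evaluate `(7 - 2) * 3` with both interpreters.
pub fn compare() -> (i64, i64) {
    let boxed = BoxExpr::Binary(
        BinOp::Mul,
        Box::new(BoxExpr::Binary(
            BinOp::Sub,
            Box::new(BoxExpr::Literal(7)),
            Box::new(BoxExpr::Literal(2)),
        )),
        Box::new(BoxExpr::Literal(3)),
    );

    let mut pool = ExprPool::default();
    let seven = pool.add(FlatExpr::Literal(7));
    let two = pool.add(FlatExpr::Literal(2));
    let diff = pool.add(FlatExpr::Binary(BinOp::Sub, seven, two));
    let three = pool.add(FlatExpr::Literal(3));
    let root = pool.add(FlatExpr::Binary(BinOp::Mul, diff, three));

    (interp_boxed(&boxed), interp_flat(&pool, root))
}
```

The two interpreters differ only in how a child is reached: the boxed version dereferences a pointer, while the flat version looks up an index in the pool.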

Besides performance, flattening also brings ergonomic advantages. In the normal representation of an AST, developers navigate the tree through pointers. With flattening, they use array indices instead: small plain values that can be copied and stored freely, which simplifies the code and makes it easier to reason about.
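As one possible illustration of the ergonomic side, the Rust sketch below shows an index handle that is a 4-byte `Copy` value, so the same subexpression can be referenced twice without cloning the subtree or adding lifetime annotations. The names (`ExprRef`, `ExprPool`, `shared_square`, `handle_size`) are hypothetical, and sharing a node this way is an assumption of the sketch rather than something the original article is known to do.

```rust
use std::mem::size_of;

/// A handle is just a 32-bit index, so it is cheap to copy and store.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExprRef(u32);

#[derive(Debug, Clone, Copy)]
pub enum Expr {
    Literal(i64),
    Add(ExprRef, ExprRef),
    Mul(ExprRef, ExprRef),
}

#[derive(Debug, Default)]
pub struct ExprPool(Vec<Expr>);

impl ExprPool {
    pub fn add(&mut self, expr: Expr) -> ExprRef {
        assert!(self.0.len() < u32::MAX as usize, "expression pool is full");
        self.0.push(expr);
        ExprRef((self.0.len() - 1) as u32)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn eval(&self, expr: ExprRef) -> i64 {
        match self.0[expr.0 as usize] {
            Expr::Literal(n) => n,
            Expr::Add(l, r) => self.eval(l).wrapping_add(self.eval(r)),
            Expr::Mul(l, r) => self.eval(l).wrapping_mul(self.eval(r)),
        }
    }
}

/// Build `(1 + 2) * (1 + 2)`, storing the shared `1 + 2` node only once.
pub fn shared_square() -> (ExprPool, ExprRef) {
    let mut pool = ExprPool::default();
    let one = pool.add(Expr::Literal(1));
    let two = pool.add(Expr::Literal(2));
    let sum = pool.add(Expr::Add(one, two));
    // `sum` is `Copy`, so it can be used for both operands.
    let root = pool.add(Expr::Mul(sum, sum));
    (pool, root)
}

/// Size of one handle in bytes.
pub fn handle_size() -> usize {
    size_of::<ExprRef>()
}
```

With `Box`-based children, the same expression would need the `1 + 2` subtree built twice, or a reference-counted pointer to share it.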

Flattening is not often taught in compiler courses, or in any CS curriculum for that matter. Its simplicity and effectiveness nonetheless make it a valuable tool for developers who want faster tree traversals and simpler memory management. Because it can deliver both better performance and more readable code, it is worth considering in any project that works with tree-shaped data.