Every now and then I need to programmatically generate and execute a Makefile, for example to run a graph-based procedural generation workflow, or to compile game assets that a tool discovers dynamically. In general, all of these problems can be represented as directed acyclic graphs of files and tasks.

At some point it gets quite tiring to write the same code over and over, so I went ahead and made it into a little library. It’s pretty straightforward. First, you create a graph.

import goeiedag

graph = goeiedag.CommandGraph()

Afterwards, you add tasks to it…

from pathlib import Path

# Get username
graph.add(["whoami", ">", "username.txt"],
          inputs=[],
          outputs=["username.txt"])

…and execute it.

goeiedag.build_all(graph, Path())

Note that if you run this code twice, it will not re-execute the whoami command, since the output already exists and is considered up-to-date.
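To give a rough idea of what "up-to-date" means here, the sketch below compares file modification times. `is_up_to_date` is a hypothetical helper written for illustration, not part of goeiedag, and the real check done by the build executor is more involved (Ninja also notices when a command line has changed, for instance).

```python
from pathlib import Path


def is_up_to_date(inputs, outputs):
    """Simplified staleness check (hypothetical helper, not goeiedag's API).

    An output set counts as up to date when every output exists and none of
    them is older than the newest input.
    """
    in_paths = [Path(p) for p in inputs]
    out_paths = [Path(p) for p in outputs]

    # A missing output (or a missing input) always forces a rebuild.
    if not all(p.exists() for p in out_paths + in_paths):
        return False

    # A command without inputs only needs its outputs to exist.
    if not in_paths:
        return True

    newest_input = max(p.stat().st_mtime_ns for p in in_paths)
    oldest_output = min(p.stat().st_mtime_ns for p in out_paths)
    return newest_input <= oldest_output
```

For the whoami task above, which declares no inputs, this check passes as soon as username.txt exists, which matches the behaviour described.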

For commands that have inputs (or dependencies), the build tool needs to know about these, so that it is able to determine when the output has become obsolete and needs to be rebuilt. Of course, one command can depend on outputs from other commands, as long as there are no circular dependencies.
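The "no circular dependencies" rule is what makes it possible to put the commands in a valid order. As a minimal sketch of that idea (using Python's standard `graphlib` module, not whatever goeiedag or Ninja do internally), the hypothetical `execution_order` function below sorts files so that every input comes before the outputs that need it, and raises an error if the dependencies form a cycle.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return files in an order where each file follows all of its inputs.

    `dependencies` maps an output file to the list of files it is built from.
    Hypothetical helper for illustration only.
    """
    sorter = TopologicalSorter(dependencies)
    # static_order() raises CycleError if the graph is not acyclic.
    return list(sorter.static_order())


if __name__ == "__main__":
    deps = {
        "os-name.txt": ["/etc/os-release"],
        "username.txt": [],
        "result.txt": ["os-name.txt", "username.txt"],
    }
    print(execution_order(deps))

    try:
        execution_order({"a.txt": ["b.txt"], "b.txt": ["a.txt"]})
    except CycleError as err:
        print("circular dependency:", err.args[1])
```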

# Extract OS name from /etc/os-release
graph.add(["grep", "^NAME=", "/etc/os-release", ">", "os-name.txt"],
          inputs=["/etc/os-release"],
          outputs=["os-name.txt"])

To make usage more convenient and avoid repetition, the library provides some special symbols to let you refer to the declared inputs and outputs when building up the command.

from goeiedag import ALL_INPUTS, INPUT, OUTPUT

# Extract OS name from /etc/os-release
graph.add(["grep", "^NAME=", INPUT, ">", OUTPUT],
          inputs=["/etc/os-release"],
          outputs=["os-name.txt"])
# Get username
graph.add(["whoami", ">", OUTPUT],
          inputs=[],
          outputs=["username.txt"])
# Glue together to produce output
graph.add(["cat", ALL_INPUTS, ">", OUTPUT.result],
          inputs=["os-name.txt", "username.txt"],
          outputs=dict(result="result.txt"))

When goeiedag.build_all is called, the library generates a Ninja build file and executes it. I didn’t feel a need to re-implement the logic of orchestrating and executing the DAG efficiently, so the aim was to take advantage of an existing build executor. Compared to Make, Ninja has more pleasant tooling, performs better on complex builds, and is easier to generate code for (for one, it cleanly supports tasks that generate multiple outputs).
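To show why Ninja is an easy target for code generation, here is a small sketch that turns a list of tasks into Ninja syntax. It is an illustration only: `to_ninja` and `escape_path` are hypothetical helpers, and the file goeiedag actually writes may look quite different. Note how a task with several outputs simply lists them all before the colon.

```python
def escape_path(path):
    """Escape characters that are special in Ninja paths ($, space, colon)."""
    return path.replace("$", "$$").replace(" ", "$ ").replace(":", "$:")


def to_ninja(tasks):
    """Render (command, inputs, outputs) tuples as Ninja build statements.

    Hypothetical helper: one rule per task, named task0, task1, and so on.
    """
    lines = []
    for i, (command, inputs, outputs) in enumerate(tasks):
        rule = f"task{i}"
        outs = " ".join(escape_path(p) for p in outputs)
        lines.append(f"rule {rule}")
        # "$" starts a variable reference in Ninja, so literal ones are doubled.
        lines.append("  command = " + command.replace("$", "$$"))
        lines.append(" ".join([f"build {outs}:", rule] + [escape_path(p) for p in inputs]))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ninja([
        ("whoami > username.txt", [], ["username.txt"]),
        ("cat os-name.txt username.txt > result.txt",
         ["os-name.txt", "username.txt"], ["result.txt"]),
    ]))
```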

There are many similar libraries with somewhat different paradigms. For example, Dask and TaskGraph use plain Python functions, rather than shell commands, as the unit of execution. This is advantageous if your flow is Python-heavy, but it locks you out of high-quality build executors like Ninja. Snakemake does work with commands, but as far as I understand it doesn’t help you build the task graph programmatically: the input has to be provided in Snakemake’s text format.

The package is available on PyPI and GitHub. Let me know what you think!