Active Code

“Active code” is the term used by the Babel system, part of Emacs’s Org-mode. It refers to authoring systems which can execute code embedded in the documents they’re rendering. This page documents the active code system I use to write articles, most notably the HTML on my Web site.

Why not Babel?

Babel, Org-mode and Emacs are all wonderful things. However, there are a few reasons why we may want to avoid them, which can be summarised by saying “they’re not UNIX”.

The Alternative: Pandoc

Pandoc is a great document conversion program by John MacFarlane. It can convert between various markup languages, including HTML, LaTeX and Markdown. We can also mix and match the formats, for example embedding a mixture of HTML and LaTeX in a Markdown document and rendering it to a PDF.

In particular, most of the source of my Web site is written in Markdown and converted to HTML using Pandoc. I use Nix to orchestrate the process.

Embedded Code

Most of the following examples are written in Pandoc’s Markdown format, but they can also be used with other formats supported by Pandoc (e.g. HTML).

Pandoc supports code blocks, which can be written in three different ways in Markdown:

`echo "Inline code"`
    echo "Indented code"
```
echo "Fenced code"
```

As you can see, by default these get rendered in monospaced fonts. The “inline” form, as the name suggests, gets rendered as part of any surrounding text like this. The other two forms make “blocks”, which get rendered like separate paragraphs.

Code blocks can have “attributes”, “classes” and an “id”. In Markdown, these look like:

```{#SomeID .SomeClass .AnotherClass Attribute1="Value1" Attribute2="Value2"}
Some content
```

This lets us manipulate blocks: for example, if we’re rendering to HTML, we might use these IDs, classes and attributes from some associated JavaScript. Pandoc also uses classes to apply syntax-highlighting, based on language descriptions from Kate.

For those who don’t want to write Markdown, here’s the equivalent HTML input:

<code>echo "inline code"</code>

<pre>
  echo "Code block"
</pre>

<pre id="SomeID" class="SomeClass AnotherClass" Attribute1="Value1"
     Attribute2="Value2">
  Some content
</pre>

Whether Markdown, HTML or anything else, these are the standard, off-the-shelf ways to embed code snippets in a document.

However, such code is not active.

PanPipe

PanPipe is a Pandoc filter which walks the document tree looking for code (inline or blocks) annotated with a pipe attribute, like this:

```{pipe="sh"}
echo "Hello world!"
```

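When such code is found, its contents are sent to the standard input of the given shell command, and replaced with whatever the command writes to its standard output.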

For example, running the above through pandoc --filter panpipe gives:

Hello world!

Note that the pipe attribute is not a “label” telling PanPipe “which language to use”, or anything to that effect. It’s a shell command: nothing more, but also nothing less. For example, rendering:

```{pipe="tr l L | sed -e 's/ /_/g'"}
Hello world!
```

Yields a document containing:

HeLLo_worLd!
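
To make the “shell command” point concrete, here is a rough emulation of that example using only standard tools. It assumes PanPipe hands the block’s contents to the command on standard input and keeps whatever arrives on standard output; `run_pipe` is a hypothetical helper for illustration, not part of PanPipe.

```shell
# Hypothetical stand-in for a pipe="..." attribute: run the attribute's
# value as a shell command, reading the block's contents from stdin.
run_pipe() {
  sh -c "$1"
}

# The block's contents go in on stdin; the command's stdout is the result.
result=$(printf 'Hello world!\n' | run_pipe "tr l L | sed -e 's/ /_/g'")
echo "$result"   # prints: HeLLo_worLd!
```

Since the attribute is handed to a shell, anything which works on a command line (pipelines, redirections, quoting) should work in the attribute too.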

PanHandle

PanHandle is a Pandoc filter which looks for code in an unwrap class. It extracts the code, which is assumed to be in ‘Pandoc JSON’ format, and splices it into the surrounding document.

We can turn any Pandoc-supported format into Pandoc JSON by piping it through pandoc -t json.

For example, if we take the JSON for this Markdown table:

X NOT(X)
- ------
T F
F T

and wrap it in an unwrap code block, we get:

```{.unwrap}
{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[{"t":"Table","c":[["",[],[]],[null,[]],[[{"t":"AlignDefault"},{"t":"ColWidthDefault"}],[{"t":"AlignDefault"},{"t":"ColWidthDefault"}]],[["",[],[]],[[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"X"}]}]],[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"NOT(X)"}]}]]]]]],[[["",[],[]],0,[],[[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"T"}]}]],[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"F"}]}]]]],[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"F"}]}]],[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"T"}]}]]]]]]],[["",[],[]],[]]]}]}
```

When we send our document through pandoc --filter panhandle, the table will be spliced into the document, like this:

X NOT(X)
T F
F T

On its own, PanHandle is pretty useless. The Pandoc JSON format is just a program-specific, non-standard, rather ugly intermediate format; not something we should be writing our documents in. Besides which, if we want to embed something like a table in our documents, we should just go ahead and put the damned thing where it’s supposed to be; rather than encoding it into Pandoc JSON, sticking it in a code block, then using PanHandle to unwrap it and decode it again!

The point of PanHandle isn’t to unwrap hard-coded strings of JSON, like the table example above; it’s to unwrap procedurally generated JSON, i.e. the output of PanPipe. PanPipe is specifically designed to only manipulate the contents of code blocks: it cannot interfere with the rest of the document. This is a useful restriction, since we may be calling out to arbitrary commands. By using PanHandle, we have a single, simple, predictable and opt-in way to splice generated content into our documents.

Examples

Some non-toy examples of this system in action:

This Site

This whole site is static HTML generated from Markdown with these tools. Not every page takes advantage of these capabilities, but it’s nice to know they’re available when I need them. You may like to browse this page’s source to see how the example output is derived straight from the examples themselves (note that this requires meta-programming, which complicates things a little).

Fibonacci Post

I wrote PanPipe and PanHandle after trying and failing to integrate Babel into my site’s build process. My Fibonacci Sequence in PHP post was an experiment with Babel, so porting that post over to Pandoc was the motivating use-case for these scripts. Thankfully the port was a success, and that post is now rendered by Pandoc like the rest of the site.

If you compare it to the source, you’ll see a few of the required features which influenced my thinking.

Useful Tricks

These simple scripts let us call out to the UNIX shell from our documents. This lets us recreate many of the active code features of Babel, just by piping between programs and reading/writing files. Here are some common tasks you may want to solve:

Hiding Output

You may want a code block to execute, but not show up in the output. The easiest way is to pipe the output to /dev/null, or an actual file if we plan to use it later:

```{pipe="sh > /dev/null"}
ls /
```

```{pipe="sh > contents"}
ls /
```

This works well for HTML, and results in:

<pre><code></code></pre>
<pre><code></code></pre>

Sometimes these empty elements may have undesirable effects, e.g. interacting badly with some styling rule. If this is the case, you might try using inline snippets instead, e.g. `ls /`{pipe="sh > /dev/null"}, which gives <p><code></code></p>.
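
The effect of the redirection can be checked outside of Pandoc. This sketch assumes PanPipe feeds the block’s contents to the command on standard input; `workdir` is just a scratch directory for the illustration.

```shell
# Scratch directory, so the example doesn't litter the current one.
workdir=$(mktemp -d)

# Emulate pipe="sh > contents": the inner sh runs the block's contents,
# and its stdout is redirected to a file instead of the document.
shown=$(cd "$workdir" && printf 'ls /\n' | sh -c 'sh > contents')

echo "visible output: '$shown'"
echo "lines captured in file: $(wc -l < "$workdir/contents")"
```

Nothing is left on standard output, which is why the rendered element is empty, while the listing is still available in the file for later blocks.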

Splicing Nothing

Eliminating the code block altogether takes a little more effort, but might be necessary in some cases. To remove a code block, we can use panhandle to splice an empty document in its place.

Remember that panhandle accepts JSON, which we can generate using pandoc:

```{.unwrap pipe="sh | echo '' | pandoc -t json"}
ls /
```

Here’s the result when converting to HTML:


Ta da! If our code block has any extra attributes, etc. then a div will be left behind to catch them, for example:

```{.unwrap pipe="sh | echo '' | pandoc -t json" myattr="myvalue"}
ls /
```

This gives:

<div data-myattr="myvalue">

</div>

Format-specific

If you’re targeting a specific output format, you can use techniques specific to that format.

For example, if you’re rendering to HTML, you can hide code blocks with CSS:

```{pipe="sh" style="display: none;"}
ls /
```

This results in:

<pre style="display: none;"><code>bin
build
dev
etc
nix
proc
run
tmp
</code></pre>

If you’re rendering to LaTeX, you can wrap the block in a conditional which is always false, to skip over it (it will still be executed, but the result won’t be rendered):

\iffalse

```{pipe="sh"}
ls /
```

\fi

Showing Code and Output

We can use tee to save a copy of our code into a file, then run it in another code block:

```{.php pipe="tee script.php"}
<?php
echo 10 + 20;
```
```{pipe="sh"}
php script.php
```

This results in:

<?php
echo 10 + 20;
30
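
The same pattern works for any interpreter. As a sketch which avoids needing PHP installed, the following emulates the two blocks with a shell script instead; the file name `script.sh` and the `workdir` scratch directory are illustrative only.

```shell
workdir=$(mktemp -d)

# First block: tee writes the code to a file and also passes it through,
# so the listing still appears in the document.
listing=$(printf '%s\n' 'echo $((10 + 20))' | tee "$workdir/script.sh")

# Second block: run the saved file to show its output.
output=$(sh "$workdir/script.sh")

echo "$listing"
echo "$output"   # prints: 30
```

Using `cat > file` in place of `tee file` would save the code without showing it.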

Tangling

Use tee -a to append to a file. Make sure to include extra newlines as needed:

```{.haskell pipe="tee -a tangled.hs"}
foo = "Hello"

```

```{.haskell pipe="tee -a tangled.hs"}
bar = "World"

```

```{.haskell pipe="ghci -v0"}
:load tangled.hs
print (foo ++ " " ++ bar)
```

This gives:

foo = "Hello"
bar = "World"
"Hello World"

Execution Order

PanPipe executes code in the order it appears in the source document (although it uses two passes: one for code blocks and one for inline code, so it’s a bad idea to rely on execution order between the two).

We can change the order that results are displayed in by capturing their output to files and dumping them out later. For example, to show a program listing after its results:

```{pipe="cat > code.sh"}
echo "Hello"
echo "World"
```

```{pipe="sh"}
sh code.sh
```

```{.bash pipe="sh"}
cat code.sh
```

This produces:

Hello
World
echo "Hello"
echo "World"

Procedural Documents

We can generate content using PanPipe, send it through Pandoc to get JSON, then use PanHandle to splice it into the document. For example:

```{.unwrap pipe="php | pandoc -t json"}
<?php
foreach (range(1, 10) as $x) {
  echo " - Element $x\n";
}
```

This produces a bulleted list of ten items, from “Element 1” to “Element 10”.

Importing Sub-Documents

We can use PanPipe to dump the contents of files and PanHandle to combine them together. We can even call Pandoc recursively:

```{.unwrap pipe="sh"}
pandoc -t json header.md
```

```{.unwrap pipe="sh"}
pandoc -t json footer.md
```

Including Images

We can obtain image files using PanPipe, then encode them in data URIs. PanHandle will splice these into the document:

```{pipe="php > carpet.pbm"}
<?php
$scale = 5;
$dim   = pow(3, $scale);
$max   = ($dim * $dim) - 1;

function carpet($x, $y) {
  if ($x % 3 == 1 && $y % 3 == 1) return 0;
  return ($x || $y)? carpet(intval($x / 3),
                            intval($y / 3))
                   : 1;
}

$colour = function($c) use ($dim) {
  $x =  $c       % $dim;
  $y = ($c - $x) / $dim;
  return carpet($x, $y);
};

echo "P1 $dim $dim\n";
echo implode("\n", array_map($colour, range(0, $max)));
```

```{.unwrap pipe="sh | pandoc -t json"}
convert carpet.pbm carpet.png
echo -n '<img alt="Sierpinski Carpet" src="data:image/png;base64,'
base64 -w 0 carpet.png
echo -n '" />'
```

This results in:

Sierpinski Carpet
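
Note that `base64 -w 0` is a GNU coreutils option. Where it isn’t available, stripping newlines with `tr` gives the same single-line payload. The sketch below encodes a tiny 1×1 PBM image inline; the `image/x-portable-bitmap` media type is an assumption for PBM data, and browsers generally won’t display it, which is why the example above converts to PNG first.

```shell
# A 1x1 plain PBM image containing a single black pixel.
pbm=$(printf 'P1 1 1\n1')

# Encode it, removing any line wrapping the base64 tool adds.
payload=$(printf '%s\n' "$pbm" | base64 | tr -d '\n')

# Assumed media type for PBM; a PNG would use image/png instead.
uri="data:image/x-portable-bitmap;base64,$payload"
echo "$uri"
```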

Handling Errors

In general, errors should abort the rendering. We would rather have no document than an erroneous one.

If you want to trigger an error from a command, just have it return a non-zero exit code:

```{pipe="sh"}
if [ ! -e "foo" ]
then
  exit 1
fi
cat foo
```

If you want to carry on rendering in the presence of errors, you must implement some kind of error handling to ensure your command exits with a success code. For example, in shell scripts:

```{pipe="sh"}
./dodgyCommand || echo "dodgyCommand didn't work; oh well!"
```
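
Both behaviours come down to ordinary exit statuses, which can be tried directly in a shell. This sketch assumes PanPipe treats a non-zero status from the command as a failure, as described above; `dodgy` is a hypothetical command which always fails.

```shell
# Hypothetical command which always fails.
dodgy() {
  return 1
}

# Record the status which a failing command reports to its caller.
status=0
dodgy || status=$?
echo "failing status: $status"   # prints: failing status: 1

# With a fallback, the overall status is zero and rendering can carry on.
message=$(dodgy || echo "dodgy didn't work; oh well!")
handled=$?
echo "$message (status $handled)"
```

The fallback’s output takes the place of the failed command’s output, so it is worth making the message something you are happy to see in the rendered document.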

Anything printed to stderr by a shell command will appear in the stderr of PanPipe. Likewise, when used as a Pandoc filter, PanPipe’s stderr will appear in Pandoc’s stderr. Note that Pandoc may buffer the stderr stream, which prevents content from showing immediately (e.g. progress bars and such). To prevent this, you can use pandoc -t json | panpipe | panhandle | pandoc -f json rather than pandoc --filter panpipe --filter panhandle.