Active Code
“Active code” is the term used by the Babel system, part of Emacs’s Org-mode. It refers to authoring systems which can execute code embedded in the documents they’re rendering. This page documents the active code system I use to write articles, most notably the HTML on my Web site.
Babel, Org mode and Emacs are all wonderful things. However, there are a few reasons why we may want to avoid them, which can be summarised by saying “they’re not UNIX”:
- Emacs is a large dependency; it would be nice to have a small binary or script which performs only the conversion process
- Org mode is packed full of features; Babel alone has various ways of handling input, output, sessions, variables, interpreting results, etc. The UNIX approach would use a few simple, general principles rather than exposing all of these separately.
- Babel, Org mode, etc. are moving targets; it would be nice to avoid an upgrade treadmill if possible.
The Alternative: Pandoc
Pandoc is a great document conversion program by John MacFarlane. It can convert between various markup languages, including HTML, LaTeX and Markdown. We can also mix and match the formats, for example embedding a mixture of HTML and LaTeX in a Markdown document and rendering it to a PDF.
In particular, most of the source of my Web site is written in Markdown and converted to HTML using Pandoc. I use Nix to orchestrate the process.
Embedded Code
Most of the following examples are written in Pandoc’s Markdown format, but they can also be used with other formats supported by Pandoc (e.g. HTML).
Pandoc supports code blocks, which can be written in three different ways in Markdown:
`echo "Inline code"`
    echo "Indented code"
```
echo "Fenced code"
```
As you can see, by default these get rendered in monospaced fonts.
The “inline” form, as the name suggests, gets rendered as part of any surrounding text `like this`. The other two forms make “blocks”, which get rendered like separate paragraphs.
Code blocks can have “attributes”, “classes” and an “id”. In Markdown, these look like:
```{#SomeID .SomeClass .AnotherClass Attribute1="Value1" Attribute2="Value2"}
Some content
```
This lets us manipulate blocks; for example, if we’re rendering to HTML we might use these IDs, classes and attributes from some associated JavaScript. Pandoc also uses classes to apply syntax-highlighting, based on language descriptions from Kate.
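For readers who want to process such blocks programmatically, it may help to see how the id, classes and attributes show up in Pandoc's JSON representation. The following is a minimal Python sketch, assuming the usual `CodeBlock` encoding of `[[id, classes, key-value pairs], text]`; the helper name `describe_code_block` is made up for illustration.

```python
import json

# Pandoc JSON for the fenced block above, as a single CodeBlock node.
# (Assumed layout: [[id, [classes], [[key, value], ...]], text].)
node = json.loads('''
{"t": "CodeBlock",
 "c": [["SomeID",
        ["SomeClass", "AnotherClass"],
        [["Attribute1", "Value1"], ["Attribute2", "Value2"]]],
       "Some content"]}
''')

def describe_code_block(block):
    """Split a CodeBlock node into its id, classes, attributes and body."""
    (ident, classes, pairs), text = block["c"]
    return {
        "id": ident,
        "classes": classes,
        "attributes": dict(pairs),  # key-value pairs become a dictionary
        "text": text,
    }

info = describe_code_block(node)
print(info["id"], info["classes"], info["attributes"])
```

A filter only needs to inspect these three fields to decide whether a block is one it should act on.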
For those who don’t want to write Markdown, here’s the equivalent HTML input:
<code>echo "inline code"</code>
<pre>
echo "Code block"
</pre>
<pre id="SomeID" class="SomeClass AnotherClass" Attribute1="Value1"
Attribute2="Value2">
Some content
</pre>
Whether Markdown, HTML or anything else, these are the standard, off-the-shelf ways to embed code snippets in a document.
However, such code is not active.
PanPipe
PanPipe is a Pandoc filter which walks the document tree looking for code (inline or blocks) annotated with a `pipe` attribute, like this:
```{pipe="sh"}
echo "Hello world!"
```
When such code is found, the following steps take place:
- The value of the `pipe` attribute (`sh`) is executed as a shell command.
- The element’s content (`echo "Hello world!"`) is removed and piped into that shell command’s standard input.
- The standard output of the command (`Hello world!`) is inserted into the element as its new body.
For example, running the above through `pandoc --filter panpipe` gives:
Hello world!
Note that the `pipe` attribute is not a “label” telling PanPipe “which language to use”, or anything to that effect. It’s a shell command: nothing more, but also nothing less. For example, rendering:
```{pipe="tr l L | sed -e 's/ /_/g'"}
Hello world!
```
Yields a document containing:
HeLLo_worLd!
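The three steps can be mimicked outside Pandoc with a few lines of Python. This is only a sketch of the behaviour described above, not PanPipe's own code; the function name `run_pipe` is hypothetical, and it assumes `sh`, `tr` and `sed` are available on the machine.

```python
import subprocess

def run_pipe(command, body):
    """Feed `body` to the shell command's stdin and return its stdout."""
    result = subprocess.run(
        command,
        shell=True,           # the attribute value is handed to the shell as-is
        input=body,           # the element's content becomes standard input
        capture_output=True,
        text=True,
        check=True,           # a non-zero exit status raises CalledProcessError
    )
    return result.stdout      # this would become the element's new body

hello = run_pipe("sh", 'echo "Hello world!"\n')
mangled = run_pipe("tr l L | sed -e 's/ /_/g'", "Hello world!\n")
print(hello, end="")
print(mangled, end="")
```

Because the command string goes straight to the shell, pipelines and redirections work without any special support.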
PanHandle
PanHandle is a Pandoc filter which looks for code in an `unwrap` class. It extracts the code, which is assumed to be in ‘Pandoc JSON’ format, and splices it into the surrounding document.
We can turn any Pandoc-supported format into Pandoc JSON by piping it through `pandoc -t json`. For example, if we take the JSON for this Markdown table:
X NOT(X)
- ------
T F
F T
and wrap it in an `unwrap` code block, we get:
```{.unwrap}
{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[{"t":"Table","c":[["",[],[]],[null,[]],[[{"t":"AlignDefault"},{"t":"ColWidthDefault"}],[{"t":"AlignDefault"},{"t":"ColWidthDefault"}]],[["",[],[]],[[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"X"}]}]],[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"NOT(X)"}]}]]]]]],[[["",[],[]],0,[],[[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"T"}]}]],[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"F"}]}]]]],[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"F"}]}]],[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"T"}]}]]]]]]],[["",[],[]],[]]]}]}
```
When we send our document through `pandoc --filter panhandle`, the table will be spliced into the document, like this:
| X | NOT(X) |
|---|--------|
| T | F      |
| F | T      |
On its own, PanHandle is pretty useless. The Pandoc JSON format is just a program-specific, non-standard, rather ugly intermediate format; not something we should be writing our documents in. Besides which, if we want to embed something like a table in our documents, we should just go ahead and put the damned thing where it’s supposed to be; rather than encoding it into Pandoc JSON, sticking it in a code block, then using PanHandle to unwrap it and decode it again!
The point of PanHandle isn’t to unwrap hard-coded strings of JSON, like the table example above; it’s to unwrap procedurally generated JSON, i.e. the output of PanPipe. PanPipe is specifically designed to only manipulate the contents of code blocks: it cannot interfere with the rest of the document. This is a useful restriction, since we may be calling out to arbitrary commands. By using PanHandle, we have a single, simple, predictable and opt-in way to splice generated content into our documents.
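To make the splicing step concrete, here is a rough Python sketch of the idea, operating on a document's list of top-level blocks. It is an illustration under simplifying assumptions rather than PanHandle's actual implementation: the name `splice_unwrap` is invented, only top-level code blocks are handled, and extra attributes on the block are ignored.

```python
import json

def splice_unwrap(blocks):
    """Replace each top-level CodeBlock with class "unwrap" by the blocks it encodes."""
    result = []
    for block in blocks:
        if block.get("t") == "CodeBlock":
            (_ident, classes, _pairs), text = block["c"]
            if "unwrap" in classes:
                # The body is assumed to be a complete Pandoc JSON document.
                result.extend(json.loads(text)["blocks"])
                continue
        result.append(block)  # everything else passes through untouched
    return result

inner = {"pandoc-api-version": [1, 23, 1], "meta": {},
         "blocks": [{"t": "Para", "c": [{"t": "Str", "c": "Generated"}]}]}
outer = [
    {"t": "Para", "c": [{"t": "Str", "c": "Before"}]},
    {"t": "CodeBlock", "c": [["", ["unwrap"], []], json.dumps(inner)]},
    {"t": "CodeBlock", "c": [["", [], []], "left alone"]},
]
spliced = splice_unwrap(outer)
print([b["t"] for b in spliced])
```

Only the block carrying the `unwrap` class is touched, which is what makes the mechanism opt-in.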
Examples
Some non-toy examples of this system in action:
This Site
This whole site is static HTML generated from Markdown with these tools. Not every page takes advantage of these capabilities, but it’s nice to know they’re available when I need them. You may like to browse this page’s source to see how the example output is derived straight from the examples themselves (note that this requires meta-programming, which complicates things a little).
Fibonacci Post
I wrote PanPipe and PanHandle after trying and failing to integrate Babel into my site’s build process. My Fibonacci Sequence in PHP post was an experiment with Babel, so porting that post over to Pandoc was the motivating use-case for these scripts. Thankfully the port was a success, and that post is now rendered by Pandoc like the rest of the site.
If you compare it to the source you’ll see a few of the required features which influenced my thinking:
- A temporary directory for downloading dependencies
- Rendering, executing, or rendering and executing code snippets
- Concatenating code snippets together in a source file (“tangling”)
- Rendering code output verbatim or interpreted (e.g. as a table or image)
- Hidden blocks for writing helper functions and unit tests
- Aborting rendering when unit tests fail
Useful Tricks
These simple scripts let us call out to the UNIX shell from our documents. This lets us recreate many of the active code features of Babel, just by piping between programs and reading/writing files. Here are some common tasks you may want to solve:
Hiding Output
You may want a code block to execute, but not show up in the output. The easiest way is to pipe the output to `/dev/null`, or an actual file if we plan to use it later:
```{pipe="sh > /dev/null"}
ls /
```
```{pipe="sh > contents"}
ls /
```
This works well for HTML, and results in:
<pre><code></code></pre>
<pre><code></code></pre>
Sometimes these empty elements may have undesirable effects, e.g. interacting badly with some styling rule. If this is the case, you might try using inline snippets instead, e.g. `ls /`{pipe="sh > /dev/null"}, which gives <p><code></code></p>.
Splicing Nothing
Eliminating the code block entirely takes a little more effort, but might be necessary in some cases. To remove a code block, we can use `panhandle` to splice an empty document in its place.
Remember that `panhandle` accepts JSON, which we can generate using `pandoc`:
```{.unwrap pipe="sh | echo '' | pandoc -t json"}
ls /
```
Here’s the result when converting to HTML:
Ta da! Nothing is left behind. If our code block has any extra attributes, etc. then a `div` will be left behind to catch them, for example:
```{.unwrap pipe="sh | echo '' | pandoc -t json" myattr="myvalue"}
ls /
```
This gives:
<div data-myattr="myvalue">
</div>
Format-specific
If you’re targeting a specific output format, you can use techniques specific to that format.
For example, if you’re rendering to HTML, you can hide code blocks with CSS:
```{pipe="sh" style="display: none;"}
ls /
```
This results in:
<pre style="display: none;"><code>bin
build
dev
etc
nix
proc
tmp</code></pre>
If you’re using LaTeX you can use `if` statements to skip over the block (it will still be executed, but the result won’t be rendered):
\iffalse
```{pipe="sh"}
ls /
```
\fi
Showing Code and Output
We can use `tee` to save a copy of our code into a file, then run it in another code block:
```{.php pipe="tee script.php"}
<?php
echo 10 + 20;
```
```{pipe="sh"}
php script.php
```
This results in:
<?php
echo 10 + 20;
30
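The reason this works is that `tee` both writes its input to the named file and echoes it to standard output, so the listing stays visible while a later block runs the saved copy. The Python sketch below imitates the two blocks; it is not how PanPipe is implemented. The helper `run_pipe` is hypothetical, and a one-line shell script stands in for the PHP file so that only `sh` and `tee` are assumed to be installed.

```python
import os
import subprocess
import tempfile

def run_pipe(command, body, cwd):
    """Run a shell command in `cwd`, feeding it `body` and returning its stdout."""
    return subprocess.run(command, shell=True, input=body, cwd=cwd,
                          capture_output=True, text=True, check=True).stdout

with tempfile.TemporaryDirectory() as workdir:
    # First block: tee saves the body to a file and echoes it back unchanged.
    shown = run_pipe("tee script.sh", "echo $((10 + 20))\n", workdir)
    # Second block: a later command can run the file saved by the first.
    output = run_pipe("sh", "sh script.sh\n", workdir)
    with open(os.path.join(workdir, "script.sh")) as handle:
        saved = handle.read()

print(shown, end="")
print(output, end="")
```

Both blocks must run in the same working directory for the second to find the file written by the first.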
Tangling
Use `tee -a` to append to a file. Make sure to include extra newlines as needed:
```{.haskell pipe="tee -a tangled.hs"}
foo = "Hello"
```
```{.haskell pipe="tee -a tangled.hs"}
bar = "World"
```
```{.haskell pipe="ghci -v0"}
:load tangled.hs
print (foo ++ " " ++ bar)
```
This gives:
foo = "Hello"
bar = "World"
"Hello World"
Execution Order
PanPipe executes code in the order it appears in the source document (although it uses two passes: one for code blocks and one for inline code, so it’s a bad idea to rely on execution order between the two).
We can change the order that results are displayed in by capturing their output to files and dumping them out later. For example, to show a program listing after its results:
```{pipe="cat > code.sh"}
echo "Hello"
echo "World"
```
```{pipe="sh"}
sh code.sh
```
```{.bash pipe="sh"}
cat code.sh
```
This produces:
Hello
World
echo "Hello"
echo "World"
Procedural Documents
We can generate content using PanPipe, send it through Pandoc to get JSON, then use PanHandle to splice it into the document. For example:
```{.unwrap pipe="php | pandoc -t json"}
<?php
foreach (range(1, 10) as $x) {
  echo " - Element $x\n";
}
```
This produces:
- Element 1
- Element 2
- Element 3
- Element 4
- Element 5
- Element 6
- Element 7
- Element 8
- Element 9
- Element 10
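Since the `pipe` attribute is just a shell command, nothing here is specific to PHP. As an illustration, and assuming a `python3` executable is available where the document is built, the attribute could be changed to `pipe="python3 | pandoc -t json"` and the block body replaced by a script along these lines (the function name `element_list` is made up):

```python
def element_list(count):
    """Build a Markdown bullet list with one line per element."""
    return "".join(f" - Element {n}\n" for n in range(1, count + 1))

# Whatever is printed here is what `pandoc -t json` would receive as Markdown.
markdown = element_list(10)
print(markdown, end="")
```

The generating language only has to print Markdown (or any other input Pandoc is told to read); Pandoc and PanHandle take care of the rest.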
Importing Sub-Documents
We can use PanPipe to dump the contents of files and PanHandle to combine them together. We can even call Pandoc recursively:
```{.unwrap pipe="sh"}
pandoc -t json header.md
```
```{.unwrap pipe="sh"}
pandoc -t json footer.md
```
Including Images
We can obtain image files using PanPipe, then encode them in data URIs. PanHandle will splice these into the document:
```{pipe="php > carpet.pbm"}
<?php
$scale = 5;
$dim = pow(3, $scale);
$max = ($dim * $dim) - 1;
function carpet($x, $y) {
  if ($x % 3 == 1 && $y % 3 == 1) return 0;
  return ($x || $y)? carpet(intval($x / 3),
                            intval($y / 3))
                   : 1;
}
$colour = function($c) use ($dim) {
  $x = $c % $dim;
  $y = ($c - $x) / $dim;
  return carpet($x, $y);
};
echo "P1 $dim $dim\n";
echo implode("\n", array_map($colour, range(0, $max)));
```
```{.unwrap pipe="sh | pandoc -t json"}
convert carpet.pbm carpet.png
echo -n '<img alt="Sierpinski Carpet" src="data:image/png;base64,'
base64 -w 0 carpet.png
echo -n '" />'
```
This results in:
[Image: Sierpinski Carpet]
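The data URI built by the second block is simply the image's bytes, base64-encoded, behind a `data:image/png;base64,` prefix. A small Python sketch of the same encoding follows; the function name `image_tag` is hypothetical, and the sample bytes are only the standard eight-byte PNG signature rather than a real picture.

```python
import base64

def image_tag(data, alt, mime="image/png"):
    """Return an <img> element embedding `data` as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f'<img alt="{alt}" src="data:{mime};base64,{encoded}" />'

# The first eight bytes of any PNG file, used here as stand-in image data.
sample = b"\x89PNG\r\n\x1a\n"
tag = image_tag(sample, "Sierpinski Carpet")
print(tag)
```

Embedding images this way keeps the page self-contained, at the cost of a larger HTML file.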
Handling Errors
In general, errors should abort the rendering. We would rather have no document than an erroneous one.
If you want to trigger an error from a command, just have it return a non-zero exit code:
```{pipe="sh"}
if [ ! -e "foo" ]
then
  exit 1
fi
cat foo
```
If you want to carry on rendering in the presence of errors, you must implement some kind of error handling to ensure your command exits with a success code. For example, in shell scripts:
```{pipe="sh"}
./dodgyCommand || echo "dodgyCommand didn't work; oh well!"
```
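The two cases can be imitated in Python to show how an exit code decides whether rendering stops. This is a sketch of the behaviour described here, not PanPipe's own error handling; `run_pipe` is a hypothetical helper, and `false` stands in for a command that fails.

```python
import subprocess

def run_pipe(command, body):
    """Run a shell command on `body`; raise if it exits with a failure code."""
    return subprocess.run(command, shell=True, input=body,
                          capture_output=True, text=True, check=True).stdout

# A failing command: the exception plays the role of the aborted render.
try:
    run_pipe("sh", "exit 1\n")
    aborted = False
    status = 0
except subprocess.CalledProcessError as error:
    aborted = True
    status = error.returncode

# The same failure, handled inside the script so the exit code is zero.
recovered = run_pipe("sh", "false || echo 'carried on'\n")
print(aborted, status, recovered, end="")
```

The `||` fallback changes nothing about the failing command itself; it only ensures the script as a whole reports success.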
Anything printed to stderr by a shell command will appear in the stderr of PanPipe. Likewise, when used as a Pandoc filter, PanPipe’s stderr will appear in Pandoc’s stderr. Note that Pandoc may buffer the stderr stream, which prevents content showing immediately (e.g. progress bars and such). To prevent this, you can use `pandoc -t json | panpipe | panhandle | pandoc -f json` rather than `pandoc --filter panpipe --filter panhandle`.