Scripting with Nix
NPM was originally a rudimentary package manager for JavaScript and Node.js, but these days it seems to be used as a general way of fetching dependencies for things like shell scripts.
I’ve not played with Node.js since the very early days, and never really used NPM. However, I do see the need for modular, composable shell scripts.
Nix for Scripting
Personally, I’ve been using Nix in a similar way, since it also has
nice features like caching, laziness, splicing into indented strings,
dependency management, etc. For example, if you have a Nix expression
stored in my-script.nix, you can use the following (e.g. in
my-script.sh) to invoke it:
nix-instantiate --read-write-mode --show-trace --eval -E 'import ./my-script.nix'
The --eval option tells Nix to evaluate an expression, rather than
build a package. -E is the expression to evaluate (in this case,
importing our “script” file). --read-write-mode allows the script to
add things to the Nix store (which is normally read-only). --show-trace
will help us to debug, by showing a backtrace if a runtime error
occurs.
Note that Nix makes a distinction between “evaluation time” and
“build time” (similar to the compile/run time distinction in many other
languages). The above command will evaluate the contents of
my-script.nix, which should be a Nix expression, but most of the time
that’s not our actual script (after all, Nix is a pure language, and
hence unsuitable for many scripting applications).
Instead, we usually use Nix expressions to compose a “derivation” (basically a package), into which we can put all sorts of scripts.
If my-script.nix evaluates to a derivation, we can use this command
instead of the above, to build it:
nix-build --show-trace -E 'import ./my-script.nix'
This creates a symlink called result, so if our derivation is a script
we can then do ./result to run it.
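As a concrete illustration, here is a minimal sketch of what my-script.nix might contain so that it evaluates to a derivation which is itself a runnable script. The script body is made up for the example; writeScript and bash are taken from nixpkgs.

```nix
# my-script.nix: a sketch, the script body is hypothetical
with import <nixpkgs> {};

# writeScript produces a derivation whose output is a single executable file
writeScript "my-script" ''
  #!${bash}/bin/bash
  echo "Hello from the Nix store"
''
```

Building this with the nix-build command above should leave a result symlink pointing at the executable file in the Nix store.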
A cleaner way would be to use the --no-out-link option to prevent
cluttering up the working directory with symlinks, capture the stdout
using e.g. SCRIPT=$(...), and run the resulting string using "$SCRIPT"
(since it will contain the Nix store path to the result).
By the way, if you want to evaluate Nix expressions interactively,
the nix-repl command is much easier to use than nix-instantiate
(although it doesn’t seem to offer a --show-trace option).
Also note that we can use nix-shell shebangs to fetch dependencies of
a script at run time; although I find that separate Nix files are
usually needed to work around the limitations of writing expressions in
a shebang line, and it can often be quite slow to have nix-shell
invoked on each run, when we could instead build a script with its
dependencies once, and invoke it as many times as we like with little
overhead.
Strings
The reason I like Nix’s treatment of strings is that we can write “indented strings”, which means that long strings (such as scripts) can be embedded inside other expressions quite naturally, without getting filled with whitespace. For example, here’s a Nix expression:
runCommand "foo"
           { buildInputs = [ python imagemagick ]; }
           ''
             I am an indented string
             I will be executed as a bash script, with the following
             dependencies available:
              - python
              - imagemagick
             Since these lines, and those sentences above beginning with "I",
             have the least indentation, they will appear flush to the left in
             the resulting file. The "list" above will hence be indented by 1
             space, rather than the 14 spaces which appear here.
           ''
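To see the indentation stripping in isolation, here is a small sketch which evaluates a bare indented string with no derivation involved. The file name in the comment is just an assumption for the example.

```nix
# Evaluate with: nix-instantiate --eval indented.nix   (hypothetical file name)
# The least-indented lines have 2 leading spaces, so 2 spaces are stripped
# from every line; the list item keeps its 1 extra space.
''
  first line
   - list item
  last line
''
# → "first line\n - list item\nlast line\n"
```

The newline straight after the opening quotes is dropped as well, so the result starts directly with the first line of text.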
Secondly, splicing allows Nix expressions to be embedded inside
strings. A splice begins with ${ and ends with }. The expression should
either evaluate to a string, which is inserted as-is, or a “derivation”
(e.g. a package), which gets built as needed and whose output path in
the Nix store is inserted into the resulting string. Splices can be
nested too.
For example, instead of giving python as a dependency in the
buildInputs, we could splice the full path into a string, e.g.
''
"${python}/bin/python" my_script.py
''
Although this is probably a bad idea, since there may be transitive dependencies, etc. missing when the script gets executed.
Splices aren’t just a simple hack for resolving variable names; they drop us into a complete Nix expression context, where we can write arbitrarily complicated expressions, including other strings, containing other splices, etc.
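As a small sketch of that nesting (the names list is made up for illustration), the splice below contains a function call, whose argument contains a lambda, whose body is another string with its own splice:

```nix
let
  # Hypothetical data, just for the example
  names = [ "foo" "bar" ];
in ''
  Names: ${builtins.concatStringsSep ", " (map (n: "<${n}>") names)}
  Count: ${toString (builtins.length names)}
''
# → "Names: <foo>, <bar>\nCount: 2\n"
```

Note that numbers need converting with toString before splicing, since a splice only accepts strings, paths and derivations.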
Decomposing
If we want to build up a result incrementally, with each step getting
cached, we can use runCommand, and write the results to "$out". For
example:
with import <nixpkgs> {};
with builtins;
with rec {
  # Takes a script and runs it with jq available (Nix functions are curried)
  runJq = runCommand "jq-cmd" { buildInputs = [ jq ]; };
  step1 = runJq ''
    echo "I am step 1" 1>&2
    echo '[{"name": "foo"}, {"name": "bar"}]' | jq 'map(.name)' > "$out"
  '';
  step2 = runJq ''
    echo "I am step 2" 1>&2
    I won't be executed, because Nix is lazy and nothing calls me
  '';
  step3 = runJq ''
    echo "I am step 3" 1>&2
    jq 'length' < "${step1}" > "$out"
  '';
};
import step3
When evaluated, this gives the following:
$ ./go.sh
building path(s) ‘/nix/store/5ks08zbvmgzbhg9kr0k4g75nf2ymsqsr-jq-cmd’
I am step 1
building path(s) ‘/nix/store/v1svcqq6cmi4xc9650qz9w2x177w4pfr-jq-cmd’
I am step 3
2
$ ./go.sh
2
The results are cached, and will be re-used as long as the commands aren’t edited, and their dependencies don’t change (e.g. if we update nixpkgs and a newer version of jq is available, they’ll be re-run with that version).
In this case, each “step” represents the data, which is common in
lazy languages. Alternatively, we can use writeScript to write more
‘traditional’ process-oriented scripts:
with import <nixpkgs> {};
with builtins;
with rec {
  # Takes a script and runs it with jq available (Nix functions are curried)
  runJq = runCommand "jq-cmd" { buildInputs = [ jq ]; };
  step1 = writeScript "step-1" ''
    echo "I am step 1" 1>&2
    echo '[{"name": "foo"}, {"name": "bar"}]' | jq 'map(.name)'
  '';
  step2 = writeScript "step-2" ''
    echo "I am step 2" 1>&2
    I won't be executed, because Nix is lazy and nothing calls me
  '';
  step3 = writeScript "step-3" ''
    echo "I am step 3" 1>&2
    "${step1}" | jq 'length'
  '';
};
import (runJq ''"${step3}" > "$out"'')
Of course, we need something to invoke these scripts, which is why I
used runJq in the final expression. When run, we get:
$ ./go.sh
building path(s) ‘/nix/store/fnw68cmkib5fkmhls4fkdhx0vb2cyka8-step-1’
building path(s) ‘/nix/store/1kiwa6m11d0apxfjbwpqq3vl6jbv3sdx-step-3’
building path(s) ‘/nix/store/9hv1jcrglyx8x6xa64pnds6vzcp35zl5-jq-cmd’
I am step 3
I am step 1
2
$ ./go.sh
2
This time the scripts are cached, but we execute them both together
in a normal pipe. The overall result of the runJq call is still cached
though. This is how you’d run non-bash scripts too: by using
writeScript to save your code to disk, and runCommand to invoke it with
a bash one-liner. For example, if we want step4 to use Haskell we might
do the following:
  runJq = runCommand "jq-cmd" { buildInputs = [ jq ghc ]; };
  ...
  hsScript = script: writeScript "hs-cmd" ''
    runhaskell "${writeScript "hs-script" script}"
  '';
  ...
  step4 = hsScript ''
    doTimes :: (Show a) => a -> String -> String
    doTimes str n = show (replicate (read n) str)
    hello = "hello world"
    main = interact (doTimes hello)
  '';
};
import (runJq ''"${step3}" | "${step4}" > "$out"'')
This reads the length given by jq, and writes out a list of that many “hello world”s:
$ ./go.sh
building path(s) ‘/nix/store/2d7wrd78dk1ilj84adnyq8ddgzy6m2rr-hs-cmd’
building path(s) ‘/nix/store/haqcwssfbzbj5s4ampv322qbpll1gw1h-jq-cmd’
I am step 3
I am step 1
"[\"hello world\",\"hello world\"]"
Unfortunately, this can end up separating the code from its
dependencies, i.e. we needed to give ghc as a dependency to whichever
script invokes step4 (via runJq), rather than being able to add it in
hsScript. If we used the original data-oriented approach, this wouldn’t
be an issue.
To mitigate this, my Nix config provides a wrap function (documented
here) which lets us write scripts which “bake in” their dependencies:
myScript = wrap {
  name   = "my-script";
  paths  = [ bash ghc jq python ];
  vars   = { someEnvVar = "Arbitrary content"; };
  script = ''
    #!/usr/bin/env bash
    ...do something here with "$someEnvVar" set, and a PATH containing bash,
    ghc, jq and python...
  '';
};
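For readers without that config, the idea can be approximated with the makeWrapper hook from nixpkgs. The following is only a guess at the general shape, not the actual implementation: wrapSketch, the count-names script and the NAMES variable are all hypothetical names invented for this example.

```nix
with import <nixpkgs> {};
let
  # Hypothetical stand-in for a wrap-like function; the real one may differ
  wrapSketch = { name, paths ? [], vars ? {}, script }:
    let
      # The unwrapped script, as an executable file in the Nix store
      raw = writeScript "${name}-raw" script;
      # Turn { FOO = "bar"; } into: --set 'FOO' 'bar'
      setVars = lib.concatStringsSep " " (lib.mapAttrsToList
        (var: value: "--set ${lib.escapeShellArg var} ${lib.escapeShellArg value}")
        vars);
    in runCommand name { nativeBuildInputs = [ makeWrapper ]; } ''
      makeWrapper "${raw}" "$out" \
        --prefix PATH : "${lib.makeBinPath paths}" \
        ${setVars}
    '';
in wrapSketch {
  name   = "count-names";
  paths  = [ jq ];
  vars   = { NAMES = ''["foo", "bar"]''; };
  script = ''
    #!${bash}/bin/bash
    echo "$NAMES" | jq 'length'
  '';
}
```

The output is a small launcher which sets up PATH and the environment variables before running the raw script, so callers don't need to know about jq at all.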
Since I often use wrap to make scripts which I then expose via bin/,
I’ve also made a function mkBin which takes the same arguments as wrap,
but instead of just building a script, it will put that script into a
bin/ subdirectory of its output, with the filename taken from name
(e.g. bin/my-script in this case).
Note that we can also write our scripts in standalone files, and
reference them from Nix by giving the path to the file,
e.g. ./myPythonScript.py. Since these are just regular files, we can
use syntax highlighting, etc. when editing them. We also avoid having
to care about escaping, but note that we can’t use string interpolation
this way.
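A minimal sketch of this, assuming a file myPythonScript.py sits next to the Nix file and that python3 from nixpkgs is a suitable interpreter (both are assumptions for the example):

```nix
with import <nixpkgs> {};

# Splicing a path copies the file into the Nix store and inserts its store
# path, so the build only sees an immutable snapshot of the script.
runCommand "python-output" { buildInputs = [ python3 ]; } ''
  python3 "${./myPythonScript.py}" > "$out"
''
```

Editing the Python file changes its store path, so the command gets re-run automatically the next time this is built.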
We also end up exposing the “raw” script (i.e. without its
dependencies). If you want to target non-Nix users, that can be useful;
otherwise, you might want to make it clear that it’s an internal detail
which shouldn’t be run as-is (e.g. you could put it in a subdirectory
called internal or src or something).
I/O
It’s also pretty easy to transfer data between the Nix language and
the processes we’re invoking, using readFile, readDir, fromJSON and
toJSON inside a splice; although older versions of Nix don’t support
floats, so you might need to turn them into strings first. This is
useful for doing tricky transformations on small amounts of data, which
may be error-prone in bash, but where invoking a full-blown language
like Haskell or Python would be overkill. It can also be useful for
things like assertions.
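A small sketch of such a round trip, done purely at evaluation time. The JSON literal is made-up input; in practice it might come from readFile on some earlier step's output.

```nix
with builtins;
let
  # Hypothetical input data, parsed into Nix lists and attribute sets
  parsed = fromJSON ''[{"name": "foo"}, {"name": "bar"}]'';
  names  = map (entry: entry.name) parsed;
in
  # Abort evaluation early if the data isn't what we expect
  assert length names == 2;
  toJSON { inherit names; count = length names; }
# → "{\"count\":2,\"names\":[\"foo\",\"bar\"]}"
```

The resulting string could then be spliced into a script, for example as the input to a jq command.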
Note that my preferred way of accessing the result of a derivation
from within Nix is to use import rather than readFile. One advantage is
that we’re guaranteed that Nix will build the result as needed, rather
than saying ‘no such file’ (e.g. if we’re not in read/write mode).
Another is that Nix will not propagate dependencies of that
derivation into the value that it reads; for example if we have a
derivation foo which depends on Python and results in a file
containing the string "hello", then import foo will give the string
hello without tracking the Python dependency. This can be important
if we have a fast-moving dependency which might cause lots of
rebuilding, but doesn’t actually affect the contents of its output: we
can generate some intermediate value using this dependency, which will
get rebuilt a lot, but by importing that value for use in the subsequent
steps, those steps won’t need to be rebuilt as long as that intermediate
value doesn’t change.
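A sketch of that pattern, with made-up derivation names (count and report) and jq standing in for the fast-moving dependency:

```nix
with import <nixpkgs> {};
let
  # Intermediate value: rebuilt whenever jq changes
  count = runCommand "count" { buildInputs = [ jq ]; } ''
    echo '["foo", "bar"]' | jq 'length' > "$out"
  '';

  # Importing yields a plain Nix integer, with no reference back to count
  n = import count;
in
  # This build script only contains the number itself, so it stays the same
  # (and stays cached) for as long as the imported value is unchanged
  runCommand "report" {} ''
    echo "There are ${toString n} names" > "$out"
  ''
```

If we had spliced "${count}" into the report script instead, the report would depend on count directly and get rebuilt every time jq was updated.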
Of course, the tricky part is writing data into Nix files such that
they can be imported in the first place. Nix’s syntax is quite simple,
but it can sometimes help to use fromJSON, etc. to make life easier.