Last updated: 2019-01-08 08:15:38 +0000

Upstream URL: git clone


Contents of README follows

Machine Learning for Haskell - Feature Extraction

This repository provides a <em>recurrent clustering</em> algorithm, which assigns feature vectors to Haskell Core syntax trees. This works by turning the tree structure into a (left-biased) matrix then concatenating the rows. Each symbol in the tree is given a number (feature): keywords use hard-coded numbers, whilst names are dereferenced to get <em>their</em> syntax trees, grouped into clusters based on the features of those trees, and the index of the cluster is used as the feature.
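
The following is a minimal sketch of the matrix-and-concatenate step, not the code used by this package. It makes several assumptions that are not stated above: matrix rows correspond to depth levels of the tree, the matrix has fixed dimensions (keeping the leftmost entries and padding with 0), and the cluster index of each name is already known from an earlier clustering round. The types <code>Tree</code> and <code>Symbol</code>, the function names and the keyword numbers are all hypothetical.

<pre><code>-- Simplified stand-in for a Core syntax tree (hypothetical, not the real Core type).
data Symbol = Keyword String | Name String

data Tree = Node Symbol [Tree]

symbolOf :: Tree -> Symbol
symbolOf (Node s _) = s

childrenOf :: Tree -> [Tree]
childrenOf (Node _ cs) = cs

-- One list of symbols per depth level, each read left to right.
-- Assumption: matrix rows correspond to tree levels.
levels :: Tree -> [[Symbol]]
levels t = go [t]
  where
    go [] = []
    go ts = map symbolOf ts : go (concatMap childrenOf ts)

-- Made-up keyword numbers, for illustration only.
keywordNumber :: String -> Int
keywordNumber "lambda" = 101
keywordNumber "app"    = 102
keywordNumber _        = 100

-- Keywords get a hard-coded number; names get the index of their cluster
-- (0 if the name is not in the table).
featureOf :: [(String, Int)] -> Symbol -> Int
featureOf _        (Keyword k) = keywordNumber k
featureOf clusters (Name n)    = maybe 0 id (lookup n clusters)

-- Keep the leftmost n entries, padding on the right when there are fewer.
fit :: Int -> a -> [a] -> [a]
fit n pad xs = take n (xs ++ repeat pad)

-- Build a rows-by-cols matrix of features, then concatenate its rows.
featureVector :: Int -> Int -> [(String, Int)] -> Tree -> [Int]
featureVector rows cols clusters t =
  concat (fit rows (replicate cols 0) (map row (levels t)))
  where
    row syms = fit cols 0 (map (featureOf clusters) syms)

-- A three-level example tree.
example :: Tree
example =
  Node (Keyword "lambda")
    [ Node (Keyword "app") [Node (Name "map") [], Node (Name "f") []] ]
</code></pre>

With these definitions, <code>featureVector 3 2 [("map", 3), ("f", 1)] example</code> evaluates to <code>[101,0,102,0,3,1]</code>: one row per level, each row padded to two columns, then all rows joined end to end.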


This is a normal Haskell project using the Cabal build system. We also provide definitions for the Nix package manager and a benchmark suite for the Airspeed Velocity (ASV) framework (using the <code>asv-nix</code> plugin).

<ul> <li>

<code>asv.conf.json</code> is the Airspeed Velocity configuration. It sets up the benchmark suite in <code>benchmarks/</code>.

</li> <li>

<code>benchmarks/</code> defines our benchmarks. The <code>default.nix</code> file will be loaded for each git revision, and defines a <code>python</code> executable whose environment contains values, commands, etc. from that revision for use by the benchmark scripts (the <code>*.py</code> files).

</li> <li>

<code>overlay.nix</code> is a Nix overlay providing (among other things) a Haskell package set containing this <code>ML4HSFE</code> package, a standalone <code>ML4HSFE</code> package taken from that set, and an instantiation of the <code>tests.nix</code> tests.

</li> <li>

<code>overlayed.nix</code> applies our overlay to a pinned version of Nixpkgs. This should work regardless of what version your system provides.

</li> <li>

<code>release.nix</code> chooses those packages from <code>overlayed.nix</code> which we want to build and provide via continuous integration.

</li> <li>

<code>tests.nix</code> contains some integration tests which can't be run from the Cabal test suite.

</li> </ul>


The included <code>shell.nix</code> file provides suitable versions of GHC and asv. During development I use ghcid to build and run tests after each save. I find the following <code>.ghcid</code> file works well:

<pre><code>--command="nix-shell --run 'cabal new-repl'" --test=":! cabal new-test"</code></pre>

The call to <code>nix-shell</code> ensures that the GHC defined in <code>shell.nix</code> is used, and <code>cabal new-repl</code> invokes GHCi with this library in scope. The contents of the <code>--test</code> option are sent to GHCi whenever the project is successfully built and loaded. We use <code>:!</code> to invoke a shell command, and <code>cabal new-test</code> is the command that runs the test suite.