Small Functions

Posted on by Chris Warburton

I saw Small functions considered harmful on Hacker News today, and wanted to comment but it got a little long, so I've posted it here ;)

As a lot of other commenters point out, too much of anything is a bad thing (by definition!).

Whilst reading the article, a few things bubbled up in my subconscious (NOTE: the below "quotes" are paraphrased, but hopefully not misrepresenting the arguments):

Splitting functionality over many small functions makes it harder to read/see what's happening

If you find yourself wanting/needing to read a whole bunch of definitions at once, then that is the problem that needs solving; not necessarily the fact that so many functions are used. This can be partially due to the code (i.e. when it really does require digging down a few levels to understand what's going on), but it can also be due to the developer or team psychology (i.e. subconsciously doubting that doFoo actually does 'foo').

The latter might be related to the difficulty many people have with recursion, since that also requires the ability to reason under the assumption that some "black box" function call works as expected. For example, mergesort is trivial to understand under the assumption that mergesort(firstHalf) and mergesort(secondHalf) work as advertised; without that suspension of disbelief, it's utter voodoo.

Note that as a code smell (rather than an overly distrustful dev), this might be due to not enough abstraction, as well as too much or misaligned abstractions like the article mentioned. For example, I might open a file at random and pick a random function, say sendOrder: if that contains a bunch of number-twiddling, deletes some fields, rearranges some lists, etc. then I might doubt its correctness, since I have no idea why those things have anything to do with sending orders; on the other hand, if that stuff were pulled out into a formatForInitech function then I might find it reasonable that sending an order requires formatting it for initech, and if I stumbled upon the formatForInitech function at random then I might find it reasonable that such formatting requires deleting fields, twiddling numbers, etc. Note that this could also be achieved by adding a comment, but I don't think that changes the argument: if I see // Required to comply with Initech systems followed by some gobbledegook, I should (be able to) have confidence that the code is doing what the comment says (and no more).

This confidence in randomy-chosen functions isn't just a nice-to-have property of a codebase either: when we're debugging we know there's a problem somewhere, and it saves time and mental capacity if we can skim over unrelated things when doing a deep dive into the problematic area.

Another problem with lack of abstraction, which can force us to dig into function definitions, is when our API is too fine-grained: it allows things which don't make sense in the domain, and hence places a burden on us to avoid such incoherent states. If we try sticking to the "language keyword" level, it's very easy to end up in such situations. For example, a common (anti?) pattern is an "iterator" object, with methods like hasNext and next. This lets us use our language's built-in for loops, but it couples control flow to otherwise-meaningless state (the "cursor position" of our iterator) whose leaks can cause problems like composition-breaking interference (if some inner call tries to loop over the same iterator). Much better to provide small functions like map and filter (or, even better, domain-specific alternatives), which are "unfamiliar" compared to keywords but which avoid inconsistent states and make sense in the domain.

If functions are "invisibly" coupled to each other, e.g. doBar(...) assumes that doFoo(...) has been called beforehand, then I'd say these may legitimately be "too small"; specifically, we might say they aren't "doing one thing", they're actually doing "half a thing": again, the API is allowing too many things (i.e. "barring in an unfooed state"). A common example is dbQuery(...) requiring dbConnect(...) to be run first.

Inlining isn't the only, or necessarily best, solution though. If a "fooed state" makes sense in the problem domain, e.g. "a connected database", then another solution is to have doFoo return the relevant state and have doBar consume it, so we get doBar(..., doFoo(...)). This is especially useful if we might want to re-use the same result (e.g. the same DB connection) many times. We can introduce nice types to ensure correct usage too, e.g.

dbConnect : DbCredentials -> Database

dbQuery : Database -> Sql -> Table

This way there's no chance to attempt a query without having connected.

Alternatively, if the "intermediate state" doesn't make sense in our domain and is only an implementation detail, then another solution is to have a withFoo function, which takes a function as argument and runs it in the correct state, e.g.

withFoo = function(f) { doFoo(); return f(); };

This came to mind when I saw renderPageWithSetupAndTeardown: I agree that it seems rather pungent. One possible alternative would be to have the setup and teardown functions be (optional?) fields of a page, which the regular renderPage function would call if found. Alternatively, we could pull out a function:

withSetupAndTeardown = function(setup, teardown, f) {
                         result = f(setup());
                         teardown(result);
                         return result;
                       };

This can be used with anything, including the regular renderPage function (this is analogous to the widely-used bracket function in Haskell, for example).

I think that this specific renderPageWithSetupAndTeardown example is actually quite weak, since it smells to me of coming from a language without first-class functions, which is going to make any proposed solutions overly convoluted (since "setup" and "teardown" are inherently function-like concepts), and hence there are many ways to improve it in languages with first-class functions (like Ruby, or whatever).