Dynamic scope vs. Lexical scope

Posted on by Chris Warburton

Yesterday I came across [a blog post about dynamic scope] 1. While its content is completely correct, dynamic scope is actually completely unnecessary, and in fact a Bad Thing(TM), compared to lexical scope.

In the example the name important_variable is a free variable in the functions do_something and do_something_else. This means we’ve used the variable important_variable in the functions’ definitions but haven’t declared it or accepted it as an argument. For example, in a pseudo-Javascript that uses ‘let’ to make variable scoping explicit:

let (do_something = function() {
  printf(important_variable);
}) in {
  do_something();
}

Here we bind the variable do_something to a function, so do_something is a bound variable when we call it. Inside the function we’re using two variables called printf and important_variable which are not arguments to the do_something function and are not declared with a let inside the function, so they are free variables. What happens to them depends on whatever “scoping rule” is in effect.

“Global scope” is an incredibly bad scoping rule. It keeps a table of variable names and whenever we hit a free variable, we look it up in the table. This has been used in Mathematics since antiquity, but now we have computers our Mathematics (programming) is reaching huge scales, where global scoping really falls short.

The reason is that whenever we call any function, we lose all information about what’s in the table, since any function can do anything it likes to the table. Since programming is mostly about chaining together function calls, this makes global scoping a pretty useless way of handling free variables, since we can’t really know anything that’s going on. Hence, never use global scope. Many languages claim that they disallow global scope for variables, yet they carry on using it for function names, class names, type names, etc. (as if there were any distinction!)

Another scoping rule is called “dynamic scope” and is what the above link is about. Dynamic scope dates back to McCarthy’s original LISP in the 1950s. The rule says that when we call a function with free variables, we should bind them to whatever those names are in the current context.

With dynamic scoping we can define do_something, then define printf and important_variable, then call do_something. The free variables will be bound and everything will be good. We can call do_something over and over and we can also define important_variable and printf over and over. There’s no point redefining do_something, although we can if we want. Each call will traverse up the let blocks until it finds do_something, printf and important_variable, and those values will be used together. For example:

let (do_something = function() {
  printf(important_variable);
}) in {
  let (printf = function(x) {...}) in {
    let (important_variable = 1) in {
      // Prints 1
      do_something();
    }
    let (important_variable = 2) in {
      let (important_variable = 3) in {
        // Prints 3
        do_something();
      }
    }
  }
}

Another scoping rule is “lexical scope”, which dates back to at least ALGOL 60. The rule says that when we define a function with free variables, we should bind them to whatever those names are in the current context.

Note that this is identical to dynamic scope, except it happens when we define a function rather than when we call it. With lexical scope we can define important_variable and printf, then define do_something, then call do_something. The free variables will be bound and everything will be good. We can call do_something over and over, we can define important_variable and printf over and over and we can redefine do_something over and over. Each call will traverse up the let blocks until it finds do_something. We then traverse up the let blocks starting inside the definition of do_something until we find printf and important_variable, and those values are used. For example:

let (printf = function() {...}) in {
  let (important_variable = 1) in {
    let (do_something = function() {
      printf(important_variable);
    }) in {
      // Prints 1
      do_something();
      let (important_variable = 2) in {
        // Prints 1
        do_something();
      }
    }
  }
}

This looks just like our dynamic scope version, except the nesting of the let blocks has swapped around. This retains the global-like ability to access variables without having to pass them as arguments all of the time, but we have a few advantages over dynamic scope: - The burden of providing a correct context for free variables is placed on whoever’s writing the function, rather than on everyone who ever calls it. - Only one context is ever needed, even if the call sites are in wildly different places. - We can read function definitions top to bottom safe in the knowledge that nothing in the caller’s code can mess it up, unless explicitly passed as an argument. - We can read our application’s code top to bottom safe in the knowledge that nothing in our libraries’ code can mess it up, unless explicitly passed as a return value. - We can use whatever variable names we like without fear of shadowing a free variable by accident.

The last point about shadowing names is the a major advantage, especially when our variable scope gets large. For example, let’s say that one day we are asked to give users of our application the ability to change their usernames. Pretty standard, let’s write some generic CRUD-style code to do it. Here form_username is the username given in the update form and db_username is the username currently in the database:

function(form_username, db_username) {
  // Don't hit the database unnecessarily
  if (form_username == db_username) {
    return true;
  }

  // Don't allow duplicate usernames
  let (existing = db_select('users', 'username', form_username)) in {
    if (count(existing) > 0) {
      return false;
    }
  }

  // OK, go ahead and update
  db_update('users',                   // Table to update
            'username', form_username, // New value
            'username', db_username);  // WHERE condition
  return true;
}

All looks fine (ignoring error conditions), right? Well actually it fails to connect to the database. That’s weird, since we have all kinds of functions that look pretty much the same, for updating contact details and things. Why doesn’t this one work? The database is up, all of the other functions are connecting, why won’t this one?

Eventually, after many hours of debugging, we discover the following function deep in the bowels of the application:

let (execute_query = function(query_string) {
  let (database = connect(db_username, db_password)) in {
    return send_query(database, query_string);
  }
}) in { ... }

When we connect to the database, the username to connect with is taken from the db_username variable. Well, where does that come from? Some more digging around reveals, at the very top of the whole stack:

let (run = function(db_username, db_password) {...}) in {
  if (have_arg("test")) {
    run("tester", "test123");
  } else {
    run("real_user", "pass123");
  }
}

The database details are passed in at the top of the application, so that we can run it in test mode easily. Whenever any part of the application calls execute_query, it gets the value of db_username by traversing up the nested scopes until it gets all of the way up to the top and finds either the live details or the test details. That is, of course, any part of the application except our username function, since it will find our db_username and try to connect to the database server with that! This will clearly be rejected as unauthorised, and hence the error.

Of course this would be even worse if our variables were global, since the whole application would start using incorrect credentials as soon as someone tries changing their username. Hence dynamic scope is better than global scope, but this is still pretty bad. The whole point of a database API is that we don’t care how it works. We shouldn’t be completely ignorant, for example it might be good to know in which situations function foo will be faster than function bar and vice versa, but we shouldn’t have to care about the bloody variable names! In the Programming Language Theory lingo we would say that this breaks “alpha equivalence”, ie. function(x) { blah(x); } should behave the same as function(y) { blah(y); }.

In the case of lexical scope this issue doesn’t arise. We can use exactly the same mechanism to pass db_username into run and access it in execute_query, but the binding will happen when execute_query is defined rather than when we call it. This means that all of the binding has taken place long before our application code ever runs; usually during the application initialisation. Our functions will still have access to db_username, and can shadow it if we like, but the only stuff we can affect is that which we define ourselves.

Now, in a lexically-scoped language variable access is one-way, children can access their parent’s scope but not the other way around. This is, of course, encapsulation, but we can get around it with judicious use of higher-order functions. There’s nothing wrong with code like this:

// Database credentials aren't in scope yet, so we can't define the
// database API. However, we *can* define a function which returns
// an API when given the credentials. We can think of this as a
// 'factory'
let (database_factory = function(db_username, db_password) {
  // Return a database API specialised to the given credentials
  return [function() {
            // ...
         },
         function() {
           // ...
         },
         ...
         ];
}) in {
  // Some arbitrary amount of nesting ...
  // Now we have some credentials, so we can make a database API
  // with them
  let ([db_select, execute_query, ...] = database_factory(username, password)) in {
    // Any code here has access to the API
  }
}

In fact, this is a more flexible approach than passing in the credentials at the top, since we can now use many different versions of the API in different parts of the codebase. If we want to use the same API in different parts that’s fine, we can pass the functions around as arguments/return values. This is similar to parameterised module systems, but less structured.