October 19, 2017

What's In a Namespace?

Python programmers talk about namespaces a lot. The Zen of Python* ends with

    Namespaces are one honking great idea -- let's do more of those!

and if Tim Peters thinks namespaces are such a good idea, who am I to disagree?

Resolution of Unqualified Names

Python programmers learned at their mothers' knees that Python looks up unqualified names in three namespaces: first, the local namespace of the currently executing function or method; second, the global namespace of the module containing the executing code; third and last, the built-in namespace that holds the built-in functions and exceptions. (Nested functions add one step: the scopes of any enclosing functions are searched after the local namespace and before the global one.) So it makes sense to understand the various namespaces that the interpreter can use. Note that when we talk about name resolution we are talking about how a value is associated with an unadorned name in the code.

At the top level of a module there is no separate local namespace. A name must be present in either the module's global namespace or, if not there, in the built-in namespace that holds functions like len, the standard exceptions, and so on. In other words, for module-level code the local and global namespaces are the same. This holds for the main module (where __name__ == '__main__') and for any imported module alike.
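To check that module-level code really does see one namespace under two names, here is a small sketch. It runs a few lines of source as the body of a throwaway module (the module name demo and the variable names are made up for illustration) and records whether locals() and globals() are the same object at the top level and inside a function.

```python
import types

# Source for a hypothetical module; it records what it sees at each level.
SOURCE = '''
at_top_level = locals() is globals()

def probe():
    return locals() is globals()

inside_function = probe()
'''

demo = types.ModuleType("demo")   # a fresh, empty module object
exec(SOURCE, demo.__dict__)       # run SOURCE as that module's top-level code

print(demo.__name__)         # demo
print(demo.at_top_level)     # True
print(demo.inside_function)  # False
```

The module here is called demo rather than __main__, and its top-level code still finds that the two namespaces are one and the same.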

When the interpreter compiles a function it keeps track of names that are bound inside the function body (this includes the parameters, which are established in the local namespace before execution begins) and aren't declared as either global or (in Python 3) nonlocal. Because it knows the local names, the interpreter can assign each of them a predefined place in the stack frame (where local data is kept for each function call), and does not generally need to perform a lookup by name. This is the main reason local access is faster than global access.
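You can inspect the compiler's bookkeeping yourself. The sketch below is CPython-specific and uses a made-up function, mean; it prints the local names recorded on the code object and then uses the dis module to list the names that are instead fetched with a LOAD_GLOBAL instruction.

```python
import dis


def mean(values, default):
    if not values:
        return default
    total = sum(values)          # 'total' is bound here, so it is local
    return total / len(values)   # 'sum' and 'len' are never bound here


code = mean.__code__

# Parameters first, then the other local names, in slot order.
print(code.co_varnames)   # ('values', 'default', 'total')

# Names that must be looked up by name (globals, then built-ins) at run time.
global_loads = [
    ins.argval
    for ins in dis.get_instructions(mean)
    if ins.opname == "LOAD_GLOBAL"
]
print(global_loads)       # ['sum', 'len']
```

The exact opcodes used for local access vary between CPython releases, which is why the sketch only filters on LOAD_GLOBAL.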

Although the interpreter identifies local names by the presence of bindings within a function body, there is nothing to stop you writing code that references the names before they are bound. Under those circumstances you will see an UnboundLocalError exception raised with a message like "local variable 'b' referenced before assignment" (newer Python 3 releases word the message differently, but the exception type is the same).
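A minimal sketch of the trap, using made-up names: the assignment on the second line of the function body is enough to make b local throughout the function, so the earlier reference never sees the global b.

```python
b = 10  # a global that the function below never actually reads


def broken():
    a = b   # fails: 'b' is local to broken() because of the next line
    b = 20  # this binding is what makes 'b' a local name
    return a


try:
    broken()
except UnboundLocalError as exc:
    caught = exc

print(type(caught).__name__)          # UnboundLocalError
print(isinstance(caught, NameError))  # True: it is a NameError subclass
print(caught)                         # wording depends on the Python version
```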

For names that are not local, something very like a dictionary lookup takes place, first in the module's global namespace and then in the built-ins. If neither search yields a result then the interpreter raises a NameError exception with a message like "name 'nosuch' is not defined".
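The fall-through from globals to built-ins can be sketched with eval, which accepts an explicit dictionary to use as the global namespace. The dictionary and the names in it are stand-ins for a real module's globals, chosen only for illustration.

```python
import builtins

# A plain dict standing in for a module's global namespace.
namespace = {"len": "shadowed by a global"}

# Found in the global namespace, so the built-in is never consulted.
print(eval("len", namespace))                   # shadowed by a global

# Remove the global binding and the same name falls through to built-ins.
del namespace["len"]
print(eval("len", namespace) is builtins.len)   # True

# A name in neither namespace raises NameError.
try:
    eval("nosuch", namespace)
except NameError as exc:
    message = str(exc)
print(message)
```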

Resolution of Qualified Names

Resolution of qualified names (those consisting of a sequence of names or expressions delimited by dots, such as os.path.join) starts by locating the object bound to the first name (in this case os) in the standard way described above. Thereafter the mechanism can get complex because, as with many Python features, you can control how it works for your own objects by defining __getattr__ and/or __getattribute__ methods, and because descriptors (primarily used in accessing properties) can cloud the picture.
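As a small illustration of that control, the hypothetical Settings class below defines __getattr__, which Python calls only after the normal attribute lookup has failed. The class and its attribute names are invented for this sketch.

```python
class Settings:
    """Hypothetical class used only to illustrate __getattr__."""

    def __init__(self):
        self.debug = True   # an ordinary instance attribute

    def __getattr__(self, name):
        # Called only after normal attribute lookup has failed.
        if name.startswith("opt_"):
            return "default for " + name
        raise AttributeError(name)


settings = Settings()
print(settings.debug)                # True (normal lookup succeeds)
print(settings.opt_colour)           # default for opt_colour
print(hasattr(settings, "missing"))  # False (AttributeError was raised)
```

Defining __getattribute__ instead would intercept every attribute access, not just the failures, which is why it is used far more sparingly.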

In essence, though, the mechanism is that the interpreter, having located the object bound to the unqualified name, then makes a getattr call for the second name (in this case, path) in that object's namespace, yielding another object, against which a further getattr call is made with the third component of the name, and so on. If at any point a getattr fails then the interpreter raises an AttributeError exception with a message such as "'module' object has no attribute 'name'".
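The chain of getattr calls can be written out directly. In this sketch the helper resolve is a made-up name; it takes an already-located starting object and follows the remaining dotted components one getattr call at a time.

```python
import os
from functools import reduce


def resolve(obj, dotted):
    """Follow a dotted path such as 'path.join' one getattr call at a time."""
    return reduce(getattr, dotted.split("."), obj)


print(resolve(os, "path.join") is os.path.join)   # True

# A missing component makes the corresponding getattr call fail.
try:
    resolve(os, "path.nosuch")
except AttributeError as exc:
    failure = exc
print(type(failure).__name__)   # AttributeError
```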

Understanding Expression Values

Once you understand the mechanisms for looking up the values of names it becomes a little easier to understand how Python computes expression values. Once a name is resolved there may be other methods to apply, such as __getitem__ for subscripting or __call__ for function calls. These operations also yield values, whose namespaces can again be used to look up further names. So, for example, when you see an expression like

    e.orig.args[0].startswith('UNIQUE constraint failed')

you understand that the name e.orig.args is looked up by going through a sequence of namespaces and evaluates to a sequence object (a tuple, if e.orig is an exception), to which a subscripting operation is applied to get the first element, in whose namespace the name startswith is resolved (hopefully to something callable) to a value that is finally called with a string argument.
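Here is that decomposition spelled out one step at a time. The object e below is a hypothetical stand-in built from SimpleNamespace, and the error text in it is invented; in real code e would be whatever exception you had caught.

```python
from operator import getitem
from types import SimpleNamespace

# Hypothetical stand-ins: 'e' wraps an original error whose args is a tuple.
e = SimpleNamespace(
    orig=SimpleNamespace(args=("UNIQUE constraint failed: users.email",))
)

step1 = getattr(e, "orig")                  # e.orig
step2 = getattr(step1, "args")              # e.orig.args
step3 = getitem(step2, 0)                   # e.orig.args[0]
step4 = getattr(step3, "startswith")        # a bound method of the string
result = step4("UNIQUE constraint failed")  # the final call

print(result)   # True
print(result == e.orig.args[0].startswith("UNIQUE constraint failed"))  # True
```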

Ultimately, by decomposing the expressions in this way you end up only dealing with one object at a time. Knowing how these mechanisms work in principle can help you to decipher complex Python code.

* Just type import this into a Python interpreter, or enter python -m this at the shell prompt, and hit return.