April 29, 2013

So Long, and Thanks for All the Fish

A notable transition point in my life, by any measure. Interesting things are happening in all directions, many of them (I am delighted to say) heavily involving Python. So this is definitely not goodbye. My time on the Foundation's board has been a wonderful and interesting ride. I hope I have lived up to Tim O'Reilly's ethic of creating more value than I extract.

April 28, 2013

Why Tuples?

A frequent question from new and even not-so-new Python programmers is "why does the language have both tuples (which, if you know Python, you will recall are immutable) and lists?" You might almost say there are two kinds of Python programmers, those who know what tuples are for and those whose mathematical education has been limited. I know this sounds like an awfully snobbish thing to say, but it isn't meant that way. The fact is that I learned most of my programming in eight years of working experience before I started my degree studies, and I am therefore of the very definite opinion that people with little mathematical background can be excellent programmers.

It's just that I myself only came across the word tuple when I started studying college-level mathematics and had to come to terms with things like
An NFA is represented formally by a 5-tuple, (Q, Σ, Δ, q0, F), consisting of
  • a finite set of states Q
  • a finite set of input symbols Σ
  • a transition relation Δ : Q × Σ → P(Q)
  • an initial (or start) state q0 ∈ Q
  • a set of states F distinguished as accepting (or final) states, F ⊆ Q.
Now this has a very definite meaning. It tells us that an ordered (i.e. positionally identified) set of five things is sufficient to define the behavior of a specific type of object. In this case the object is a non-deterministic finite state automaton (NFA), though for all you need to know about those I might just as well be describing a flux capacitor.

If we wanted to encapsulate an NFA as a Python data structure, then at some point in our code we might write something like

    nfa = (states, symbols, transition, initial, accepting_states)

though in actual fact you would be more likely to want to incorporate behavior in a class and so write instead

    nfa = NFA(states, symbols, transition, initial, accepting_states)

Now even though you may not know what an NFA is, you will surely perceive that the set of its possible states is a very different thing from the function that determines how a current state and an input symbol are mapped into new states. So there is simply no way that it would be meaningful to write

    for thing in (states, symbols, transition, initial, accepting_states):
        do_something_with(thing)

unless you were, for example, trying to save its state during a pickling operation or some more obscure book-keeping metacode.

And that, best beloved, is what tuples are for: they are ordered collections of objects, and each of the objects has, according to its position, a specific meaning (sometimes referred to as its semantics). If no behaviors are required then a tuple is "about the simplest thing that could work."
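To make the positional reading concrete, here is a minimal sketch. The toy automaton (it accepts strings over a and b that end in a), the dict representation of the transition relation, and the helper accepts() are all my own illustrative assumptions, not anything prescribed by the definition above.

```python
# Hypothetical toy NFA: accepts strings over {"a", "b"} that end in "a".
states = {"q0", "q1"}
symbols = {"a", "b"}
# The transition relation as a dict: (state, symbol) -> set of next states.
# Missing keys mean "no transition".
transition = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
}
initial = "q0"
accepting_states = {"q1"}

nfa = (states, symbols, transition, initial, accepting_states)


def accepts(nfa, word):
    """Return True if the NFA (a plain 5-tuple) accepts the given word."""
    # Unpacking works only because each position has a fixed meaning.
    _, _, delta, q0, final = nfa
    current = {q0}
    for symbol in word:
        # Follow every possible transition from every current state.
        current = set().union(
            *(delta.get((state, symbol), set()) for state in current)
        )
    return bool(current & final)


ends_in_a = accepts(nfa, "ba")
ends_in_b = accepts(nfa, "ab")
```

Nothing here iterates over the tuple itself: it is taken apart once, by position, and each piece is then used for its own purpose.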

This is why you often hear people informally say that tuples are for "collections of things you don't need to iterate over" or for "sequences of dissimilar objects". My advice would be to stay away from such discussions. One might reasonably argue that including tuples in the language discouraged people from using objects with named attributes, which are always easier to deal with.

The problem with the tuple is that once we have constructed our NFA the only way to refer to the states in code is with the rather unedifying expression

    nfa[0]

which doesn't actually tell the reader much about the programmer's intention. The sequence nature of the tuple means that our accesses to the elements are difficult to imbue with meaning in our code. This latterly became such an obvious shortcoming that it prompted Raymond Hettinger to create namedtuple (in the collections module), which allows you to easily specify a tuple whose elements can also be referred to by name.
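As a small sketch of the difference, here is the same five-part structure built with collections.namedtuple. The field names simply reuse the variable names from earlier, and the values are a hypothetical toy automaton chosen for illustration.

```python
from collections import namedtuple

# Field names mirror the positional meanings of the 5-tuple.
NFA = namedtuple(
    "NFA", ["states", "symbols", "transition", "initial", "accepting_states"]
)

nfa = NFA(
    states={"q0", "q1"},
    symbols={"a", "b"},
    transition={("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}},
    initial="q0",
    accepting_states={"q1"},
)

# Access by name says what is meant...
by_name = nfa.states
# ...while access by position still works, because it is still a tuple.
by_position = nfa[0]
same_object = by_name is by_position
```

Both forms return the very same object, so existing code that indexes or unpacks the tuple keeps working while new code can use the names.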

It would be interesting to see whether users of namedtuple objects actually use them as tuples at all. I would guess that the sequence behaviors are rarely used, in which case perhaps it's time to either remove namedtuple's __getitem__() and similar methods or implement a similar object without the sequence behaviors.
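One possible shape for such an object, offered only as a sketch, is a frozen dataclass. The dataclasses module arrived in Python 3.7, so this is an after-the-fact illustration rather than anything available when namedtuple was designed, and the class name NFARecord and its field types are my own assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NFARecord:
    """Named, read-only fields with no sequence behavior (hypothetical class)."""
    states: frozenset
    symbols: frozenset
    transition: dict
    initial: str
    accepting_states: frozenset


record = NFARecord(
    states=frozenset({"q0", "q1"}),
    symbols=frozenset({"a", "b"}),
    transition={("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}},
    initial="q0",
    accepting_states=frozenset({"q1"}),
)

# Attribute access works as with a namedtuple.
start = record.initial

# Indexing does not: the class defines no __getitem__.
try:
    record[0]
except TypeError:
    indexable = False
else:
    indexable = True

# Fields cannot be rebound either (FrozenInstanceError is an AttributeError).
try:
    record.initial = "q1"
except AttributeError:
    rebindable = False
else:
    rebindable = True
```

The fields are reachable only by name and cannot be reassigned, so what remains is the "each part has its own meaning" idea without the sequence protocol.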