May 31, 2009

PyCon is Popular!

Regular readers will know that this year's PyCon went extremely well. Not only was it excellent value for money, but we finally got our video recording act together, to the point where most sessions' video was available on less than a week after the end of the conference.

I just learned that there have already been over 180,000 views of the material since it was uploaded roughly two months ago, which works out at around 3,000 per day. It just goes to show that there is a lot of demand for material about Python!

May 29, 2009

Where Next for PyCon?

PyCon 2010 will be in Atlanta, as I wrote a few weeks ago. No decision has been made about future venues, but given yesterday's release of the PyCon 2009 delegate questionnaire response data I thought it might be instructive to use Python to try and get a handle on public opinion about future locations. Question 13 asked "Where would you like to see PyCon 2010 or a future PyCon? Enter up to 3 cities or regions."

I copied the data into a spreadsheet for ease of manipulation, and did some munging to standardize presentation. I may have taken a few liberties here (such as replacing "California, Bay Area" with "San Francisco"), but I did so in an honest attempt to make sure that every important datum was counted. If the conference were to go to "San Francisco" the organizers would be very foolish to overlook other potential Bay Area locations, for example.

The Data
You can find the raw data I ended up with at if you want to play about with them yourself. Here's a small section to give you the flavor:
New Jersey      Palm Springs    "Portland, OR"
New York "Portland, OR" "Portland, OR"
New York "Portland, OR" "Portland, OR"
New York "Portland, OR" San Diego
Northeast "Portland, OR" San Diego
Nowhere specific "Portland, OR" San Francisco
Phoenix Saint Paul San Francisco
"Portland, OR" San Francisco Seattle
It's simple tab-separated data, with gratuitous quote marks from Excel. The presentation makes it look like everyone made first, second and third choices that were suspiciously close together in the alphabet, but that's only because I sorted each column independently to make the munging easier. I should probably have used the csv module to read the data, but foolishly chose to do it myself. Just the same, there isn't a lot of code.
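For comparison, here is a minimal sketch of the csv-module route, assuming Python 3. The sample rows are taken from the excerpt above, and the names SAMPLE and read_choices are invented for this illustration; they are not part of the original program.

```python
import csv
import io

# A few rows from the excerpt above, tab-separated with Excel's quoting.
SAMPLE = (
    'New York\t"Portland, OR"\tSan Diego\n'
    'Phoenix\tSaint Paul\tSan Francisco\n'
    '"Portland, OR"\tSan Francisco\tSeattle\n'
)

def read_choices(stream):
    """Yield one list of place names per line, with quoting removed."""
    # delimiter="\t" selects tab-separated parsing; the reader removes
    # the surrounding double quotes itself.
    for row in csv.reader(stream, delimiter="\t"):
        yield [place.strip() for place in row]

rows = list(read_choices(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```

With a real file you would pass the open file object instead of the io.StringIO wrapper.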

The Program
# Process PyCon feedback about future venues
# NOTE: Food for thought only ...
pd = {}
f = open("wherenext.txt")
for line in f:
    places = [p.strip('"') for p in line.strip().split("\t")]
    for rank, where in enumerate(places):
        if where not in pd:
            pd[where] = [0, 0, 0]
        if where:
            pd[where][rank] += 1

places = []
for where, [w1, w2, w3] in pd.items():
    places.append((3*w1+2*w2+w3, where, [w1, w2, w3]))
for rank, where, scores in sorted(places):
    print where, rank, scores  # Python 2 print statement
The first for loop simply iterates over the file. Each place's position in the line is the weighting that a voter gave it, so after splitting the line at each tab and removing any leading or trailing quotes I then use enumerate() to generate (rank, placename) pairs for all elements in the line. Some lines have only one or two entries because not everyone made three choices, but the formulation I used copes fine with that.

The placename of each pair is used to index a dict of [first, second, third] counts which counts the number of times a specific place was ranked in each position.

The next for loop is my favorite piece of this little program. It generates a new list of (weighted_score, placename, raw_scores) tuples ready to be sorted. The weightings I used (3 for a first choice, 2 for a second and 1 for a third) were completely arbitrary, so feel free to change them in an attempt to fudge the results for your favored location.

Why did I like this particular loop so much? Since each item in the dict is a (placename, raw_scores) tuple I unpack the elements right in the for statement. It's such little elegances that endear Python to its fans: you can always do things in a straightforward way if you want, but as long as it doesn't interfere with readability I usually take advantage of such abbreviations.

I felt that a list comprehension would have been just a little bit too difficult to read, but you could easily replace the loop with one, and for further illegibility you could put it directly as the argument to sorted().
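As a rough sketch of that alternative, assuming Python 3: the three sets of counts below are borrowed from the results further down, and the name ranked is made up for the example. Strictly speaking this uses a generator expression, but a list comprehension in square brackets would behave the same way here.

```python
# Counts in the same {place: [first, second, third]} shape as the dict above.
pd = {
    "Chicago": [10, 2, 3],
    "Seattle": [3, 7, 4],
    "Atlanta": [2, 2, 0],
}

# The expression builds the (score, place, counts) tuples and is passed
# straight to sorted(), replacing the explicit loop and the places list.
ranked = sorted(
    (3 * w1 + 2 * w2 + w3, where, [w1, w2, w3])
    for where, (w1, w2, w3) in pd.items()
)

for score, where, counts in ranked:
    print(where, score, counts)
```

Whether this reads better than the explicit loop is very much a matter of taste.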

Finally I print out the results, with the lowest scoring first. Yes, I could have used a reverse() call to print out the favorite first, but then I'd have had to scroll the window back to know which city had "won". I could have formatted them better, too. Feel free, knock yourself out. I try to avoid over-specifying presentation when it's really the data I am interested in.

The Results
Given that the 2009 conference was held in Chicago I was unsurprised to see a strong vote for it, presumably by locals who might not be able to get to Atlanta, and certainly wouldn't find it as convenient. The only surprise was that it came in second rather than first! Portland, OR made a very strong showing, scoring only two points less, closely followed by Seattle, with New York and Washington DC trailing rather further behind. But the clear winner was San Francisco, by coincidence the other location to make a strong bid for 2010 with Atlanta. Here are the results in their entirety.

Please remember that the data is from a self-selected group, and that this program is not binding on the PyCon organizers! Despite my best efforts I must have left a blank line in the data, so the empty string is recorded as having no scores! The if where condition should really have guarded the whole loop body, but it was an afterthought, and serves to remind us that sloppy design will lead to sub-optimal results.

0 [0, 0, 0]
Bay Area, CA 1 [0, 0, 1]
Cleveland 1 [0, 0, 1]
Houston, Texas 1 [0, 0, 1]
Knoxville, TN 1 [0, 0, 1]
Midwest 1 [0, 0, 1]
Oregon 1 [0, 0, 1]
Orlando, FL 1 [0, 0, 1]
Phoenix, Az 1 [0, 0, 1]
Pittsburgh, PA 1 [0, 0, 1]
Portland, ME 1 [0, 0, 1]
Twin Cities, MN 1 [0, 0, 1]
Virgin Islands 1 [0, 0, 1]
West coast 1 [0, 0, 1]
not next to an airport 1 [0, 0, 1]
vancouver 1 [0, 0, 1]
Boston Area 2 [0, 1, 0]
Cleveland, OH 2 [0, 1, 0]
East coast 2 [0, 1, 0]
Fort Collins, CO 2 [0, 1, 0]
Honalulu, HI 2 [0, 1, 0]
Kansas City, MO 2 [0, 1, 0]
Las Vegas, NV 2 [0, 1, 0]
Lawrence, Kansas 2 [0, 1, 0]
Los Angeles, CA 2 [0, 0, 2]
Manhattan 2 [0, 1, 0]
Minnesota 2 [0, 1, 0]
New York, NY 2 [0, 1, 0]
Palm Springs 2 [0, 1, 0]
Saint Paul 2 [0, 1, 0]
San Diego 2 [0, 0, 2]
Seatle 2 [0, 1, 0]
Toronto, ON 2 [0, 1, 0]
Vegas 2 [0, 1, 0]
huntsville, al 2 [0, 1, 0]
new orleans 2 [0, 1, 0]
Detroit, MI 3 [1, 0, 0]
East Coast 3 [1, 0, 0]
Europe 3 [1, 0, 0]
Houston, TX 3 [1, 0, 0]
Huntsville, AL 3 [1, 0, 0]
Kansas City 3 [1, 0, 0]
Madison, WI 3 [1, 0, 0]
Miami 3 [1, 0, 0]
Montreal 3 [1, 0, 0]
New Jersey 3 [1, 0, 0]
Nowhere specific 3 [1, 0, 0]
Phoenix 3 [1, 0, 0]
Raleigh/Durham, NC 3 [1, 0, 0]
Reno, NV 3 [1, 0, 0]
San Diego , CA 3 [1, 0, 0]
San Fransisco 3 [1, 0, 0]
San Jose 3 [1, 0, 0]
St. Louis, MO 3 [1, 0, 0]
Tucson, AZ 3 [1, 0, 0]
london 3 [1, 0, 0]
New Orleans 4 [0, 2, 0]
Colorado 5 [1, 1, 0]
Northeast 5 [1, 1, 0]
Canada 6 [2, 0, 0]
Las Vegas 6 [1, 0, 3]
Somewhere hot 6 [1, 1, 1]
Vancouver 6 [0, 3, 0]
Dallas, TX 7 [1, 2, 0]
Denver 7 [1, 2, 0]
Minneapolis 7 [2, 0, 1]
Toronto 7 [2, 0, 1]
California 9 [3, 0, 0]
Atlanta 10 [2, 2, 0]
Austin, TX 14 [1, 5, 1]
Boston 16 [2, 2, 6]
Washington, DC 16 [4, 2, 0]
New York 19 [3, 4, 2]
Seattle 27 [3, 7, 4]
Portland, OR 35 [5, 5, 10]
Chicago 37 [10, 2, 3]
San Francisco 45 [11, 5, 2]
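The stray empty-string entry at the top of the listing could be avoided by letting the emptiness test guard the whole loop body. This is only a sketch, assuming Python 3; the function name count_choices and the sample lines are invented for the illustration.

```python
def count_choices(lines):
    """Count how often each place appears as 1st, 2nd and 3rd choice."""
    counts = {}
    for line in lines:
        places = [p.strip('"') for p in line.strip().split("\t")]
        for rank, where in enumerate(places):
            if not where:
                continue  # skip empty fields, including blank lines
            counts.setdefault(where, [0, 0, 0])[rank] += 1
    return counts

# Two data lines with a blank line between them.
sample = ['New York\t"Portland, OR"\tSan Diego\n', "\n", "Chicago\n"]
result = count_choices(sample)
print(result)
```

Because the test comes first, a blank line never creates a dict entry at all.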

May 25, 2009

Memorial Day

No matter what you think about the US's current wars (and I think they are an abomination) one can have only the utmost respect for those who choose to serve their country in the armed forces. This post is published as an expression of my admiration for the country's servicemen and women, with sympathy and deepest condolences to all families whose loved ones have been lost in action.

May 22, 2009

EuroPython Booked

So the flights and hotel are booked and it's confirmed: I am going to EuroPython. This will only be my second attendance, and since it's in England this year, close to where one of my sisters lives, I expect to feel reasonably at home. It won't be a long trip, but John Pinner tells me he'd like me to give a talk about the PSF and an after-dinner speech, and that "we may like you to join a panel or two, Open Space etc as well if that's OK".

Since the conference is paying my travel and accommodation it would seem ungracious to refuse. So it looks like I'll be busy. I am already looking forward to it!

One of the nice parts of visiting the Midlands will be a chance to sample some British beers. If you have any recommendations or suggestions please be sure to let me know!

May 21, 2009

Trying Again

I believe my typesetting confusion of recent days was due not, as I had feared, to my advancing years and inability to solve technical problems, but rather to a discrepancy between Blogger's preview and the final display of the blog, plus the absence of div tags.

The output of the program in Blogging Python Output: A Challenge should have displayed as
<__main__.MyCls object at 0x024217F0>
Let's see if this survives being saved and published, as so few have before it.

Factory Functions

In the last entry we discussed passing callables (using functions and classes) as arguments to functions, and calling them within the body of the function. This time we'll look at how functions can return more esoteric objects - again including functions and classes.

Beginners sometimes ask how a function can be made to return "more than one object". The strict answer is that it can't. The single object it returns can be a container, though, allowing several values to be extracted from the returned object. If you want a function to return three values the easiest way to arrange this is to have it return a three-element tuple, and then extract the individual values using an unpacking assignment. Here's a simple example.
def powers(a):
    return a, a*a, a*a*a

a, square, cube = powers(10)
print(a, square, cube)
This prints 10 100 1000, showing that the three returned values have indeed been assigned to individual variables. Functions can return more complex objects than simple containers, though. A frequent example in the programming literature is a function that returns some newly-created function each time it is called. Typically the function returned will vary according to one or more of the arguments passed to the call that creates it.
def make_fun(power, debug=False):
    def pow(x):
        result = x ** power
        if debug:
            print("power(%s, %s) returned" % (x, power), result)
        return result
    return pow

squarer = make_fun(2)
cuber = make_fun(3, True)

print([f(9) for f in (squarer, cuber)])
A call to make_fun() results in a function being defined as make_fun's function body is being executed. This function (whose local name is pow) contains references to the arguments passed to make_fun, and is returned by the call to make_fun to be assigned and eventually called. The calls to the function are inside a list comprehension just because that's the easiest way to call a set of functions with the same argument(s).

The output from this is
power(9, 3) returned 729
[81, 729]
The debugging output from cuber is seen before the list comprehension because all calls have to return their values before the list comprehension is complete and ready for printing.
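If you'd like to see where those remembered arguments live, here is a small sketch, assuming Python 3 and a cut-down make_fun without the debug flag. It pokes at the __closure__ attribute, which is an implementation detail you would rarely touch in real code.

```python
def make_fun(power):
    def pow(x):
        return x ** power
    return pow

squarer = make_fun(2)
cuber = make_fun(3)

# Each call to make_fun() builds a distinct function object ...
print(squarer is cuber)
# ... and each one carries its own captured value of power.
print(squarer.__closure__[0].cell_contents)
print(cuber.__closure__[0].cell_contents)
print(squarer(9), cuber(9))
```

The two functions share the same code but each holds its own value of power.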

You may be familiar with the concept of a mixin class. Such classes are designed to take advantage of Python's multiple inheritance features to add functionality to any chosen classes, by creating a new class which is a subclass of both the mixin and the chosen class. You can see an excellent example of this in the socketserver library (SocketServer in Python 2), where a ThreadingMixIn class is defined and can be used to extend the features of the basic UDPServer class like this:
class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
Note that the newly-declared ThreadingUDPServer class doesn't specify any behavior of its own, it merely inherits - first from the mixin and then from the base class, meaning that methods defined in the mixin take precedence over those defined in the base server class. One problem with this, however, is that it doesn't allow for any variation in the mixin classes - by the time you use them they are already created, and it's too late.
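To check the precedence claim for yourself, something like the following sketch should do, assuming Python 3, where the module is spelled socketserver. The variable name mro_names is made up for the example.

```python
from socketserver import ThreadingMixIn, UDPServer

class ThreadingUDPServer(ThreadingMixIn, UDPServer):
    pass

# The method resolution order lists the mixin ahead of the server classes.
mro_names = [cls.__name__ for cls in ThreadingUDPServer.__mro__]
print(mro_names)

# process_request is defined by both the mixin and the base server;
# attribute lookup on the combined class finds the mixin's version.
print(ThreadingUDPServer.process_request is ThreadingMixIn.process_request)
```

No server is actually started here; the classes are only inspected.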

In the same way that we can parameterize functions, however, we can parameterize classes as well. Suppose we want to provide a trivial mixin to print the class's name in either upper- or lower-case. Not very inspiring, but the simplest example I could think of to get the point over, so please bear with me if you can think of simpler ways to do this. One possibility is this.
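def mixin(cls, lcase=True):
    class Mixin:
        def nprint(self):
            # Print the base class's name in lower or upper case
            if lcase:
                print(str(cls).lower())
            else:
                print(str(cls).upper())
    class Result(Mixin, cls):
        pass
    return Result

class FirstClass:
    pass

Cl1 = mixin(FirstClass, True)
cl1 = Cl1()
cl1.nprint()

mixin(FirstClass, False)().nprint()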

Here the function first defines a mixin class, then creates a new class from the mixin and the base class provided as an argument. The interpreter cares not at all where the class definitions come from - classes are first-class objects just like functions and strings, and can just as easily be passed as function arguments as obtained any other way.

The output from the program, which I am sure you are waiting for with bated breath, is
<class '__main__.firstclass'>
which shows that the program runs, and that the mixin class's behavior is conditioned by the function's second argument.

Now you might choose to argue that this isn't a very natural example, and I'd be inclined to agree with you. All I have to say besides that is, you try coming up with these examples and see how you like it. If anyone chooses to contribute a more natural example that can be expressed without too much extra code I'll be happy to write about it.

May 20, 2009

Blogging Python Output: A Challenge

I spent half an hour yesterday in a battle with Blogger, trying to get it to render the output of this Python 3 program:
class MyCls:
    pass

obj = MyCls()
print(obj)
Unfortunately, despite battling with HTML entities for less than and the like, and even changing all spaces to non-breaking spaces*, the best I seem to be able to do is:


It starts out looking OK, but once I preview it, or publish it, Blogger just throws away everything after the first space up to the closing angle. Who can tell me what I am doing wrong? Or is Blogger being unfair?

* don't even think about blogging the actual entity codes: they'll be mangled too.

[The program will run, producing slightly different output, under Python 2. I suspect Blogger will mangle that output too]

Everything's an Object

"Everything's an object" is a truism for truly object-oriented languages, and Python is truly object-oriented. You may meet some purists who try to tell you that only [their favorite language] can really be called object-oriented, because Python doesn't have this feature or that feature, but don't take any notice of them. Religious zealots are everywhere, and their only interest is to convert you to their faith. In the Python world we don't tend to hold with religious zeal, and much prefer irreverent comedy sketches and making fun of things. Particularly religious zealots.

Newcomers to the language are sometimes surprised to find that you can pass all kinds of things as arguments to functions, and use them quite naturally inside the functions. The classic example is functions themselves. Let's write a program containing an innocuous little function that takes a function as its first argument and returns the result of calling the function on its second and third arguments.
def caller(f, a, b):
    return f(a, b)

def adder(x, y):
    return x+y

print(caller(adder, "abc", "def"))
print(caller(adder, 123, 456))

class MyCls:
    def __init__(self, first, second):
        self.first = first
        self.second = second
    def method(self):
        return self.first*self.second

obj = caller(MyCls, "=", 10)
print(obj, obj.method())
The adder function adds its two arguments, and we pass it as the first argument to caller a couple of times. If all has gone to plan the first two lines of output will look like this:
abcdef
579
The first argument to caller isn't constrained to be a function. It just has to be something that can be called. We can take advantage of the fact that a call to a class creates an instance of the class to have caller create the instance for us. The MyCls class has an __init__() method that takes two arguments (as well as the ubiquitous self that the zealots will try and persuade you isn't necessary).

The call to caller returns an instance of MyCls. When that instance's method() is called it uses the remembered arguments to print out a string of ten equals signs:


The generality that this demonstrates can be difficult to get used to if you haven't come across anything like it before. Once you appreciate it, though, it gives you a flexibility that is hard to match in many other languages. That's one of the reasons why the Python world often talks about "callables": we don't care whether it's a function or a class, we only care that it can be called.
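To push the point a little further, here is a sketch, assuming Python 3, that hands caller three quite different kinds of callable. The Multiplier class is invented purely for the illustration.

```python
def caller(f, a, b):
    return f(a, b)

class Multiplier:
    """Instances are callable because the class defines __call__."""
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, x, y):
        return (x + y) * self.factor

results = [
    caller(lambda x, y: x - y, 10, 4),   # an anonymous function
    caller(complex, 1, 2),               # a built-in type
    caller(Multiplier(3), 1, 2),         # an instance with __call__
]
print(results)

# The built-in callable() reports whether an object can be called at all.
checks = [callable(obj) for obj in (len, Multiplier, Multiplier(3), 42)]
print(checks)
```

Nothing in caller had to change to accommodate any of them.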

This is one aspect of polymorphism, one of the foundations of object-oriented programming.

[The code will run under Python 2, but the output shown here was produced by Python 3. The output from Python 2 will differ slightly.]

May 19, 2009

Who's Up for a Sprint, Then?

It's now a good long time since Need for Speed (three years, I find, rather to my surprise), and I have settled back into a more or less regular routine after my move back to the USA. So I am starting to wonder whether it might not be time for another speed-focused sprint, and this post represents a first attempt to run the idea up the flagpole and see if anyone salutes.

There have been many interesting developments on the VM front in the intervening years, and it seems like there's a distinct possibility that the current speed limits are going to be history in a year's time. That being the case, I would be prepared to put some effort into getting sponsorship and doing the administration and organization.

I was wondering about Amsterdam as a venue. Given the required lead times to recruit sponsors, get sprinters on board and ensure adequate accommodation I am thinking that the best time might be towards the end of the year - say October or November.

Who thinks this could be helpful? Who'd like to join in and sprint? Who would contribute funding to make it happen? All these questions are important, and only you can answer them!

Is It Installed or Not?

[Notes from a while back, as can be seen by the version numbering]

Hmm. What could be the problem here?

sholden@bigboy ~
$ easy_install-2.5 SQLAlchemy
Searching for SQLAlchemy
Best match: sqlalchemy 0.4.2dev-r3811
Processing sqlalchemy-0.4.2dev_r3811-py2.5.egg
sqlalchemy 0.4.2dev-r3811 is already the active version in easy-install.pth

Using c:\python25\lib\site-packages\sqlalchemy-0.4.2dev_r3811-py2.5.egg
Processing dependencies for SQLAlchemy
Finished processing dependencies for SQLAlchemy

sholden@bigboy ~
$ python
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlalchemy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named sqlalchemy
Simple, once realized: the easy_install I was running was based on Python for Windows, so running it under Cygwin still reported the state of the Windows installation! That's one slight down-side to Cygwin picking up so much from the Windows side of things: sometimes it gets hold of things it shouldn't.
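One way to catch this kind of confusion is to ask the interpreter itself what it is and where it would find a module. The sketch below assumes a much later Python than the 2.5 in the transcript (importlib.util.find_spec arrived in Python 3.4), and the helper name where_is is made up for the example.

```python
import sys
import importlib.util

def where_is(module_name):
    """Return the file a module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

# Which interpreter is actually running this code?
print("interpreter:", sys.executable)
# A standard library module, which should always be found.
print("json:", where_is("json"))
# Prints None unless SQLAlchemy is installed for *this* interpreter.
print("sqlalchemy:", where_is("sqlalchemy"))
```

Running it under each interpreter in turn shows at once which installation each one is looking at.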

May 18, 2009

Simple Object Interactions

Here's a relatively simple piece of code that shows how easy it is to have Python objects interact. It's fairly standard stuff, so it probably won't be interesting to old hands, but if you haven't been using object-oriented programming very long then it might make a point or two.

Note that the code is written for Python 3. It will run on earlier versions too, though the printing will be slightly different.
# Calling all reactive agents (with apologies to William Burroughs)
class Simulation:

    def __init__(self, howmany):
        self.agents = []
        for i in range(howmany):
            self.agents.append(Agent(self, i))

    def showImportant(self, agent):
        return "Agent: %d simulation: %d" % (agent.number, id(self))

    def listAgents(self):
        # Ask each agent in turn to report
        for a in self.agents:
            a.showMe()

class Agent:

    def __init__(self, sim, number):
        self.sim = sim
        self.number = number

    def showMe(self):
        print("Agent", self.number, "reporting:")
        # Call back into the simulation this agent belongs to
        result = self.sim.showImportant(self)
        print(result)

s = Simulation(3)
s.listAgents()
So, we start out with a class whose __init__() method is called with one argument, the number of agents to create. It creates a list of that many agents as its agents attribute. Note that when the Agent is created the Simulation instance passes itself as an argument to the Agent creator, and the Agent.__init__() method saves the reference to the Simulation instance as the Agent's sim instance variable.

Colloquially we could say that each Agent "knows" which Simulation it's a part of. So an Agent is able to call the methods of the simulation that it's a part of. This makes it possible to devise agents that call various simulation methods, and to incorporate them in several different simulations which implement those methods differently.
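As a sketch of that last idea, assuming Python 3, here is one agent class dropped into two simulations that implement the same method differently. All the names (TerseSimulation, ChattySimulation, describe, report) are invented for the illustration.

```python
class Agent:
    def __init__(self, sim, number):
        self.sim = sim
        self.number = number

    def report(self):
        # The agent neither knows nor cares which kind of simulation it is in.
        return self.sim.describe(self)

class TerseSimulation:
    def __init__(self, howmany):
        self.agents = [Agent(self, i) for i in range(howmany)]

    def describe(self, agent):
        return "#%d" % agent.number

class ChattySimulation(TerseSimulation):
    def describe(self, agent):
        return "Agent %d of %d" % (agent.number, len(self.agents))

reports = []
for sim in (TerseSimulation(2), ChattySimulation(2)):
    reports.append([agent.report() for agent in sim.agents])
    print(reports[-1])
```

The Agent code is identical in both cases; only the simulation's describe() differs.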

When building large data structures with this kind of pattern, by the way, it's important to note that the references are circular (simulations refer to agents, and agents refer to simulations). The garbage collector in older versions of Python would have real difficulties with structures such as the ones created in the code above. In essence it would say to itself "Well, I can't delete the simulation until I have deleted all the agents it refers to." Then it would look at all the agents, and for each one it would say to itself "I can't delete this until I have deleted the simulation it refers to". An unintelligent garbage collector might get stuck in an infinite loop here, but Python's garbage collector has never been that stupid.

Nowadays, I am happy to say, in Python 2 or 3 the collector is able to recognize these cyclic references and, as long as there are no references to such a structure from the outside, will (eventually) reclaim the space.
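You can watch the collector do this with a weak reference. The following is a minimal sketch, assuming CPython 3 and stripped-down versions of the two classes; the name probe is made up for the example.

```python
import gc
import weakref

class Simulation:
    def __init__(self, howmany):
        self.agents = [Agent(self, i) for i in range(howmany)]

class Agent:
    def __init__(self, sim, number):
        self.sim = sim        # back-reference that closes the cycle
        self.number = number

sim = Simulation(3)
probe = weakref.ref(sim)   # a weak reference does not keep sim alive

del sim                    # drop the only outside reference to the cycle
gc.collect()               # ask the cycle collector to run right now
print(probe() is None)
```

Without the explicit gc.collect() call the space would still be reclaimed, just at a time of the collector's choosing.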

Next time we'll consider how a simulation might use several types of agents, still providing common behavior for them all.

May 17, 2009

Python is So Versatile

Even though I use Python all the time, and I am well-connected in the Python world, so much is happening that it's difficult to stay in touch with everything Python. Taking a look at the Python Package Index, in just the last 24 hours there have been over 40 submissions. Not all of them will be new releases, but it's instructive to take a look at what's come in to PyPI as an indication of the breadth of applications in which Python is used. Rather than list all 40 submissions available from the RSS feed, let's just take a look at some of the more interesting items.

You could use the new release of Dumbo to write programs for Hadoop, the framework for running massively parallel map-reduce computations. With AMFast you could create rich Internet applications using Flash remoting. If you wanted to write a Unix daemon in Python you could use the python-daemon library that implements PEP 3143. You could write a network time client with ntplib.

If you wanted to add automated image processing to your Django web application you could do it with django-imagekit, or you could add AJAX with Dajax. To make it useful to a broader audience you could add django-bidi-utils to handle bi-directional text.

You might want to analyze some PDF files, in which case you could use pdfminer. If you are more interested in structural bioinformatics research then Biskit might be more your cup of tea. With all this complex software flying around you might need a program to handle a structured to-do list, and Task Coach would be ready and waiting for you.

There's also a slew of Zope and Plone components to add to these already very capable systems. Python is just so versatile!

May 16, 2009

Python Booth at OSCON

For the first time this year Python will have a booth at OSCON, thanks to the good offices of Aahz. If you'd like to keep the Python banner flying over the exhibition hall I am sure he could do with some help.

What's the point? Well, certainly it will help to raise awareness of the language, which can only be a good thing. Having a booth puts a human face on Python and gives people a chance to get first-hand information rather than being filtered through other people's prejudices. OSCON is probably the largest open source conference in the world, so it's a good place to advertise the best open source language in the world.

The Python Software Foundation will be funding a large banner, and whatever else is required. I am sure Aahz will be happy to have suggestions and (most especially) offers of assistance. I've spent time in the past manning exhibition stands, and while it's interesting to meet a wide range of people it can be surprisingly tiring, so a one-man effort won't be enough. You can subscribe to the Python OSCON mailing list if you are interested in helping with either or both of the planning and execution.

May 14, 2009

Help Files Should Help, Right?

Microsoft Documentation Sucks
God knows I've had my complaints about Microsoft documentation in the past. Often about manuals consisting mostly of descriptions of the following nature:
Threep Nardling
To nardle threeps, select the Threep tab and check the Nardling checkbox.
Frankly this kind of documentation is worse than useless - it elevates statement of the bleeding obvious to new heights, and frustrates all users with at least one eye and half a brain. If you don't know what a threep is, or when it might usefully be nardled, the implication is that you are in the wrong place (though quite where else you would be expected to go for this information escapes me).

Ubuntu/Gnome Documentation Sucks
That said, I hope I am setting myself up for a fall here. I've just installed a number of Ubuntu 8.04 virtuals (because I want to be compatible with a client environment, since you ask), and I have been having problems getting the network interfaces to behave. So I go to the help file for the GUI-based networking tool so kindly provided, and the main portion is filled with this sort of idiocy:

The really annoying piece is that I went to the help file to try and get an understanding of the roaming mode, only to discover that this steaming pile of placemarkers* masquerading as documentation contains zero mention of the one interface feature I needed to know about. They could at least have had a section saying "Check this box to put the interface into roaming mode".

I am really hoping that this documentation has improved a lot in the two versions of Ubuntu that have been released since 8.04. If not, then it's time somebody (either at the Gnome Foundation or at Canonical) started to give some serious attention to documentation. Help files that don't help are a major source of end-user frustration.

Somebody, please put me out of my misery and tell me that this nonsense is gone in more recent releases. Otherwise I might just have to go home and bang my head against the wall.

The Real Problem
All of this is merely subsidiary to the real issue, which is how to get a VirtualBox Ubuntu guest running under a Windows Vista host to track changes in the Vista internet connectivity. It seems like every time I change locations I have to spend time tweaking settings on the virtuals, rebooting uselessly and generally poking things until I get them to work without any clear idea of the eventually successful strategy.

So, dear lazyweb, please help me. If there's a manual that explains this I'd be happy to make a donation to its author. The open source world should, in my less than humble opinion, value good documentation as much as (or more than) good code. Once you get past the obvious, the docs help you more than the code.

* Admit it, you thought I was going to write "dung" there, didn't you? I am trying to eschew the obvious.

May 10, 2009

SCO's Inevitable End Moves Closer

A recent development in SCO's Chapter 11 bankruptcy takes the company still closer to its inevitable final demise. Groklaw reports that the US trustee has filed a motion to convert the bankruptcy proceedings from Chapter 11 (reorganization under protection from creditors) to Chapter 7 (liquidation of the business).

Darl McBride, somewhat predictably, is quoted as saying he was surprised by the decision, and that the company will fight it. This doesn't hide the fact that a bankrupt strategy will lead to a bankrupt company. The current stock price of $0.15 capitalizes the company at just over $3 million. How are the mighty fallen.