April 12, 2014

Intermediate Python: An Open Source Documentation Project

There is a huge demand for Python training materials, and there are many people who just don't have the spare cash to buy books or videos. That's one reason why, in conjunction with a new Intermediate Python video series I have just recorded for O'Reilly Media, I am launching a new open source documentation project.

My intention in recording the videos was to produce a broad set of materials (the linked O'Reilly page contains a good summary of the contents). I figure that most Python programmers will receive excellent value for money even if they already know 75% of the content. Those to whom more is new will see a correspondingly greater benefit. But I was also keenly aware that many Python learners, particularly those in less-developed economies, would find the price of the videos prohibitive.

With O'Reilly's contractual approval, the code that I used in the video modules, in IPython Notebooks, is going up on Github under a Creative Commons license. Some of it already contains markdown annotations among the code; other notebooks have little or no commentary. My intention is that ultimately the content will become more comprehensive than the videos, since I am using the video scripts as a starting point.

I hope that both learner programmers and experienced hands will help me turn it into a resource that groups and individuals all over the world can use to learn more about Python with no fees required. The current repository has to be brought up to date after a rapid spate of editing during the three-day recording session. It should go without saying that viewer input will be very welcome, since the most valuable opinions and information come from those who have actually tried to use the videos to help them learn.

I hope this will also be a project that sees contributions from documentation professionals (and beginners they can help train), so I will be asking the WriteTheDocs NA team how we can lure some of those bright minds in.

Sadly it's unlikely I will be able to see their talented array of speakers as I will still be recovering from surgery. But a small party one evening or a brunch at the office might be possible. Knowing them it will likely involve sponsorship or beer. Or both. We shall see.

I think it's a worthwhile goal to have free intermediate-level Python sample code available, and I can't think of a better way for a relative beginner to get into an open source project. I also like the idea that two communities can come together over it and learn from each other. Suffice it to say, if there are enough people with a hundred bucks* in their pockets for a six-hour video training, I am happy to use part of my share of the profits to support this project to some degree.

[DISCLOSURE: The author will receive a proportion of any profit from the O'Reilly Intermediate Python video series]

* This figure was plucked from the air before publication, and is still a good guideline, though as PyCon opened (Apr 11) a special deal was available on a package of both Jessica McKellar's Introduction to Python and my Intermediate Python.

A Rap @hyatt Customer Service Request

It's 2am and the wireless is down
@Hyatt ... #pycon
That's why my face is wearing a frown
Even though I'm at ... #pycon

I love all these Canadians
And Montreal is cool
But don't you know how not to run a network
fool?

If I were a rapper
Then you'd have to call me Milton
Because frankly I get much better service
@Hilton

I'm a businessman myself
And I know we're hard to please
So kindly please allow me
To put you at your ease

Your people are delightful
And as helpful as the best
I want to help, not diss you
I'm not angry like the rest

The food is amazing
And the bar could be geek heaven
If only you weren't calling
For last orders at eleven

We're virtual and sleepless
So we need your help to live
And most of us are more than glad
To pay for what you give

But imagine you're away from home
And want to call your Mom
The Internet's our family
So you've just dropped a bomb

I've had my ups and downs with Hyatt
Over many years
But never felt before
That it should fall on others' ears

I run conferences, for Pete's sake
And I want to spend my money
If only I could reach someone
And I'm NOT being funny

PyCon is my baby
So I cherish it somewhat
But this has harshed my mellow
And just not helped a lot

We're a bunch of simple geeks
Who get together every year
We aren't demanding, I don't think
Our simple needs are clear

I don't believe that I could run
Your enterprise right here
It's difficult, and operations
Aren't my thing I fear

So please, don't take this badly
But you've really disappointed
Which is why a kindly soul like me
Has made remarks so pointed

We will help you if we can
We know you pay a lot for bits
But I have to know if web sites
Are receiving any hits

You've cut me off, I'm blind
And so I hope there's nothing funky
Happening to my servers
While I'm sat here getting skunky

Enough, I've made my point
So I must stop before I'm rude
The Internet's my meat and drink
You've left me without food.

trying-to-help-while-disappointed-ly yr's  - steve

March 20, 2014

Social Media and Immortality

I suppose everyone who uses social networks like Facebook eventually comes up against the situation where that network presents a dead friend in some context that only makes sense for someone who is alive and actively maintaining a social media presence. I just did, again.

In this particular case it was triggered by the suggestion from LinkedIn that I might like to add a fellow Learning Tree instructor to my roster of friends. He died, quite young and to most of his colleagues' surprise, about fifteen months ago (if my memory can be relied upon, which I wouldn't necessarily recommend as a strategy). I've seen similar reports on Twitter from other friends.

Now, I'm just a guy who chose to eschew the corporate career ladder and work on small systems that do demonstrable good, so I freely admit that the young devops turks of today are able to develop far more capable systems than I could have conceived of at their age. That's just the nature of technological progress. At the same time, I have to wonder why nobody appears to have asked the question "Should we take special actions (or at least avoid taking regular ones) for users who haven't logged in in over a year?"

Do they have no business analysts? Must we geeks be responsible for avoiding even the most predictable social gaffes?

Sidebar: I once designed the database for a system that monitored the repayment of student support funds by those who had accepted assistance from the federal government to train in teaching disadvantaged students. There were certain valid reasons for deferring repayment (such as military service), and of course these deferrals had to be recorded. I remember feeling very satisfied that all I had to do was associate the null value with the deferral duration for "Dead" as the deferral reason to have everything work perfectly well.

The answer to my question of two paragraphs above, by the way, would be “yes”. This will be the last time I give free advice to the social media companies, so Twitter, Facebook, LinkedIn, and the rest, I hope you can find some benefit in this advice. Anything further will cost you dearly. (I should be so lucky).

Quite separately from the above speculations on human frailty, I can't help wondering what kind of immortality a continued existence on these platforms represents (even though this will probably lead to hate mail from all kinds of people the concept offends). I had an email from Google a couple of days ago asking me to log in with a particular identity* within a month or have the account go inactive. That's a necessary second step to whatever palliative actions you choose to take when presenting the account to others. Google, for all their execrable support,** get that you have to log in now and again just to assert your continued existence.

It strikes me this is a reasonably humane way to proceed. If you want to keep someone's memory alive on a social media platform then you must know them at least well enough to log in to their account, after which it's basically your shrine to them if you want it to be. I really don't like to think about what kind of complications the lawyers will dream up about this, though. Otherwise, well, we are after all all born to die (Ray Kurzweil notwithstanding).


*Note to the Google identity nazis: no, of course I was joking, I only have one identity
** Hint re Google customer service: if you aren't paying you aren't a customer, so expecting service might seem presumptuous



January 9, 2014

Practical Python (1)

Note: this blog post is the first I am undertaking with the IPython Notebook. I am still playing with formatting and so on, so please bear with me if the content doesn't seem as easy to read as it should. The notebook itself can be found as a gist file on Github and you can alternatively view it using the online Notebook viewer.

I want to discuss a typical bit of Python, taken from a program sent me by a colleague (whether it's his code or someone else's I don't know, and it hardly matters). It's the kind of stuff we all do every day in Python, and despite the Zen of Python's advice that “there should be one, and preferably only one, obvious way to do it” there are many choices one could make that can impact the structure of the code.

This started out as a way to make the code more readable (I suspect it may have been written by somebody more accustomed to a language like C), but I thought it might be interesting to look at some timings as well.

In order to be able to run the code without providing various dependencies I have taken the liberty of defining a dummy Button function and various other “mock” objects to allow the code to run (they implement just enough to avoid exceptions being raised)*. This in turn means we can use IPython's %%timeit cell magic to determine whether my “improvements” actually help the execution speed.

Note that each timed cell is preceded by a garbage collection to try as far as possible to run the samples on a level playing field**.

In [1]:
import gc

class MockFrame():
    def grid(self, row, column, sticky):
        pass
mock_frame = MockFrame()

def Button(frame, text=None, fg=None, width=None, command=None, column=None, sticky=None):
    return mock_frame

class Mock():
    pass

self = Mock()
self.buttonRed, self.buttonBlue, self.buttonGreen, self.buttonBlack, self.buttonOpen = (None, )*5

f4 = Mock()
f4.columnconfigure = lambda c, weight: None

ALL = Mock()

The code in this next cell is extracted from the original code to avoid repetition - all loop implementations are written to use the same data.

In [2]:
button = ["Red", "Blue", "Green", "Black", "Open"]  
color = ["red", "blue", "green", "black", "black"]  
commands = [self.buttonRed, self.buttonBlue, self.buttonGreen,
            self.buttonBlack, self.buttonOpen]  

So here's the original piece of code:

In [3]:
g = gc.collect()
In [4]:
%%timeit
# Benchmark 1, the original code
for c in range(5):  
    f4.columnconfigure(c, weight=1)
    Button(f4, text=button[c], fg=color[c], width=5,
               command=commands[c]).grid(row=0, column=c, sticky=ALL)
100000 loops, best of 3: 4.45 µs per loop

You might suspect, as I did, that there are better ways to perform this loop.

The most obvious is simply to create a single list to iterate over, using unpacking assignment in the for loop to assign the individual elements to local variables. This certainly renders the loop body a little more readable. We do still need the column number, so we can use the enumerate() function to provide it.

In [5]:
g = gc.collect()
In [6]:
%%timeit
for c, (btn, col, cmd) in enumerate(zip(button, color, commands)):  
    f4.columnconfigure(c, weight=1)
    Button(f4, text=btn, fg=col, width=5, command=cmd). \
               grid(row=0, column=c, sticky=ALL)
    pass
100000 loops, best of 3: 4.26 µs per loop

Unfortunately any speed advantage appears insignificant. These timings aren't very repeatable under the conditions I have run them, so really any difference is lost in the noise - what you see depends on the results when this notebook was run (and therefore also on which computer), and it would be unwise of me to make any predictions about the conditions under which you read it.

We can avoid the use of enumerate() by maintaining a loop counter, but from an esthetic point of view this is almost as bad as (some would say worse than) iterating over the range of indices. In CPython it usually just comes out ahead, but at the cost of a certain amount of Pythonicity. It therefore makes the program a little less comprehensible.

In [7]:
g = gc.collect()
In [8]:
%%timeit
c = 0
for (btn, col, cmd) in zip(button, color, commands):  
    f4.columnconfigure(c, weight=1)
    Button(f4, text=btn, fg=col, width=5, command=cmd). \
               grid(row=0, column=c, sticky=ALL)
    c += 1
    pass
100000 loops, best of 3: 4.05 µs per loop

The next two cells repeat the same timings without the loop body, and this merely emphasises the speed gain of ditching the call to enumerate(). At this level of simplicity, though, it's difficult to tell how much optimization is taking place since the loop content is effectively null. I suspect PyPy would optimize this code out of existence. Who knows what CPython is actually measuring here.

In [9]:
g = gc.collect()
In [10]:
%%timeit
for c, (btn, col, cmd) in enumerate(zip(button, color, commands)):  
    pass
1000000 loops, best of 3: 1.18 µs per loop

In [11]:
g = gc.collect()
In [12]:
%%timeit
c = 0
for btn, col, cmd in zip(button, color, commands):
    pass
    c += 1
1000000 loops, best of 3: 854 ns per loop

Somewhat irritatingly, manual maintenance of an index variable appears to have a predictable slight edge over use of enumerate(), and naive programmers might therefore rush to convert all their code to this paradigm. Before they do so, though, they should consider that code's environment. In this particular example the whole piece of code is simply setup, executed once only at the start of the program execution as a GUI is being created. Optimization at this level would not therefore be a sensible step: to optimize you should look first at the code inside the most deeply-nested and oft-executed loops.

If the timed code were to be executed billions of times inside two levels of nesting then one might, in production, consider using such an optimization if (and hopefully only if) there were a real need to extract every last ounce of speed from the hardware. In this case, since the program uses a graphical user interface and so user delays will use orders of magnitude more time than actual computing, it would be unwise to reduce the readability of the code, for which reason I prefer the enumerate()-based solution.

With many loops the body's processing time is likely to dominate in real cases, however, and that again supports using enumerate(). If loop overhead accounts for 5% of each iteration and you reduce your loop control time by 30% you are still only reducing your total loop run time by 1.5%. So keep your program readable and Pythonically idiomatic.
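The arithmetic behind that 1.5% figure can be checked in a couple of lines. This is only a sketch: the function name total_saving is made up for illustration, and the 5% and 30% inputs are the hypothetical figures from the paragraph above, not measurements.

```python
def total_saving(overhead_fraction, overhead_reduction):
    """Fraction of total loop run time saved when only the loop overhead speeds up.

    overhead_fraction:  share of each iteration spent on loop control (0.05 = 5%)
    overhead_reduction: how much faster the loop control becomes (0.30 = 30%)
    """
    return overhead_fraction * overhead_reduction


# The hypothetical figures from the text: 5% overhead, made 30% faster.
saving = total_saving(0.05, 0.30)
print(f"Total run time saved: {saving:.1%}")
```

The other 95% of each iteration is untouched, which is why the overall gain stays so small however clever the loop control becomes.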

Besides which, who knows, some Python dev might come along and change implementations to alter the relative time advantage, and then wouldn't you feel silly changing all that code back again?

* If you have a serious need for mock objects in testing, you really should look at the mock module, part of the standard library since Python 3.3. Thanks to Michael Foord for his valiant efforts. Please help him by not using mock in production.
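As a rough illustration of what the mock module offers over hand-written stand-ins, the enumerate()-based loop can be run against MagicMock objects. This is a sketch rather than the code used for the timings above: it assumes Python 3.3 or later, and the replacement Button, f4 and ALL objects are my own stand-ins for the ones defined in the first notebook cell.

```python
from unittest import mock

# Stand-ins for the hand-written Button, f4 and ALL objects (assumed names).
Button = mock.MagicMock(name="Button")
f4 = mock.MagicMock(name="f4")
ALL = mock.sentinel.ALL

button = ["Red", "Blue", "Green", "Black", "Open"]
color = ["red", "blue", "green", "black", "black"]
commands = [None] * 5

for c, (btn, col, cmd) in enumerate(zip(button, color, commands)):
    f4.columnconfigure(c, weight=1)
    Button(f4, text=btn, fg=col, width=5, command=cmd).grid(
        row=0, column=c, sticky=ALL)

# A MagicMock records every call, so the loop's behaviour can be verified.
assert Button.call_count == 5
assert f4.columnconfigure.call_count == 5
Button.return_value.grid.assert_called_with(row=0, column=4, sticky=ALL)
```

Recording calls like this costs time, so objects of this kind suit tests better than benchmarks.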

** An interesting issue here. Originally I wrote the above code to create a new MockFrame object for each call to Button(), and I consistently saw the result of the second test as three orders of magnitude slower than the first (i.e. ms, not µs). It took me a while to understand why timeit was running so many iterations for such a long test, adding further to the elapsed time. It turned out the second test was paying the price of collecting the garbage from the first, and that without garbage collections in between runs the GC overhead would distort the timings.
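Outside a notebook, the same "collect garbage, then time" pattern can be approximated with the standard library's timeit module. The sketch below is an assumption-laden stand-in, not the notebook's own method: the function names with_enumerate and with_counter are invented here, the data is simplified, and wrapping each loop in a function adds call overhead that the %%timeit cells do not have.

```python
import gc
import timeit

button = ["Red", "Blue", "Green", "Black", "Open"]
color = ["red", "blue", "green", "black", "black"]
commands = [None] * 5


def with_enumerate():
    for c, (btn, col, cmd) in enumerate(zip(button, color, commands)):
        pass


def with_counter():
    c = 0
    for btn, col, cmd in zip(button, color, commands):
        c += 1


NUMBER = 100_000  # calls per timing run
results = {}
for func in (with_enumerate, with_counter):
    gc.collect()  # clear garbage left over from the previous benchmark
    best = min(timeit.repeat(func, number=NUMBER, repeat=3))  # best of 3
    results[func.__name__] = best / NUMBER
    print(f"{func.__name__}: {results[func.__name__] * 1e9:.0f} ns per loop")
```

The numbers printed will vary from machine to machine and run to run, just as the notebook timings do.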

January 3, 2014

Blip.tv Deletes Python Content

There's been some disturbance in the Python ecosphere because Blip.tv has removed a lot of Python content. For a long time, Next Day Video used Blip as their preferred hosting service (I don't know whether they still do or not, but after this I should hope not) and PyCon video was hosted there by default. According to their announcement:

“After many years of being an open platform, we’re now taking our mission to bring the best original web series to our audience more seriously.”

So I wrote to them to ask if it would be possible to get copies of the content they had made unavailable:

“I understand that many Python-related videos are no longer available on your service.
This is to ask whether you can make the original media available to us for re-hosting, since there is definite demand for some of the video content you have removed.”

Their reply was unequivocal:

“Unfortunately this content is no longer available on Blip and we are not able to provide it to you. The original content owner may re-upload their source files to another hosting provider as an alternative.”

So if you were thinking about using Blip's services, you might want to think again. They have just deleted all this stuff, as far as I can see without giving much notice to the people who posted it in the first place. They have made no friends in the Python world as a result, and I can only imagine who else they have pissed off with their apparently heavy-handed actions.

Open Source and Money

David Heinemeier Hansson, a name well-known in the Ruby on Rails world, recently wrote a blog post entitled “The perils of mixing open source and money.” In it he argues that the ability of open source contributors to raise funding through sources like Kickstarter threatens open source, though I find his arguments unpersuasive.

First he suggests that fundraising represents a one-time “cashing in on goodwill earned,” whereas I suspect that if a funded project is successful that would increase the likelihood of receiving funding for future projects and would actually increase the goodwill directed towards the fundraiser. Second he indirectly suggests that being compensated for writing software will lead to needless embellishment, whereas I should have thought that community pressures in any decent open source developer community would lead to negative code reviews and decisions not to include needless bloat.

Hansson then goes on to suggest that working for community donations causes people to work to keep the donations coming in rather than to improve the software. The fact that many Kickstarter software projects have apparently succeeded appears to make no difference to his opinion. Sadly it seems to me that in closing he reveals that the whole piece is indeed just opinion when he says:

“It's against this fantastic success of social norms that we should be extraordinary careful before we let market norms corrupt the ecosystem. Like a coral reef, it's more sensitive than you think, and it's hard to underestimate the beauty that's unwittingly at stake. Please tread with care.”

Doesn't Hansson know that many people who work in open source do so principally because their employers pay them to do so? And yet their acceptance of the corporate shilling is apparently not in danger of perverting the course of open source development, while people with good ideas that others are prepared to fund apparently don't qualify to receive support because they put the whole ecosystem in danger.

Either I misunderstood something or what Hansson wrote doesn't make sense. The fact of the matter is that the best open source projects don't include contributions because they have been funded, they include them because they are valuable to the project. As long as these values remain in place then the injection of money into open source projects is both desirable and useful.