November 3, 2006

Stackless Python Continues to Amaze

I recently started tracking the Stackless Python mailing list. I've been interested in Stackless, the brainchild of the fearsomely clever Christian Tismer, for some time now. I recently acquired a client with active interests in that area, so it behooves me to stay up to date.

Stackless allows you to organise your work into tasklets (which, as far as I understand it, have replaced the original microthreads as the unit of scheduling; this is all pretty new to me, so apologies if that's incorrect). An interesting property of tasklets is that you can pickle one, send the pickle to another computer, unpickle it and resume the tasklet in the new environment.
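You can't pickle a running tasklet in plain CPython, but the underlying idea is easy to illustrate with a hand-rolled resumable task whose progress lives in ordinary attributes. This is a sketch of the concept only, not the Stackless API; all names here are made up for illustration:

```python
import pickle

# A resumable task: its progress is explicit object state, so the
# whole thing can be pickled mid-run and resumed elsewhere. Stackless
# does the equivalent for real tasklets at the interpreter level.
class ResumableCounter:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0
        self.done = False

    def step(self):
        if self.current < self.limit:
            self.current += 1
        else:
            self.done = True

task = ResumableCounter(5)
task.step()
task.step()

# "Migrate" the task: pickle it, pretend to ship the bytes to another
# machine, then unpickle and continue from where it left off.
frozen = pickle.dumps(task)
resumed = pickle.loads(frozen)
while not resumed.done:
    resumed.step()

print(resumed.current)  # picks up at 2 and finishes at 5
```

Because the pickle format itself is architecture-neutral, the receiving machine's endianness is irrelevant, which is exactly the point made in the thread.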

My attention was grabbed by a recent thread in which a user explained that he was trying to perform such migrations across a heterogeneous environment of PowerPC, SPARC and Intel CPUs, and found that it worked between SPARC and PowerPC, but the Intel architecture appeared to be failing -- presumably due to the different endianness.

After a couple of messages surmising that this was hardly surprising, Christian came into the thread to explain that there was no architecture-dependent code in the pickled tasklet resumption functionality. Lo and behold, the original poster came back explaining:
I must apologize, it turns out my problem was due to the fact that I was using a different version of stackless on the intel machine. In case anyone is curious, resuming pickled tasklets across architectures is easy. Thank you for your replies!
Let me repeat that, just in case you missed it: resuming pickled tasklets across architectures is easy. As far as I can see this gives Python an amazing capability to produce applications with highly-distributed architectures. It's going to be interesting to see where Stackless goes with this, but it should be shouted from the rooftops. This is an advanced feature that represents a real advantage for (Stackless) Python in a world where everyone is wondering how they can accommodate the new multicore processors.

With Stackless, your CPUs don't even need to have the same architecture, but the feature will work just as well in the homogeneous environment that multi-processor computers provide.

11 comments:

Anonymous said...

It is wonderful that Python has so many implementations.

With real world code, the Python community will:

1) get to the heart of the problem of taking full advantage of concurrent processing resources (homogeneous and heterogeneous)

2) get to the heart of the problem of taking full advantage of OS/hardware thread scheduling

3) get to the heart of the problem of taking full advantage of single process controlled mini-thread scheduling

4) figure out the best practices for all of the above

5) figure out what allows programmers to be productive NOW even with constrained resources (the most constrained resource being programmer man-hours)

I believe both Stackless and IronPython on .NET already allow you to do most or all of the concurrent programming models mentioned above.

My guess is that we will move to a model of cheap mini-threads passing messages between themselves, whether they are on the same machine or across networks. They will share absolutely no state, so if you have some state to manage, no matter how small, you will not hesitate to assign a cheap mini-thread to manage it.

Behind the scenes, the machine looks for opportunities to efficiently implement the message passing between a set of mini-threads, "unrolling" the message passing and implementing those mini-threads as OS/hardware scheduled threads or whatever, with some shared state, inside a single process. The programmer will never have to worry about the pitfalls of shared state, because the machine will use provably correct transformations to turn the message passing with no shared state into efficiently scheduled threads with shared state.
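The share-nothing, message-passing style the comment describes can be sketched with nothing more than the standard library: each "mini-thread" owns its state outright, and other threads interact with it only through queues. This is an illustration of the model, not of any particular framework:

```python
import threading
import queue

def counter_task(inbox, outbox):
    """Owns its counter; other threads may only send it messages."""
    count = 0
    while True:
        msg = inbox.get()
        if msg == "stop":
            outbox.put(count)  # reply with the final tally
            return
        count += msg

# The state (count) is never shared -- it lives entirely inside the
# task. Queues are the only channel between threads.
inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=counter_task, args=(inbox, outbox))
worker.start()

for n in (1, 2, 3):
    inbox.put(n)
inbox.put("stop")

total = outbox.get()
worker.join()
print(total)  # 6
```

Whether a runtime could transparently "unroll" such message passing into shared-state threads, as the commenter speculates, is a separate question; the programming model itself is already available today.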

Anonymous said...

Would it be possible to have Stackless Python integrated with the standard Python distributions, so that virtually every Python installation has it available? One could then distribute Stackless Python apps with the instructions to run them using a command like: spython app.py

Failing this, perhaps the major bundlers of Python like ActiveState and Visual Python and Enthought could be encouraged to include Stackless Python in a standard way.

Steve said...

"Would it be possible to have Stackless Python integrated with the standard Python distributions ... ?"

I'd like to think that at some stage the developers could be persuaded to take it on board, but I'm not sure I can be sufficiently persuasive on my own. It would certainly be nice to see it happen, and Chris Tismer has made huge strides in implementation techniques since the last time the developers seriously considered this question.

Anonymous said...

different strokes for different folks. as is, most python people have no motive for using stackless. even though stackless is the one true path, no reason for forcing it onto others.

Steve said...

That's as may be, but if Stackless can be installed as a run-time option in a minimally-intrusive way I'd rather see it distributed as a part of standard Python than having to be added on.

It used to be quite a tricky patch, but as far as I understand things it doesn't nowadays require huge changes to the interpreter code. I'd need to hear from Chris Tismer to be sure of that, though.

Anonymous said...

Stackless was a godsend when I found myself writing some fairly mundane data parsing code recently. Never mind exotic applications that migrate tasklets among heterogeneous platforms; as far as I am concerned, it is an indispensable tool for taming unruly spaghetti code in all shapes and sizes.

Here was the problem: parse a sequence of data "items" from an input text, where a data item may contain references to other data items within the same text. A two-pass parser was just too much hassle, because the data format is formidably complex and layered. Being a fan of evolutionary development (since Python code is so easily refactorable), I wrote a one-pass parser to start with, which of course could only handle backward references (i.e., to data already parsed).

The item parsing code goes several method calls deep. When a reference is parsed, the find() method is called to resolve the reference. The first implementation of find() simply raised a LookupError if it failed. What I wanted to do was use coroutines, since then find() could simply switch to another coroutine instead of raising LookupError, and get resumed later on... ideally once the item it wanted had been parsed. Since calls to find() were done several method calls deep, however, I couldn't easily use 2.5's generator send() to implement coroutines.

With stackless it was a breeze. Each item gets parsed in its own tasklet. The find() method, if unsuccessful, simply suspends the tasklet, and re-tries when awoken. Once all items have been done, the suspended tasklets are repeatedly woken until either all have terminated or they all re-suspend, which indicates an illegal reference loop.
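The shape of this suspend-and-retry scheme can be approximated in plain Python, using generators where Stackless would use tasklets. A toy sketch, with all names hypothetical and the "parsing" reduced to reference resolution:

```python
# Each item is "parsed" by a generator that yields (suspends) whenever
# a reference it needs is not yet resolved. The driver repeatedly wakes
# the suspended parsers until all finish, or none makes progress in a
# full round -- which indicates an illegal reference loop.
def parse_item(name, ref, resolved):
    while ref is not None and ref not in resolved:
        yield  # suspend: the referenced item isn't parsed yet
    resolved[name] = ref  # resolution succeeded; record the item

def run(items):
    resolved = {}
    pending = [parse_item(name, ref, resolved) for name, ref in items]
    while pending:
        still_pending = []
        for task in pending:
            try:
                next(task)               # wake it; it may re-suspend
                still_pending.append(task)
            except StopIteration:        # the parser finished
                pass
        if len(still_pending) == len(pending):
            raise ValueError("illegal reference loop")
        pending = still_pending
    return resolved

# 'a' forward-references 'b'; 'b' references nothing.
print(run([("a", "b"), ("b", None)]))  # {'b': None, 'a': 'b'}
```

Stackless makes the same pattern far more natural, since a tasklet can suspend from arbitrarily deep inside nested method calls, which plain generators (as the commenter notes) cannot do without restructuring.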

What I expected to take a day or two took me a couple of hours, and the parsing code remains simple, since I didn't have to involute it into a two-pass monstrosity. How neat is that?

Steve said...

Neat enough to make me wish it wasn't a separate download ...

Anonymous said...

Anonymous, Is your code or sniplets of it online anywhere? I'd love to see how this looks.

Unknown said...

Stackless Python Tutorials and Resources
http://islab.org/stackless/

Telavian said...

Writing a two pass parser is not that difficult. I bet your implementation is more difficult to follow than a simple two pass parser.

Steve said...

@Telavian: Did you by any chance add this comment to the wrong post? I don't understand how it's supposed to be relevant. Perhaps you could enlighten me?