January 15, 2025

I want to bang some heads together!

It's frustrating when useful tools refuse to work together nicely. In the past I've experienced conflicts between black and flake8 that made it impossible to commit via my default commit hooks. Now I'm seeing the same behaviour with black and reorder-python-imports.

In short, almost a year ago now github user maxwell-k reported that black release 24.1.0 had introduced an incompatibility with reorder-python-imports by starting to require a blank line after a module docstring. In the discussion on the bug report the black crew make the reasonable-seeming point that it's black's job to determine the disposition of whitespace, and that reorder-python-imports should do what its name implies and nothing more. This would respect the long-standing Unix tradition that each tool should as far as possible perform a single function.

Unfortunately, when elagil raised the same issue with the reorder-python-imports developers, with a request to make their project usable with black (ably supported by maxwell-k), they received a response which I can only (avoiding the use of expletives) describe as disappointing:

anything is possible. will it happen here: no

In my opinion this uncompromising attitude displays the worst kind of arrogance from a developer, and I frankly fail to see who benefits from this refusal to bend (except perhaps a developer unwilling to work further on a project or set it free). The net consequence from my own point of view is that I'll no longer be using reorder-python-imports, nor recommending it.

The situation remains unchanged. Life's too short to persuade donkeys to move. On the plus side, research into solving this irritation led me to start working with ruff, which provides the functionality of both utilities in a single rather faster tool. It's an ill wind that blows nobody any good. Goodbye, donkeys!

May 24, 2024

LiveBlog of the Pycon IT 24 Lightning Talks

These things move fast. No guarantees! They're doing it "The Italian Way," which apparently involves forfeits.

First, conference announcements. No mic - not a great start. Oh, here we are!

PyCon Portugal is 17 - 19 October 2024. There was a great acronym for Portugal, which I didn't have time to write down.

EuroPython is in Prague and remote, in July. 8-9 Tutorials and workshops, 10-12, talks, 13-14 Sprints.

Alessia Marcolini and Valerio Maggio chaired. They told us why pineapple pizza is such an awful idea. They then introduced the wheel of fortune, and the outcome determined various forfeits that speakers were given to force them to behave in specific ways: Swimming in the Arno, Italian hand gestures, Quack it, and various others, including the pineapple pizza, which cut the speakers' time down to 3 minutes.

Luca, the first speaker, spun the wheel too hard, so it took us a minute to discover he had to talk while swimming. With props - goggles and a snake floatation aid.. His talk was about annotating decorators. He showed an untyped decorator to time functions. Very amusing to see him remove the mypy errors one by one. He then turned it into a timing context manager, which also acted as a decorator. Great talk.

Samuel Colvin talked about Jiter, a JSON parser, drew the quack, and the audience decided to quack on the word JSON. He demonstrated, to much applause, how jiter can parse the outout from ChatGPT continuously, rather than having to wait until the whole thing was received. With about thirty quacks, until he wised up and cut down on his use of the word!

Peter next talked about PyClean, a project he wrote himself, while swimming. (He presented it while swimming, he wrote it some other time). He's worked on it so long he's forgotten what it does, but it's available on PyPi. The package actually cleans up your Python bytecode files from your __pycache__ directories. You might think this was a one-line find command, but it's written in Python, and it works!

Rodrigo became another swimmer, and talked about Elegant Code. His example was a simple GUI-based calculator he found online, so he refactored it before our very eyes. Rodrigo prefers expressions to statements, so he replaced many assignments either by simply using parenthisised walrus expressions, or introducing an auxiliary function. The function then became a single lambda expression. The import statements also became walrus expressions using __import__, and he then introduced the concept of "long-circuititng." By the time he'd finished, he'd conjoined all the expressions into a single expression. Most amusing.

By this point it had become less than clear which forfeit was being executed. The next speaker, whose name I did not catch, does two things every day - stay on the computer and listen to music, so he asked why not combine the two. An audience member chose the song "Unbroken" for an example, but apparently that title was too simple. So we next tried "Never going to give you up." His code then created album wallpaper from the Spotify page, and he challenged everyone to use the Gnome desktop, which is currently its only platform. Good talk!

Miro then talked about various keyboard layous, but he drew the pizza so had to talk in three minutes. He told us we were all using weird keyboard layouts, and then told us all about the right ones in varioius languages I couldn't identify, so while he amused the native audience greatly the humour I got was by looking at all the glyphs overlaid on a single keyboard. He explained compose sequences. Wish I'd got a picture. He closed by suggesting the remapping of the SHIFT LOCK and CTRL keys to make ESCAPE more available. Excellent.

Rigel, the next speaker again became a swimmer, and we started to suspect the wheel was rigged. Problems with the projector paused him for a while, blaming his Windows computer for the issue, so we watched his slides from a web browser. Rigel is a bad skydiver, but he's an engineering manager who claimed to have no idea what his staff do. He's hiring, and talked about Booth's second rule: the safer skydiving becomes, the more risks skydivers take to keep the fatality rate constant. He demonstrated his "horrible code" to mine Bitcoin or similar crypto currencies, and then how raising the difficulty of the required hash of the transaction required nore steps, and finally demonstrated things ran faster in parallel. Neat.

Giuseppe Biradi is a project and program manager, who tried to persuade us that PMs Are (Not) Evil. Because he loves Python he has become a coder, and got the idea for this talk from his coding colleagues' horrified expressions. He claims that PMs are like the pinball firing pin, which can lead to lots of complex behaviours. Or perhaps like someone who gets grain, mill and various other components to finally produce bread. We shouild all respect each others' rules, and sing Kumbaya round the camp fire. I think.

Minita spoke on imposter syndrome. True story based on personal opinoins, and he isn't used to extemporising. Before the pandemic he didn't speak at conferences, but now he enjoys them a lot. There are five stages of imposter syndrome: denial, anger, bargaining, depression and finally acceptance. He showed us a picture of hime helping a fellow Pythonista, and uses a "Yes, you can" sticker on his wall to maintain confidence.

All in all an entertaining session, and one that brought much of the conference together before we all went off to dinner. Sometimes chaotic and always amusing, lightning talks are always one of the high spots of any PyCon. Thanks to all speakers, and the hosts.

February 29, 2024

Understanding Python Descriptors

Understanding Descriptors

This is the HTML rendering of the Jupyter notebook that can be found in this repository.

Understanding Descriptors¶

This notebook introduces the concepts behind descriptors.

The descriptor underlies much of the Python language's sophistication but the mechanism behind it remains little known.

If you install the RISE extension it also displays as slides.

Unqualified name access¶

The interpreter looks in:

The current function's local namespace.

The namespace of the module containing the function (the global namespace).

The built-in namespace.

If the name is not found, the interpreter raises a NameError exception.

When functions and/or classes are lexically nested within each other this creates certain complexities with locality, which need not detain us here.

We begin at the point where we've looked up a name and have a reference to some object o.

Attribute access¶

To evaluate an expression such as o.x the interpreter first resolves the unqualified name o by the means described above.

To a first approximation, it then looks for the name x

in the namespace associated with o (usually referenceable as o.__dict__)
in the namespace of the object's class
in the class's superclass
in the superclass's superclass ...

... and so on all the way up to object. If the search fails, the interpreter raises an AttributeError exception.

As we'll see, this isn't the whole story!

The built-in dir function conveniently bundles together all (or most of) the names accessible in an object's namespace.

Because many of the various dunder names (those of the form __name__) are inherited from object I've written a convenience function to omit them.

In [1]:

def names(obj):
    "Return a list of all accessible names except dunders."
    return [n for n in dir(obj)
            if not (n.startswith('__')
                    and n.endswith('__'))]

Here's a simple class that has one class variable and one instance variable.

In [2]:

class DemoObject:
    c: int = 42
    def __init__(self, v):
        self.v = v

o = DemoObject(21)

Because of the name search protocol described above you can reference class attributes as though they were instance attributes.

In [3]:

o.v, o.c

Out[3]:

(21, 42)

Some attributes appear in the instance's __dict__, others appear in the class's.

It all depends how the assignment is made.

In [4]:

DemoObject.__dict__  # Shows what's defined in the class

Out[4]:

mappingproxy({'__module__': '__main__',
              '__annotations__': {'c': int},
              'c': 42,
              '__init__': <function __main__.DemoObject.__init__(self, v)>,
              '__dict__': <attribute '__dict__' of 'DemoObject' objects>,
              '__weakref__': <attribute '__weakref__' of 'DemoObject' objects>,
              '__doc__': None})

In [5]:

names(DemoObject)

Out[5]:

['c']

In [6]:

o.__dict__

Out[6]:

{'v': 21}

Remember, though, that although attribute access follows the class hierarchy, attribute assignment (name binding) doesn't. Name binding takes place in the namespace of the object whose attribute is being bound.

Once the name is bound in the local namespace that will shadow the binding in the class namespace.

In [7]:

o.c = 43  # Binds in the instance namespace

In [8]:

o.__dict__

Out[8]:

{'v': 21, 'c': 43}

In [9]:

o.c

Out[9]:

In [10]:

o.__class__.c  # Class variable remains unchanged

Out[10]:

The descriptor protocol¶

Many programmers are familiar with properties. They are just a special case of a more general mechanism called the descriptor protocol.

What's the descriptor protocol? Briefly, any type that implements any of the __get__, __set__ or __delete__ methods conforms to the protocol.

Time for another convenience function: we'd like to know whether a particular attribute is a descriptor.

In [11]:

def is_descriptor(p):
    names = dir(p)  # Sees inherited names also
    return any(n in names
               for n in ("__get__", "__set__", "__delete__")
              )

Rather than using the property decorator, we're going to build our own descriptor.

It won't have __set__ and __delete__ methods, making this a read-only (nowadays, a non-overriding) descriptor. I'll explain the "non-overriding" term later.

In [12]:

class D1:
    """
    Our first read-only, (non-overriding) descriptor
    """
    def __get__(self, obj, objtype=None):
        print(f"self: {self}\nobj : {obj}\ntype: {objtype}")
        return "I'm a D1"

The descriptor magic isn't immediately obvious. Creating a D1 and accessing its value clearly doesn't call the __get__ method.

In [13]:

d1 = D1()

In [14]:

d1

Out[14]:

<__main__.D1 at 0x1065aea90>

The magic appears when you create an instance of the property class as a class variable. Here's a class that does just that.

In [15]:

class C1:
    
    d: D1 = D1()
    
c1 = C1()

In [16]:

c1.d

self: <__main__.D1 object at 0x1065afb50>
obj : <__main__.C1 object at 0x1065c4590>
type: <class '__main__.C1'>

Out[16]:

"I'm a D1"

Let's examine the namespaces of C1 and its instance. Care is needed to avoid triggering unwanted descriptor behaviour!

In [17]:

names(C1)

Out[17]:

['d']

In [18]:

(type(C1.__dict__['d']),
 is_descriptor(C1.__dict__['d']),
 type(C1.d),
 is_descriptor(C1.d)
)

self: <__main__.D1 object at 0x1065afb50>
obj : None
type: <class '__main__.C1'>
self: <__main__.D1 object at 0x1065afb50>
obj : None
type: <class '__main__.C1'>

Out[18]:

(__main__.D1, True, str, False)

A slightly more adventurous descriptor lets us initialise its value.

In [19]:

class D2:
    def __init__(self, val):
        self._v = val
    def __get__(self, obj, objtype=None):
        print(f"getting _v from {obj} in {self.__class__.__name__}: {self._v!r}")
        return self._v

In [20]:

def desc_methods(obj):
    "Show which descriptor methods are implemented."
    for name in ("__get__", "__set__", "__delete__"):
        print(f"{name:10}: {hasattr(obj, name)}")

In [21]:

desc_methods(D2)

__get__   : True
__set__   : False
__delete__: False

In [22]:

class C2:
    d: D2 = D2(42)

In [23]:

c2 = C2()

In [24]:

c2.d 

getting _v from <__main__.C2 object at 0x1065d07d0> in D2: 42

Out[24]:

There's no __set__ method, therefore assignment isn't overridden by the decorator, and makes an entry in the instance's __dict__.

Similarly, because there's no __delete__ it can't be destroyed.

In [25]:

try:
    del c2.d
except AttributeError as e:
    print(e)

'C2' object has no attribute 'd'

In [26]:

c2.d = 2345

In [27]:

c2.__dict__

Out[27]:

{'d': 2345}

Because the descriptor is non-overriding, now there's a __dict__ entry it's used to return the attribute value without calling the property's __get__.

In [28]:

c2.d 

Out[28]:

Extending a descriptor to assignment¶

The D3 descriptor will do everything the D2 can, but adds a __set__ method making it an overriding descriptor.

In [29]:

class D3(D2):
    def __set__(self, instance, value):
        self._v = value

In [30]:

desc_methods(D3)

__get__   : True
__set__   : True
__delete__: False

In [31]:

class C3:
    d: D3 = D3("initial")

c3 = C3()

In [32]:

c3.d

getting _v from <__main__.C3 object at 0x1064c0910> in D3: 'initial'

Out[32]:

'initial'

In [33]:

c3.d = "changed"

In [34]:

c3.d

getting _v from <__main__.C3 object at 0x1064c0910> in D3: 'changed'

Out[34]:

'changed'

REMEMBER: The interpreter looks for an overriding descriptor in the class hierarchy before searching for regular instance/class variables.

If found, the descriptor is used when the name is accessed as an attribute of the instance even if an instance variable with the same name as the class's property exists.

This is why they are called overriding descriptors.

In [35]:

class DN:
    def __get__(self, obj, objtype=None):
        print(f"getting _dn from {self.__class__.__name__} instance {obj}")
        return obj.__dict__["_dn"]
    def __set__(self, obj, value):
        print(f"setting _dn in {obj} to {value!r}")
        obj.__dict__["_dn"] = value    

In [36]:

class CN:
    d: DN = DN()  # CN.d is an instance of descriptor class DN

cn = CN()

In [37]:

cn.d = 12345

setting _dn in <__main__.CN object at 0x1065dedd0> to 12345

In [38]:

cn.d

getting _dn from DN instance <__main__.CN object at 0x1065dedd0>

Out[38]:

In [39]:

cn.__dict__

Out[39]:

{'_dn': 12345}

Q: Why can't a class have more than one DN descriptor?¶

A: Because all DN descriptors save their state in the same `_dn` instance variable¶

You'll find an answer to this conundrum later in this notebook.

Properties¶

Properties are descriptors that many Python programmers are at least aware of. They don't behave quite like raw descriptors, because they are always overriding.

Now we understand the underlying mechanism, let's refresh our memory about properties. At the same time it should reinforce the material on descriptors.

In [40]:

help(property)

Help on class property in module builtins:

class property(object)
 |  property(fget=None, fset=None, fdel=None, doc=None)
 |  
 |  Property attribute.
 |  
 |    fget
 |      function to be used for getting an attribute value
 |    fset
 |      function to be used for setting an attribute value
 |    fdel
 |      function to be used for del'ing an attribute
 |    doc
 |      docstring
 |  
 |  Typical use is to define a managed attribute x:
 |  
 |  class C(object):
 |      def getx(self): return self._x
 |      def setx(self, value): self._x = value
 |      def delx(self): del self._x
 |      x = property(getx, setx, delx, "I'm the 'x' property.")
 |  
 |  Decorators make defining new properties or modifying existing ones easy:
 |  
 |  class C(object):
 |      @property
 |      def x(self):
 |          "I am the 'x' property."
 |          return self._x
 |      @x.setter
 |      def x(self, value):
 |          self._x = value
 |      @x.deleter
 |      def x(self):
 |          del self._x
 |  
 |  Methods defined here:
 |  
 |  __delete__(self, instance, /)
 |      Delete an attribute of instance.
 |  
 |  __get__(self, instance, owner=None, /)
 |      Return an attribute of instance, which is of type owner.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __set__(self, instance, value, /)
 |      Set an attribute of instance to value.
 |  
 |  __set_name__(...)
 |      Method to set name of a property.
 |  
 |  deleter(...)
 |      Descriptor to obtain a copy of the property with a different deleter.
 |  
 |  getter(...)
 |      Descriptor to obtain a copy of the property with a different getter.
 |  
 |  setter(...)
 |      Descriptor to obtain a copy of the property with a different setter.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __isabstractmethod__
 |  
 |  fdel
 |  
 |  fget
 |  
 |  fset

In [41]:

names(property)

Out[41]:

['deleter', 'fdel', 'fget', 'fset', 'getter', 'setter']

Properties define three decorators and three functions that are called from the __del__, __get__ and __set__ methods respectively when present.

A simple use of a property is to define a virtual or computed attribute. A simplistic example follows.

In [42]:

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last
    @property
    def full_name(self):
        return f"{self.first} {self.last}"

me = Person("Steve", "Holden")
me.full_name

Out[42]:

'Steve Holden'

Any reference to the property as an instance attribute causes the property's __get__ method to be called. The return value is the value of the attribute lookup.

In [43]:

type(me.full_name), me.full_name

Out[43]:

(str, 'Steve Holden')

References to the property as a class attribute, however, do not cause a call to the property's __get__, so the class attribute's value is the property instance.

In [44]:

type(Person.full_name), Person.full_name

Out[44]:

(property, <property at 0x1065cb790>)

Person.full_name doesn't offer any way to change the composite value of the person's full name - its definition doesn't provide a setter, or a deleter.

In [45]:

for name in names(Person.full_name):
    print(f"{name:10}: {type(getattr(Person.full_name, name))}")

deleter   : <class 'builtin_function_or_method'>
fdel      : <class 'NoneType'>
fget      : <class 'function'>
fset      : <class 'NoneType'>
getter    : <class 'builtin_function_or_method'>
setter    : <class 'builtin_function_or_method'>

In [46]:

list(me.__dict__)

Out[46]:

['first', 'last']

In [47]:

names(me)

Out[47]:

['first', 'full_name', 'last']

In [48]:

try:
    me.full_name = "Simon Willison"
except Exception as e:
    print("Exception:", e)

Exception: property 'full_name' of 'Person' object has no setter

In [49]:

try:
    del me.full_name
except Exception as e:
    print("Exception:", e)

Exception: property 'full_name' of 'Person' object has no deleter

If a property is registered in a class, then it will take precedence over an entry in the instance's __dict__ because properties are always overriding.

In [50]:

me.__dict__['full_name'] = "Sherlock Holmes"

In [51]:

me.full_name

Out[51]:

'Steve Holden'

Properties as instance attributes don't invoke the same behaviour.

In [52]:

@property
def some_prop(self):
    return "My very own property"

me.my_prop = some_prop

In [53]:

me.my_prop

Out[53]:

<property at 0x1065f4450>

In [54]:

type(some_prop)

Out[54]:

property

We find that a property is indeed a descriptor.

In [55]:

names(property)

Out[55]:

['deleter', 'fdel', 'fget', 'fset', 'getter', 'setter']

In [56]:

desc_methods(Person.full_name)

__get__   : True
__set__   : True
__delete__: True

In [57]:

is_descriptor(Person.full_name), Person.full_name

Out[57]:

(True, <property at 0x1065cb790>)

When the descriptor is looked up as an instance attribute, however, the value returned is generated by calling the descriptor's __get__ method.

In [58]:

is_descriptor(me.full_name), me.full_name

Out[58]:

(False, 'Steve Holden')

Question: Why can't a class have more than one DN descriptor?¶

Answer: because all descriptor instances would try to use the same instance variable of the client class's instances.¶

Bonus material¶

Use this as the basis for further investigations. Here are descriptors I wrote for the ticketing project.

The main goal was canonicalisation: sometimes strings were being assigned and an integer was required, for example.

Several things are of potential interest.

When default values are provided they cannot be positional (That's the purpose of the * in the method signature).
The set_name method is used to determine the name bound to the descriptor in its client class.
This allows different instances of the same descriptor to coexist within a class definition, so you need not define each one as a distinct class.

In [59]:

class StringInt:
    def __init__(self, *, default=0):
        self._default = default
    def __set_name__(self, owner, name):
        self._name = "_" + name
    def __get__(self, obj, type):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

In [60]:

class StringBool:
    def __init__(self, *, default='FALSE'):
        self._default = default
    def __set_name__(self, owner, name):
        self._name = "_" + name
    def __get__(self, obj, type):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, value in ["TRUE", "True", "1", 1, True])

Nick Fitzsimmons asked ...¶

A descriptor whose definition deleted itself (!).

In [61]:

class StringInt:
    def __init__(self, *, default=0):
        self._default = default
    def __set_name__(self, owner, name):
        del owner.__dict__[name]  # <<<<<<<<<<<<<<<<<<<
    def __get__(self, obj, type):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

In [62]:

try:
    class User:
        si: StringInt = StringInt()
except RuntimeError as e:
    print("RuntimeError:", e)

RuntimeError: Error calling __set_name__ on 'StringInt' instance 'si' in 'User'

The actual issue is that __set_name__ tries to modify the class __dict__, which you may remember is a (read-only) mappingproxy object.

Conclusion: not without getting really tricky!

Functions are descriptors¶

In [63]:

type(desc_methods)

Out[63]:

function

In [64]:

is_descriptor(desc_methods)

Out[64]:

True

In [65]:

desc_methods(desc_methods)

__get__   : True
__set__   : False
__delete__: False

Question¶

Why are functions descriptors? What advantages does this confer?

In [66]:

names(desc_methods)

Out[66]:

[]

In [67]:

desc_methods.__get__(desc_methods)()

__get__   : True
__set__   : False
__delete__: False

In [68]:

desc_methods.__class__.__dict__

Out[68]:

mappingproxy({'__new__': <function function.__new__(*args, **kwargs)>,
              '__repr__': <slot wrapper '__repr__' of 'function' objects>,
              '__call__': <slot wrapper '__call__' of 'function' objects>,
              '__get__': <slot wrapper '__get__' of 'function' objects>,
              '__closure__': <member '__closure__' of 'function' objects>,
              '__doc__': <member '__doc__' of 'function' objects>,
              '__globals__': <member '__globals__' of 'function' objects>,
              '__module__': <member '__module__' of 'function' objects>,
              '__builtins__': <member '__builtins__' of 'function' objects>,
              '__code__': <attribute '__code__' of 'function' objects>,
              '__defaults__': <attribute '__defaults__' of 'function' objects>,
              '__kwdefaults__': <attribute '__kwdefaults__' of 'function' objects>,
              '__annotations__': <attribute '__annotations__' of 'function' objects>,
              '__dict__': <attribute '__dict__' of 'function' objects>,
              '__name__': <attribute '__name__' of 'function' objects>,
              '__qualname__': <attribute '__qualname__' of 'function' objects>})

Here endeth the notebook¶

I hope this little tour through descriptors has not only explained an important Python mechanism, but also encouraged you to be adventurous in using notebooks or the interactive interpreter to explore Python's lesser-known corners.