This is the HTML rendering of the Jupyter notebook that can be found in this repository.
Understanding Descriptors¶
This notebook introduces the concepts behind descriptors.
The descriptor underlies much of the Python language's sophistication, but the mechanism behind it remains little known.
If you install the RISE extension, the notebook can also be displayed as slides.
Unqualified name access¶
To resolve an unqualified name, the interpreter looks in:
- The current function's local namespace.
- The namespace of the module containing the function (the global namespace).
- The built-in namespace.
If the name is not found in any of them, the interpreter raises a NameError exception.
When functions and/or classes are lexically nested within each other, the namespaces of enclosing functions are searched too, and this creates certain complexities with locality that need not detain us here.
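A minimal sketch of that search order; the names x, show_local, show_global, show_builtin and show_missing are purely illustrative:

```python
x = "global x"  # bound in the module (global) namespace

def show_local():
    x = "local x"  # a local binding is found first
    return x

def show_global():
    return x  # no local x, so the global namespace supplies it

def show_builtin():
    return len("abc")  # neither local nor global: found in the built-in namespace

def show_missing():
    try:
        return undefined_name  # not bound in any namespace
    except NameError as e:
        return f"NameError: {e}"

print(show_local(), show_global(), show_builtin(), show_missing(), sep="\n")
```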
We begin at the point where we've looked up a name and have a reference to some object o.
Attribute access¶
To evaluate an expression such as o.x, the interpreter first resolves the unqualified name o by the means described above.
To a first approximation, it then looks for the name x
- in the namespace associated with o (usually referenceable as o.__dict__)
- in the namespace of the object's class
- in the class's superclass
- in the superclass's superclass ...

... and so on, following the class's method resolution order (MRO), all the way up to object.
If the search fails, the interpreter raises an AttributeError exception.
As we'll see, this isn't the whole story!
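Here's a sketch of that simplified search in Python. The helper where_defined and the classes Base and Derived are hypothetical; the helper deliberately ignores descriptors, which change the picture later on.

```python
class Base:
    x = "defined in Base"

class Derived(Base):
    pass

def where_defined(obj, name):
    """Report which namespace supplies name, following the simplified search."""
    if name in vars(obj):            # 1. the instance's own namespace
        return "instance"
    for klass in type(obj).__mro__:  # 2. the class, then each class in its MRO
        if name in vars(klass):
            return klass.__name__
    raise AttributeError(name)       # 3. found nowhere: the search fails

d = Derived()
print([k.__name__ for k in Derived.__mro__])  # ['Derived', 'Base', 'object']
print(where_defined(d, "x"), d.x)             # Base defined in Base
d.x = "instance value"
print(where_defined(d, "x"), d.x)             # instance instance value
```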
The built-in dir function conveniently bundles together all (or most of) the names accessible through an object, including those it inherits from its class.
Because many of the dunder names (those of the form __name__) are inherited from object, I've written a convenience function to omit them.
def names(obj):
    "Return a list of all accessible names except dunders."
    return [n for n in dir(obj)
            if not (n.startswith('__')
                    and n.endswith('__'))]
Here's a simple class that has one class variable and one instance variable.
class DemoObject:
    c: int = 42
    def __init__(self, v):
        self.v = v

o = DemoObject(21)
Because of the name search protocol described above you can reference class attributes as though they were instance attributes.
o.v, o.c
(21, 42)
Some attributes appear in the instance's __dict__, others appear in the class's. It all depends on how the assignment is made.
DemoObject.__dict__ # Shows what's defined in the class
mappingproxy({'__module__': '__main__', '__annotations__': {'c': int}, 'c': 42, '__init__': <function __main__.DemoObject.__init__(self, v)>, '__dict__': <attribute '__dict__' of 'DemoObject' objects>, '__weakref__': <attribute '__weakref__' of 'DemoObject' objects>, '__doc__': None})
names(DemoObject)
['c']
o.__dict__
{'v': 21}
Remember, though, that although attribute access follows the class hierarchy, attribute assignment (name binding) doesn't. Name binding takes place in the namespace of the object whose attribute is being bound.
Once the name is bound in the instance namespace, that binding shadows the one in the class namespace.
o.c = 43 # Binds in the instance namespace
o.__dict__
{'v': 21, 'c': 43}
o.c
43
o.__class__.c # Class variable remains unchanged
42
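The shadowing is undone by deleting the instance binding, after which the class attribute is visible again. A minimal standalone sketch (DemoObject is redefined so the snippet runs on its own):

```python
class DemoObject:
    c: int = 42
    def __init__(self, v):
        self.v = v

o = DemoObject(21)
o.c = 43                 # binds c in the instance namespace, shadowing the class attribute
print(o.c, o.__dict__)   # 43 {'v': 21, 'c': 43}
del o.c                  # unbinds c in the instance namespace only
print(o.c, o.__dict__)   # 42 {'v': 21}
```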
The descriptor protocol¶
Many programmers are familiar with properties. They are just a special case of a more general mechanism called the descriptor protocol.
What's the descriptor protocol? Briefly, any type that implements any of the __get__, __set__ or __delete__ methods conforms to the protocol, and its instances behave as descriptors when they are stored as class attributes.
Time for another convenience function: we'd like to know whether a particular attribute is a descriptor.
def is_descriptor(p):
    attrs = dir(p)  # Sees inherited names also
    return any(n in attrs
               for n in ("__get__", "__set__", "__delete__"))
Rather than using the property decorator, we're going to build our own descriptor.
It won't have __set__ or __delete__ methods, making it a read-only (also called non-overriding) descriptor.
I'll explain the "non-overriding" term later.
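The classification the interpreter makes can be sketched with a small helper. descriptor_kind is a hypothetical name, not a standard function; it checks which protocol methods the object's *type* defines, because the interpreter looks these methods up on the type rather than on the instance.

```python
def descriptor_kind(obj):
    """Classify obj by the descriptor methods defined on its type."""
    t = type(obj)
    if hasattr(t, "__set__") or hasattr(t, "__delete__"):
        return "overriding"      # a.k.a. data descriptor
    if hasattr(t, "__get__"):
        return "non-overriding"  # a.k.a. non-data descriptor
    return "not a descriptor"

def plain_function():
    pass

for candidate in (property(), plain_function, staticmethod(plain_function), 42):
    print(f"{type(candidate).__name__:14}: {descriptor_kind(candidate)}")
```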
class D1:
    """
    Our first read-only (non-overriding) descriptor
    """
    def __get__(self, obj, objtype=None):
        print(f"self: {self}\nobj : {obj}\ntype: {objtype}")
        return "I'm a D1"
The descriptor magic isn't immediately obvious. Creating a D1 and accessing its value clearly doesn't call the __get__ method.
d1 = D1()
d1
<__main__.D1 at 0x1065aea90>
The magic appears when you store an instance of the descriptor class as a class variable. Here's a class that does just that.
class C1:
    d: D1 = D1()

c1 = C1()
c1.d
self: <__main__.D1 object at 0x1065afb50>
obj : <__main__.C1 object at 0x1065c4590>
type: <class '__main__.C1'>
"I'm a D1"
Let's examine the namespaces of C1 and its instance. Care is needed to avoid triggering unwanted descriptor behaviour!
names(C1)
['d']
(type(C1.__dict__['d']),
 is_descriptor(C1.__dict__['d']),
 type(C1.d),
 is_descriptor(C1.d)
)
self: <__main__.D1 object at 0x1065afb50>
obj : None
type: <class '__main__.C1'>
self: <__main__.D1 object at 0x1065afb50>
obj : None
type: <class '__main__.C1'>
(__main__.D1, True, str, False)
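One way to take that care is to fetch the raw descriptor without ordinary attribute lookup. This sketch uses the class's __dict__ and the standard library's inspect.getattr_static, which searches the MRO without invoking descriptors; D1 is condensed here so the snippet runs on its own.

```python
import inspect

class D1:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called with obj={obj!r}")
        return "I'm a D1"

class C1:
    d: D1 = D1()

# None of these lookups calls D1.__get__, so nothing is printed
raw_from_dict = C1.__dict__["d"]
raw_from_vars = vars(C1)["d"]
raw_static = inspect.getattr_static(C1, "d")
raw_static_inst = inspect.getattr_static(C1(), "d")

print(raw_from_dict is raw_from_vars is raw_static is raw_static_inst)  # True
print(C1.d)  # ordinary lookup does call __get__, with obj=None
```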
A slightly more adventurous descriptor lets us initialise its value.
class D2:
    def __init__(self, val):
        self._v = val
    def __get__(self, obj, objtype=None):
        print(f"getting _v from {obj} in {self.__class__.__name__}: {self._v!r}")
        return self._v

def desc_methods(obj):
    "Show which descriptor methods are implemented."
    for name in ("__get__", "__set__", "__delete__"):
        print(f"{name:10}: {hasattr(obj, name)}")
desc_methods(D2)
__get__   : True
__set__   : False
__delete__: False
class C2:
    d: D2 = D2(42)

c2 = C2()
c2.d
getting _v from <__main__.C2 object at 0x1065d07d0> in D2: 42
42
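There's no __set__ method, so the descriptor doesn't intercept assignment: assigning to c2.d makes an entry in the instance's __dict__.
Similarly, because there's no __delete__ method, the descriptor doesn't intercept deletion either: del c2.d looks for an entry in the instance's __dict__ and, finding none, raises AttributeError.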
try:
    del c2.d
except AttributeError as e:
    print(e)
'C2' object has no attribute 'd'
c2.d = 2345
c2.__dict__
{'d': 2345}
Because the descriptor is non-overriding, now that there's an entry in the instance's __dict__, that entry supplies the attribute value without any call to the descriptor's __get__.
c2.d
2345
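The shadowing is reversible: once the instance entry exists, del removes it and lookup falls through to the descriptor again. A condensed, standalone sketch (D2 without its print call):

```python
class D2:
    def __init__(self, val):
        self._v = val
    def __get__(self, obj, objtype=None):
        return self._v

class C2:
    d: D2 = D2(42)

c2 = C2()
c2.d = 2345                 # no __set__, so this creates c2.__dict__['d']
shadowed = c2.d             # the instance entry wins over a non-overriding descriptor
del c2.d                    # removes the instance entry (D2 has no __delete__ to intercept this)
restored = c2.d             # lookup falls through to the descriptor again
print(shadowed, restored, c2.__dict__)  # 2345 42 {}
```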
Extending a descriptor to assignment¶
The D3 descriptor will do everything the D2 descriptor can, but adds a __set__ method, making it an overriding descriptor.
class D3(D2):
    def __set__(self, instance, value):
        self._v = value
desc_methods(D3)
__get__   : True
__set__   : True
__delete__: False
class C3:
    d: D3 = D3("initial")

c3 = C3()
c3.d
getting _v from <__main__.C3 object at 0x1064c0910> in D3: 'initial'
'initial'
c3.d = "changed"
c3.d
getting _v from <__main__.C3 object at 0x1064c0910> in D3: 'changed'
'changed'
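One consequence of D3's design is easy to miss: __set__ stores the value in self._v, and self is the single descriptor instance held by the class, so every instance of C3 shares one value. A standalone sketch (D3 condensed, without the print calls):

```python
class D3:
    def __init__(self, val):
        self._v = val
    def __get__(self, obj, objtype=None):
        return self._v
    def __set__(self, obj, value):
        self._v = value  # stored on the descriptor, which lives in the class

class C3:
    d: D3 = D3("initial")

first, second = C3(), C3()
first.d = "changed"
print(second.d)        # changed  (the value is shared by every C3 instance)
print(first.__dict__)  # {}  (nothing was stored on the instance)
```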
REMEMBER: The interpreter looks for an overriding descriptor in the class hierarchy before searching the instance's namespace.
If one is found, the descriptor is used when the name is accessed as an attribute of the instance, even if the instance's __dict__ holds an entry with the same name.
This is why they are called overriding descriptors.
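To make the precedence rules concrete, here is a simplified pure-Python model of instance attribute lookup. It is a sketch, not the interpreter's actual code: it ignores __getattr__, __slots__ and metaclass details, and the names find_in_mro, lookup, _MISSING and Demo are hypothetical.

```python
_MISSING = object()  # sentinel meaning "not found in any class"

def find_in_mro(cls, name):
    """Return the first class-level binding of name along cls.__mro__."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING

def lookup(obj, name):
    """Simplified model of how obj.name is resolved for an ordinary instance."""
    objtype = type(obj)
    cls_attr = find_in_mro(objtype, name)
    attr_type = type(cls_attr)  # protocol methods are looked up on the type
    getter = getattr(attr_type, "__get__", None)
    overriding = hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")

    # 1. An overriding descriptor in the class hierarchy wins outright
    if getter is not None and overriding:
        return getter(cls_attr, obj, objtype)
    # 2. Otherwise the instance's own namespace is consulted
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]
    # 3. Then a non-overriding descriptor (plain functions fall in this category)
    if getter is not None:
        return getter(cls_attr, obj, objtype)
    # 4. Then a plain class attribute
    if cls_attr is not _MISSING:
        return cls_attr
    raise AttributeError(f"{objtype.__name__!r} object has no attribute {name!r}")

class Demo:
    plain = "class value"

    @property
    def prop(self):
        return "from property"

    def method(self):
        return "from method"

demo = Demo()
# Plant instance entries that collide with all three class attributes
demo.__dict__.update(plain="instance value", prop="ignored", method="shadows the method")
for name in ("plain", "prop", "method"):
    print(f"{name:6}: {lookup(demo, name)!r} == {getattr(demo, name)!r}")
```

Only the property, being overriding, ignores the planted instance entry.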
class DN:
    def __get__(self, obj, objtype=None):
        if obj is None:  # Accessed on the class (CN.d): return the descriptor itself
            return self
        print(f"getting _dn from {self.__class__.__name__} instance {obj}")
        return obj.__dict__["_dn"]
    def __set__(self, obj, value):
        print(f"setting _dn in {obj} to {value!r}")
        obj.__dict__["_dn"] = value

class CN:
    d: DN = DN()  # CN.d is an instance of descriptor class DN

cn = CN()
cn.d = 12345
setting _dn in <__main__.CN object at 0x1065dedd0> to 12345
cn.d
getting _dn from DN instance <__main__.CN object at 0x1065dedd0>
12345
cn.__dict__
{'_dn': 12345}
Q: Why can't a class have more than one DN descriptor?¶
A: Because all DN descriptors save their state in the same _dn instance variable¶
You'll find a way around this limitation later in this notebook.
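A short sketch of the collision, using a hypothetical TwoFields class with two DN-managed attributes (DN condensed, without its print calls):

```python
class DN:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__["_dn"]
    def __set__(self, obj, value):
        obj.__dict__["_dn"] = value

class TwoFields:  # hypothetical client class
    a: DN = DN()
    b: DN = DN()

t = TwoFields()
t.a = 1
t.b = 2            # writes to the same "_dn" key as t.a
print(t.a, t.b)    # 2 2
print(t.__dict__)  # {'_dn': 2}
```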
Properties¶
Properties are descriptors that many Python programmers are at least aware of. They don't behave quite like our raw descriptors, because they are always overriding: the property type defines both __set__ and __delete__, which raise AttributeError when no setter or deleter has been supplied.
Now that we understand the underlying mechanism, let's refresh our memory about properties. At the same time, this should reinforce the material on descriptors.
help(property)
Help on class property in module builtins:

class property(object)
 |  property(fget=None, fset=None, fdel=None, doc=None)
 |
 |  Property attribute.
 |
 |    fget
 |      function to be used for getting an attribute value
 |    fset
 |      function to be used for setting an attribute value
 |    fdel
 |      function to be used for del'ing an attribute
 |    doc
 |      docstring
 |
 |  Typical use is to define a managed attribute x:
 |
 |  class C(object):
 |      def getx(self): return self._x
 |      def setx(self, value): self._x = value
 |      def delx(self): del self._x
 |      x = property(getx, setx, delx, "I'm the 'x' property.")
 |
 |  Decorators make defining new properties or modifying existing ones easy:
 |
 |  class C(object):
 |      @property
 |      def x(self):
 |          "I am the 'x' property."
 |          return self._x
 |      @x.setter
 |      def x(self, value):
 |          self._x = value
 |      @x.deleter
 |      def x(self):
 |          del self._x
 |
 |  Methods defined here:
 |
 |  __delete__(self, instance, /)
 |      Delete an attribute of instance.
 |
 |  __get__(self, instance, owner=None, /)
 |      Return an attribute of instance, which is of type owner.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self. See help(type(self)) for accurate signature.
 |
 |  __set__(self, instance, value, /)
 |      Set an attribute of instance to value.
 |
 |  __set_name__(...)
 |      Method to set name of a property.
 |
 |  deleter(...)
 |      Descriptor to obtain a copy of the property with a different deleter.
 |
 |  getter(...)
 |      Descriptor to obtain a copy of the property with a different getter.
 |
 |  setter(...)
 |      Descriptor to obtain a copy of the property with a different setter.
 |
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object. See help(type) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __isabstractmethod__
 |
 |  fdel
 |
 |  fget
 |
 |  fset
names(property)
['deleter', 'fdel', 'fget', 'fset', 'getter', 'setter']
Properties provide three decorators (getter, setter and deleter) and three attributes (fget, fset and fdel) holding the functions that are called, when present, from the __get__, __set__ and __delete__ methods respectively.
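As a sketch of that correspondence, here's a hypothetical Temperature class that builds its property by passing the three functions to property() directly rather than through the decorators. The validation rule is an illustrative assumption.

```python
class Temperature:
    def __init__(self, celsius=0.0):
        self.celsius = celsius  # goes through the property's setter

    def _get_celsius(self):
        return self._celsius

    def _set_celsius(self, value):
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = float(value)

    def _del_celsius(self):
        del self._celsius

    # fget, fset and fdel are called from __get__, __set__ and __delete__ respectively
    celsius = property(_get_celsius, _set_celsius, _del_celsius,
                       "Temperature in degrees Celsius")

t = Temperature(20)
print(t.celsius)              # 20.0
t.celsius = 25                # property.__set__ calls _set_celsius
print(t.celsius)              # 25.0
del t.celsius                 # property.__delete__ calls _del_celsius
print(hasattr(t, "celsius"))  # False: the getter now raises AttributeError
```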
A simple use of a property is to define a virtual or computed attribute. A simplistic example follows.
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    @property
    def full_name(self):
        return f"{self.first} {self.last}"
me = Person("Steve", "Holden")
me.full_name
'Steve Holden'
Any reference to the property as an instance attribute causes the property's __get__ method to be called, and the value it returns is the result of the attribute lookup.
type(me.full_name), me.full_name
(str, 'Steve Holden')
References to the property as a class attribute also call the property's __get__, but with None as the instance argument. In that case property.__get__ simply returns the property object itself, so the class attribute's value is the property instance.
type(Person.full_name), Person.full_name
(property, <property at 0x1065cb790>)
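Both cases can be reproduced by calling the property's __get__ by hand. A standalone sketch (Person is repeated so it runs on its own; prop is just a local name for the raw property object):

```python
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    @property
    def full_name(self):
        return f"{self.first} {self.last}"

prop = vars(Person)["full_name"]  # the raw property, fetched without descriptor behaviour
me = Person("Steve", "Holden")

print(prop.__get__(None, Person) is prop)  # True: what Person.full_name does
print(prop.__get__(me, Person))            # Steve Holden: what me.full_name does
```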
Person.full_name doesn't offer any way to change the composite value of the person's full name: its definition provides neither a setter nor a deleter.
for name in names(Person.full_name):
    print(f"{name:10}: {type(getattr(Person.full_name, name))}")
deleter   : <class 'builtin_function_or_method'>
fdel      : <class 'NoneType'>
fget      : <class 'function'>
fset      : <class 'NoneType'>
getter    : <class 'builtin_function_or_method'>
setter    : <class 'builtin_function_or_method'>
list(me.__dict__)
['first', 'last']
names(me)
['first', 'full_name', 'last']
try:
    me.full_name = "Simon Willison"
except Exception as e:
    print("Exception:", e)
Exception: property 'full_name' of 'Person' object has no setter
try:
    del me.full_name
except Exception as e:
    print("Exception:", e)
Exception: property 'full_name' of 'Person' object has no deleter
If a class defines a property, it takes precedence over an entry with the same name in the instance's __dict__, because properties are always overriding.
me.__dict__['full_name'] = "Sherlock Holmes"
me.full_name
'Steve Holden'
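A property stored as an instance attribute, however, gets no descriptor behaviour: the interpreter only invokes descriptors it finds in the class hierarchy, so lookup returns the property object itself.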
@property
def some_prop(self):
    return "My very own property"
me.my_prop = some_prop
me.my_prop
<property at 0x1065f4450>
type(some_prop)
property
We find that a property is indeed a descriptor.
names(property)
['deleter', 'fdel', 'fget', 'fset', 'getter', 'setter']
desc_methods(Person.full_name)
__get__   : True
__set__   : True
__delete__: True
is_descriptor(Person.full_name), Person.full_name
(True, <property at 0x1065cb790>)
When the descriptor is looked up as an instance attribute, however, the value returned is generated by calling the descriptor's __get__ method.
is_descriptor(me.full_name), me.full_name
(False, 'Steve Holden')
Bonus material¶
Use this as the basis for further investigations. Here are descriptors I wrote for the ticketing project.
The main goal was canonicalisation: sometimes strings were being assigned where an integer was required, for example.
Several things are of potential interest:
- When default values are provided they cannot be positional (that's the purpose of the * in the __init__ signature).
- The __set_name__ method is used to determine the name bound to the descriptor in its client class.
- This allows different instances of the same descriptor to coexist within a class definition, so you need not define each one as a distinct class.
class StringInt:
    def __init__(self, *, default=0):
        self._default = default
    def __set_name__(self, owner, name):
        self._name = "_" + name  # e.g. "_quantity" for an attribute named "quantity"
    def __get__(self, obj, objtype=None):
        if obj is None:  # Accessed on the class
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

class StringBool:
    TRUE_VALUES = ("TRUE", "True", "1", 1, True)
    def __init__(self, *, default='FALSE'):
        # Canonicalise the default too; otherwise the (truthy!) string 'FALSE' leaks out
        self._default = default in self.TRUE_VALUES
    def __set_name__(self, owner, name):
        self._name = "_" + name
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, value in self.TRUE_VALUES)
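To see __set_name__ at work, here's a usage sketch with a hypothetical Ticket class. StringInt is repeated in condensed form so the snippet runs on its own. Two StringInt attributes coexist because each descriptor learns its own storage name (_quantity, _price), unlike the single hard-wired _dn key used by DN.

```python
class StringInt:
    def __init__(self, *, default=0):
        self._default = default
    def __set_name__(self, owner, name):
        self._name = "_" + name  # called once per attribute when the owner class is created
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

class Ticket:  # hypothetical client class
    quantity: StringInt = StringInt()
    price: StringInt = StringInt(default=100)  # the default must be passed by keyword

ticket = Ticket()
print(ticket.quantity, ticket.price)   # 0 100  (defaults; nothing stored yet)
ticket.quantity = "3"                  # strings are canonicalised to ints
ticket.price = "250"
print(ticket.quantity + ticket.price)  # 253
print(vars(ticket))                    # {'_quantity': 3, '_price': 250}
```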
Further reading¶
- Descriptor How-To Guide by Raymond Hettinger. Uses different terminology (data and non-data rather than overriding and non-overriding).
Nick Fitzsimmons asked ...¶
A descriptor whose __set_name__ method tries to delete the descriptor from its own class (!).
class StringInt:
    def __init__(self, *, default=0):
        self._default = default
    def __set_name__(self, owner, name):
        del owner.__dict__[name]  # <<<<<<<<<<<<<<<<<<<
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

try:
    class User:
        si: StringInt = StringInt()
except (RuntimeError, TypeError) as e:
    # Python 3.11 and earlier wrap the TypeError in a RuntimeError;
    # from 3.12 the TypeError propagates directly
    print(type(e).__name__ + ":", e)
RuntimeError: Error calling __set_name__ on 'StringInt' instance 'si' in 'User'
The actual issue is that __set_name__ tries to modify the class __dict__, which you may remember is a (read-only) mappingproxy object, so the deletion raises a TypeError.
Conclusion: not without getting really tricky!
Functions are descriptors¶
type(desc_methods)
function
is_descriptor(desc_methods)
True
desc_methods(desc_methods)
__get__   : True
__set__   : False
__delete__: False
Question¶
Why are functions descriptors? What advantages does this confer?
names(desc_methods)
[]
desc_methods.__get__(desc_methods)()
__get__   : True
__set__   : False
__delete__: False
desc_methods.__class__.__dict__
mappingproxy({'__new__': <function function.__new__(*args, **kwargs)>, '__repr__': <slot wrapper '__repr__' of 'function' objects>, '__call__': <slot wrapper '__call__' of 'function' objects>, '__get__': <slot wrapper '__get__' of 'function' objects>, '__closure__': <member '__closure__' of 'function' objects>, '__doc__': <member '__doc__' of 'function' objects>, '__globals__': <member '__globals__' of 'function' objects>, '__module__': <member '__module__' of 'function' objects>, '__builtins__': <member '__builtins__' of 'function' objects>, '__code__': <attribute '__code__' of 'function' objects>, '__defaults__': <attribute '__defaults__' of 'function' objects>, '__kwdefaults__': <attribute '__kwdefaults__' of 'function' objects>, '__annotations__': <attribute '__annotations__' of 'function' objects>, '__dict__': <attribute '__dict__' of 'function' objects>, '__name__': <attribute '__name__' of 'function' objects>, '__qualname__': <attribute '__qualname__' of 'function' objects>})
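One answer to the question above, sketched with a hypothetical Greeter class: because functions are non-overriding descriptors, a function stored in a class becomes a bound method when looked up through an instance. Attribute lookup calls the function's __get__, which packages the function and the instance together, and that's how self gets supplied without any special machinery for methods.

```python
import types

class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

g = Greeter()
func = vars(Greeter)["greet"]     # the plain function object stored in the class

bound = func.__get__(g, Greeter)  # what g.greet does under the hood
print(type(bound).__name__)       # method
print(bound("Steve"))             # Hello, Steve
print(bound.__self__ is g, bound.__func__ is func)  # True True

# Looked up on the class there's no instance to bind, so the function itself comes back
print(func.__get__(None, Greeter) is func)  # True
```

Because functions are non-overriding, an instance attribute with the same name can still shadow a method.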
Here endeth the notebook¶
I hope this little tour through descriptors has not only explained an important Python mechanism, but also encouraged you to be adventurous in using notebooks or the interactive interpreter to explore Python's lesser-known corners.