Understanding Descriptors

This is the HTML rendering of the Jupyter notebook that can be found in this repository.

Understanding Descriptors

This notebook introduces the concepts behind descriptors.

The descriptor underlies much of the Python language's sophistication but the mechanism behind it remains little known.

If you install the RISE extension it also displays as slides.

Unqualified name access

To resolve an unqualified name the interpreter looks, in order, in:

The current function's local namespace.

The namespace of the module containing the function (the global namespace).

The built-in namespace.

If the name is not found, the interpreter raises a NameError exception.
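As a minimal sketch of the search order just described, the snippet below binds the same name at two levels and also looks up a name that is bound nowhere. The function and variable names are hypothetical, chosen only for this illustration.

```python
import builtins

greeting = "global greeting"  # bound in the module (global) namespace


def uses_local():
    greeting = "local greeting"  # a local binding shadows the global one
    return greeting


def uses_global():
    return greeting  # no local binding, so the global namespace is searched


def uses_builtin():
    return len is builtins.len  # 'len' is found only in the built-in namespace


def uses_missing():
    try:
        return no_such_name  # not bound in any of the three namespaces
    except NameError as exc:
        return type(exc).__name__
```

Calling uses_missing() returns the string 'NameError', because the exception is caught and only its class name is reported.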

When functions and/or classes are lexically nested within each other this creates certain complexities with locality, which need not detain us here.

We begin at the point where we've looked up a name and have a reference to some object o.

Attribute access

To evaluate an expression such as o.x the interpreter first resolves the unqualified name o by the means described above.

To a first approximation, it then looks for the name x

in the namespace associated with o (usually referenceable as o.__dict__)
in the namespace of the object's class
in the class's superclass
in the superclass's superclass ...

... and so on all the way up to object. If the search fails, the interpreter raises an AttributeError exception.

As we'll see, this isn't the whole story!
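The first-approximation search can be written out by hand. The sketch below is deliberately naive: the helper simple_lookup and the classes Base and Child are hypothetical, and it ignores descriptors, __getattr__ and metaclasses, which is exactly the part of the story still to come.

```python
def simple_lookup(obj, name):
    """Naive attribute lookup: instance namespace first, then each class in the MRO."""
    instance_ns = getattr(obj, "__dict__", {})
    if name in instance_ns:
        return instance_ns[name]
    for klass in type(obj).__mro__:  # the class, its superclasses, ..., object
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


class Base:
    greeting = "hello"


class Child(Base):
    pass


child = Child()
child.colour = "red"  # bound in the instance namespace
```

Here simple_lookup(child, "colour") finds the name in the instance namespace, while simple_lookup(child, "greeting") has to climb to Base.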

The built-in dir function conveniently bundles together all (or most of) the names accessible in an object's namespace.

Because many of the various dunder names (those of the form __name__) are inherited from object I've written a convenience function to omit them.

In [1]:

def names(obj):
    "Return a list of all accessible names except dunders."
    return [n for n in dir(obj)
            if not (n.startswith('__')
                    and n.endswith('__'))]

Here's a simple class that has one class variable and one instance variable.

In [2]:

class DemoObject:
    c: int = 42
    def __init__(self, v):
        self.v = v

o = DemoObject(21)

Because of the name search protocol described above you can reference class attributes as though they were instance attributes.

In [3]:

o.v, o.c

Out[3]:

(21, 42)

Some attributes appear in the instance's __dict__, others appear in the class's.

It all depends how the assignment is made.

In [4]:

DemoObject.__dict__  # Shows what's defined in the class

Out[4]:

mappingproxy({'__module__': '__main__',
              '__annotations__': {'c': int},
              'c': 42,
              '__init__': <function __main__.DemoObject.__init__(self, v)>,
              '__dict__': <attribute '__dict__' of 'DemoObject' objects>,
              '__weakref__': <attribute '__weakref__' of 'DemoObject' objects>,
              '__doc__': None})

In [5]:

names(DemoObject)

Out[5]:

['c']

In [6]:

o.__dict__

Out[6]:

{'v': 21}

Remember, though, that although attribute access follows the class hierarchy, attribute assignment (name binding) doesn't. Name binding takes place in the namespace of the object whose attribute is being bound.

Once the name is bound in the instance namespace, that binding shadows the one in the class namespace.

In [7]:

o.c = 43  # Binds in the instance namespace

In [8]:

o.__dict__

Out[8]:

{'v': 21, 'c': 43}

In [9]:

o.c

Out[9]:

43

In [10]:

o.__class__.c  # Class variable remains unchanged

Out[10]:

42

The descriptor protocol

Many programmers are familiar with properties. They are just a special case of a more general mechanism called the descriptor protocol.

What's the descriptor protocol? Briefly, any type that implements any of the __get__, __set__ or __delete__ methods conforms to the protocol.
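To make the three method signatures concrete, here is a small sketch of a descriptor that implements all of them and records which one was invoked. The names Traced, Host and the '_traced' storage key are assumptions made for this illustration only.

```python
class Traced:
    """Hypothetical descriptor implementing all three protocol methods."""

    def __init__(self):
        self.calls = []  # records which protocol methods have run

    def __get__(self, obj, objtype=None):
        self.calls.append("get")
        if obj is None:  # accessed on the class rather than an instance
            return self
        return obj.__dict__.get("_traced")

    def __set__(self, obj, value):
        self.calls.append("set")
        obj.__dict__["_traced"] = value

    def __delete__(self, obj):
        self.calls.append("delete")
        obj.__dict__.pop("_traced", None)


class Host:
    attr = Traced()


h = Host()
h.attr = 5          # routed to Traced.__set__
value = h.attr      # routed to Traced.__get__
del h.attr          # routed to Traced.__delete__
calls = Host.__dict__["attr"].calls  # fetched via __dict__ to avoid another __get__
```

After these three statements calls holds "set", "get" and "delete" in that order, one entry per operation on h.attr.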

Time for another convenience function: we'd like to know whether a particular attribute is a descriptor.

In [11]:

def is_descriptor(p):
    names = dir(p)  # Sees inherited names also
    return any(n in names
               for n in ("__get__", "__set__", "__delete__")
              )

Rather than using the property decorator, we're going to build our own descriptor.

It won't have __set__ or __delete__ methods, which makes it a non-overriding descriptor (the official documentation calls this a non-data descriptor). Such descriptors are often loosely described as read-only. I'll explain the "non-overriding" term later.

In [12]:

class D1:
    """
    Our first read-only (non-overriding) descriptor
    """
    def __get__(self, obj, objtype=None):
        print(f"self: {self}\nobj : {obj}\ntype: {objtype}")
        return "I'm a D1"

The descriptor magic isn't immediately obvious. Creating a D1 and accessing its value clearly doesn't call the __get__ method.

In [13]:

d1 = D1()

In [14]:

d1

Out[14]:

<__main__.D1 at 0x1065aea90>

The magic appears when you create an instance of the descriptor class as a class variable. Here's a class that does just that.

In [15]:

class C1:
    
    d: D1 = D1()
    
c1 = C1()

In [16]:

c1.d

self: <__main__.D1 object at 0x1065afb50>
obj : <__main__.C1 object at 0x1065c4590>
type: <class '__main__.C1'>

Out[16]:

"I'm a D1"

Let's examine the namespaces of C1 and its instance. Care is needed to avoid triggering unwanted descriptor behaviour!

In [17]:

names(C1)

Out[17]:

['d']

In [18]:

(type(C1.__dict__['d']),
 is_descriptor(C1.__dict__['d']),
 type(C1.d),
 is_descriptor(C1.d)
)

self: <__main__.D1 object at 0x1065afb50>
obj : None
type: <class '__main__.C1'>
self: <__main__.D1 object at 0x1065afb50>
obj : None
type: <class '__main__.C1'>

Out[18]:

(__main__.D1, True, str, False)
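Going through C1.__dict__ is one way to reach the raw descriptor without calling __get__. The standard library's inspect.getattr_static offers another. The sketch below uses hypothetical classes Counting and Holder to show that neither route triggers the descriptor, whereas ordinary attribute access does.

```python
import inspect


class Counting:
    """Hypothetical descriptor that counts how often __get__ runs."""
    calls = 0

    def __get__(self, obj, objtype=None):
        Counting.calls += 1
        return "computed"


class Holder:
    d = Counting()


raw_from_dict = vars(Holder)["d"]                  # plain mapping lookup
raw_static = inspect.getattr_static(Holder, "d")   # lookup without the descriptor protocol
calls_before = Counting.calls

triggered = Holder.d                               # normal access calls __get__
calls_after = Counting.calls
```

Both raw lookups return the same Counting instance, and the call counter only moves when Holder.d is evaluated.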

A slightly more adventurous descriptor lets us initialise its value.

In [19]:

class D2:
    def __init__(self, val):
        self._v = val
    def __get__(self, obj, objtype=None):
        print(f"getting _v from {obj} in {self.__class__.__name__}: {self._v!r}")
        return self._v

In [20]:

def desc_methods(obj):
    "Show which descriptor methods are implemented."
    for name in ("__get__", "__set__", "__delete__"):
        print(f"{name:10}: {hasattr(obj, name)}")

In [21]:

desc_methods(D2)

__get__   : True
__set__   : False
__delete__: False

In [22]:

class C2:
    d: D2 = D2(42)

In [23]:

c2 = C2()

In [24]:

c2.d 

getting _v from <__main__.C2 object at 0x1065d07d0> in D2: 42

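Out[24]:

42

There's no __set__ method, so assignment isn't intercepted by the descriptor and simply makes an entry in the instance's __dict__.

Similarly, because there's no __delete__, del operates on the instance's __dict__. Until an assignment has been made there is no entry to remove, so the attempt below raises AttributeError.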

In [25]:

try:
    del c2.d
except AttributeError as e:
    print(e)

'C2' object has no attribute 'd'

In [26]:

c2.d = 2345

In [27]:

c2.__dict__

Out[27]:

{'d': 2345}

Because the descriptor is non-overriding, now that there's a __dict__ entry it is used to return the attribute value without calling the descriptor's __get__.

In [28]:

c2.d 

Out[28]:

2345
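This shadowing is sometimes put to deliberate use. As a minimal sketch, assuming hypothetical names Lazy and Report, a non-overriding descriptor can compute a value on first access and then store it in the instance's __dict__ under its own name, so that later lookups never reach __get__ again.

```python
class Lazy:
    """Hypothetical non-overriding descriptor that computes a value once."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        obj.__dict__[self.name] = value  # the instance entry now shadows the descriptor
        return value


class Report:
    computations = 0

    @Lazy
    def total(self):
        Report.computations += 1
        return sum(range(10))


r = Report()
first = r.total    # runs the function and caches the result
second = r.total   # served straight from r.__dict__
```

The counter Report.computations stays at 1 after both accesses, because the second lookup is satisfied by the instance namespace.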

Extending a descriptor to assignment

The D3 descriptor will do everything the D2 can, but adds a __set__ method making it an overriding descriptor.

In [29]:

class D3(D2):
    def __set__(self, instance, value):
        self._v = value

In [30]:

desc_methods(D3)

__get__   : True
__set__   : True
__delete__: False

In [31]:

class C3:
    d: D3 = D3("initial")

c3 = C3()

In [32]:

c3.d

getting _v from <__main__.C3 object at 0x1064c0910> in D3: 'initial'

Out[32]:

'initial'

In [33]:

c3.d = "changed"

In [34]:

c3.d

getting _v from <__main__.C3 object at 0x1064c0910> in D3: 'changed'

Out[34]:

'changed'

REMEMBER: The interpreter looks for an overriding descriptor in the class hierarchy before searching for regular instance/class variables.

If found, the descriptor is used when the name is accessed as an attribute of the instance even if an instance variable with the same name as the class's descriptor exists.

This is why they are called overriding descriptors.
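The difference in precedence can be seen side by side. In this sketch the class names NonOverriding, Overriding and Host are hypothetical, and entries are written straight into the instance's __dict__ so that neither descriptor's __set__ is involved.

```python
class NonOverriding:
    def __get__(self, obj, objtype=None):
        return "from descriptor"


class Overriding(NonOverriding):
    def __set__(self, obj, value):
        raise AttributeError("assignment not supported in this sketch")


class Host:
    weak = NonOverriding()
    strong = Overriding()


h = Host()
h.__dict__["weak"] = "from instance"    # bypasses normal attribute assignment
h.__dict__["strong"] = "from instance"

weak_result = h.weak      # the instance entry wins over a non-overriding descriptor
strong_result = h.strong  # an overriding descriptor wins over the instance entry
```

The only difference between the two descriptor classes is the presence of __set__, yet it reverses which namespace supplies the value.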

In [35]:

class DN:
    def __get__(self, obj, objtype=None):
        print(f"getting _dn from {self.__class__.__name__} instance {obj}")
        return obj.__dict__["_dn"]
    def __set__(self, obj, value):
        print(f"setting _dn in {obj} to {value!r}")
        obj.__dict__["_dn"] = value    

In [36]:

class CN:
    d: DN = DN()  # CN.d is an instance of descriptor class DN

cn = CN()

In [37]:

cn.d = 12345

setting _dn in <__main__.CN object at 0x1065dedd0> to 12345

In [38]:

cn.d

getting _dn from DN instance <__main__.CN object at 0x1065dedd0>

Out[38]:

12345

In [39]:

cn.__dict__

Out[39]:

{'_dn': 12345}

Q: Why can't a class have more than one DN descriptor?

A: Because all DN descriptors save their state in the same `_dn` instance variable

You'll find an answer to this conundrum later in this notebook.
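The collision is easy to reproduce. The sketch below uses a hypothetical SharedSlot descriptor that, like DN, always stores its value under the fixed key '_dn', and attaches two of them to one class.

```python
class SharedSlot:
    """Like DN: every instance of this descriptor stores under the same '_dn' key."""

    def __get__(self, obj, objtype=None):
        return obj.__dict__["_dn"]

    def __set__(self, obj, value):
        obj.__dict__["_dn"] = value


class Pair:
    a = SharedSlot()
    b = SharedSlot()


p = Pair()
p.a = 1
p.b = 2   # overwrites the value stored through p.a
```

After the second assignment p.a also reads back as 2, and p.__dict__ contains a single '_dn' entry.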

Properties

Properties are descriptors that many Python programmers are at least aware of. They don't behave quite like raw descriptors, because they are always overriding.

Now we understand the underlying mechanism, let's refresh our memory about properties. At the same time it should reinforce the material on descriptors.

In [40]:

help(property)

Help on class property in module builtins:

class property(object)
 |  property(fget=None, fset=None, fdel=None, doc=None)
 |  
 |  Property attribute.
 |  
 |    fget
 |      function to be used for getting an attribute value
 |    fset
 |      function to be used for setting an attribute value
 |    fdel
 |      function to be used for del'ing an attribute
 |    doc
 |      docstring
 |  
 |  Typical use is to define a managed attribute x:
 |  
 |  class C(object):
 |      def getx(self): return self._x
 |      def setx(self, value): self._x = value
 |      def delx(self): del self._x
 |      x = property(getx, setx, delx, "I'm the 'x' property.")
 |  
 |  Decorators make defining new properties or modifying existing ones easy:
 |  
 |  class C(object):
 |      @property
 |      def x(self):
 |          "I am the 'x' property."
 |          return self._x
 |      @x.setter
 |      def x(self, value):
 |          self._x = value
 |      @x.deleter
 |      def x(self):
 |          del self._x
 |  
 |  Methods defined here:
 |  
 |  __delete__(self, instance, /)
 |      Delete an attribute of instance.
 |  
 |  __get__(self, instance, owner=None, /)
 |      Return an attribute of instance, which is of type owner.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __set__(self, instance, value, /)
 |      Set an attribute of instance to value.
 |  
 |  __set_name__(...)
 |      Method to set name of a property.
 |  
 |  deleter(...)
 |      Descriptor to obtain a copy of the property with a different deleter.
 |  
 |  getter(...)
 |      Descriptor to obtain a copy of the property with a different getter.
 |  
 |  setter(...)
 |      Descriptor to obtain a copy of the property with a different setter.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __isabstractmethod__
 |  
 |  fdel
 |  
 |  fget
 |  
 |  fset

In [41]:

names(property)

Out[41]:

['deleter', 'fdel', 'fget', 'fset', 'getter', 'setter']

A property provides three decorators (getter, setter and deleter) and holds three functions (fget, fset and fdel) that, when present, are called from its __get__, __set__ and __delete__ methods respectively.
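To connect this with the descriptor protocol, here is a simplified pure-Python stand-in for property. It is a sketch only: MiniProperty and Celsius are hypothetical names, the error messages are invented, and the real built-in is implemented in C with more features (docstrings, getter and deleter decorators, __set_name__).

```python
class MiniProperty:
    """A simplified, hypothetical stand-in for the built-in property."""

    def __init__(self, fget=None, fset=None, fdel=None):
        self.fget, self.fset, self.fdel = fget, fset, fdel

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class access hands back the descriptor itself
        if self.fget is None:
            raise AttributeError("no getter")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("no setter")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("no deleter")
        self.fdel(obj)

    def setter(self, fset):
        # Return a copy of this descriptor with a different setter.
        return type(self)(self.fget, fset, self.fdel)


class Celsius:
    def __init__(self, degrees):
        self._degrees = degrees

    @MiniProperty
    def degrees(self):
        return self._degrees

    @degrees.setter
    def degrees(self, value):
        self._degrees = float(value)


temp = Celsius(20.0)
temp.degrees = "25"  # routed through __set__ to the setter function
```

Because __set__ and __delete__ are always defined, even when fset or fdel is None, a descriptor built this way is always overriding.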

A simple use of a property is to define a virtual or computed attribute. A simplistic example follows.

In [42]:

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last
    @property
    def full_name(self):
        return f"{self.first} {self.last}"

me = Person("Steve", "Holden")
me.full_name

Out[42]:

'Steve Holden'

Any reference to the property as an instance attribute causes the property's __get__ method to be called. The return value is the value of the attribute lookup.

In [43]:

type(me.full_name), me.full_name

Out[43]:

(str, 'Steve Holden')

References to the property as a class attribute still go through the property's __get__, but with None in place of the instance. In that case the property returns itself, so the class attribute's value is the property instance.

In [44]:

type(Person.full_name), Person.full_name

Out[44]:

(property, <property at 0x1065cb790>)

Person.full_name doesn't offer any way to change the composite value of the person's full name - its definition doesn't provide a setter, or a deleter.

In [45]:

for name in names(Person.full_name):
    print(f"{name:10}: {type(getattr(Person.full_name, name))}")

deleter   : <class 'builtin_function_or_method'>
fdel      : <class 'NoneType'>
fget      : <class 'function'>
fset      : <class 'NoneType'>
getter    : <class 'builtin_function_or_method'>
setter    : <class 'builtin_function_or_method'>

In [46]:

list(me.__dict__)

Out[46]:

['first', 'last']

In [47]:

names(me)

Out[47]:

['first', 'full_name', 'last']

In [48]:

try:
    me.full_name = "Simon Willison"
except Exception as e:
    print("Exception:", e)

Exception: property 'full_name' of 'Person' object has no setter

In [49]:

try:
    del me.full_name
except Exception as e:
    print("Exception:", e)

Exception: property 'full_name' of 'Person' object has no deleter

If a property is registered in a class, then it will take precedence over an entry in the instance's __dict__ because properties are always overriding.

In [50]:

me.__dict__['full_name'] = "Sherlock Holmes"

In [51]:

me.full_name

Out[51]:

'Steve Holden'

Properties as instance attributes don't invoke the same behaviour.

In [52]:

@property
def some_prop(self):
    return "My very own property"

me.my_prop = some_prop

In [53]:

me.my_prop

Out[53]:

<property at 0x1065f4450>

In [54]:

type(some_prop)

Out[54]:

property

We find that a property is indeed a descriptor.

In [55]:

names(property)

Out[55]:

['deleter', 'fdel', 'fget', 'fset', 'getter', 'setter']

In [56]:

desc_methods(Person.full_name)

__get__   : True
__set__   : True
__delete__: True

In [57]:

is_descriptor(Person.full_name), Person.full_name

Out[57]:

(True, <property at 0x1065cb790>)

When the descriptor is looked up as an instance attribute, however, the value returned is generated by calling the descriptor's __get__ method.

In [58]:

is_descriptor(me.full_name), me.full_name

Out[58]:

(False, 'Steve Holden')

Question: Why can't a class have more than one DN descriptor?

Answer: because all descriptor instances would try to use the same instance variable of the client class's instances.

Bonus material

Use this as the basis for further investigations. Here are descriptors I wrote for the ticketing project.

The main goal was canonicalisation: sometimes strings were being assigned and an integer was required, for example.

Several things are of potential interest.

When default values are provided they cannot be positional (that's the purpose of the * in the method signature).
The __set_name__ method is used to determine the name bound to the descriptor in its client class.
This allows different instances of the same descriptor to coexist within a class definition, so you need not define each one as a distinct class.

In [59]:

class StringInt:
    def __init__(self, *, default=0):
        self._default = default
    def __set_name__(self, owner, name):
        self._name = "_" + name
    def __get__(self, obj, type):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

In [60]:

class StringBool:
    def __init__(self, *, default='FALSE'):
        self._default = default
    def __set_name__(self, owner, name):
        self._name = "_" + name
    def __get__(self, obj, type):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, value in ["TRUE", "True", "1", 1, True])
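The same pattern generalises, and it also resolves the earlier DN conundrum. The sketch below is not from the ticketing project: Converted and Ticket are hypothetical names, and the converter is passed in as an argument. Because __set_name__ gives each descriptor instance its own storage name, two of them can live in one class without clashing.

```python
class Converted:
    """Hypothetical generalisation: one descriptor class, any converter."""

    def __init__(self, convert, *, default=None):
        self._convert = convert
        self._default = default

    def __set_name__(self, owner, name):
        self._name = "_" + name  # a distinct storage name per descriptor instance

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)

    def __set__(self, obj, value):
        setattr(obj, self._name, self._convert(value))


class Ticket:
    number = Converted(int, default=0)
    priority = Converted(int, default=3)


t = Ticket()
t.number = "1234"  # canonicalised to an int on assignment
```

Only '_number' appears in t.__dict__ after the assignment, and t.priority still falls back to its own default.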

Nick Fitzsimmons asked ...

He asked about a descriptor whose definition deleted itself from its owner class (!).

In [61]:

class StringInt:
    def __init__(self, *, default=0):
        self._default = default
    def __set_name__(self, owner, name):
        del owner.__dict__[name]  # <<<<<<<<<<<<<<<<<<<
    def __get__(self, obj, type):
        if obj is None:
            return self._default
        return getattr(obj, self._name, self._default)
    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

In [62]:

try:
    class User:
        si: StringInt = StringInt()
except RuntimeError as e:
    print("RuntimeError:", e)

RuntimeError: Error calling __set_name__ on 'StringInt' instance 'si' in 'User'

The actual issue is that __set_name__ tries to modify the class __dict__, which you may remember is a (read-only) mappingproxy object.
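The read-only nature of the class __dict__ can be checked outside of __set_name__ as well. In this small sketch Plain is a hypothetical class. Item deletion on the mappingproxy fails, while delattr on the class itself is the supported way to remove a class attribute.

```python
class Plain:
    attr = 1


proxy_type = type(Plain.__dict__).__name__

try:
    del Plain.__dict__["attr"]  # the mappingproxy does not support item deletion
    outcome = "deleted"
except TypeError:
    outcome = "TypeError"

delattr(Plain, "attr")  # goes through the class's own attribute machinery instead
still_there = hasattr(Plain, "attr")
```

The failed deletion raises TypeError, which matches the underlying cause of the wrapped error shown in the cell above.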

Conclusion: not without getting really tricky!

Functions are descriptors

In [63]:

type(desc_methods)

Out[63]:

function

In [64]:

is_descriptor(desc_methods)

Out[64]:

True

In [65]:

desc_methods(desc_methods)

__get__   : True
__set__   : False
__delete__: False

Question

Why are functions descriptors? What advantages does this confer?

In [66]:

names(desc_methods)

Out[66]:

[]

In [67]:

desc_methods.__get__(desc_methods)()

__get__   : True
__set__   : False
__delete__: False
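One way to approach the question is to call a function's __get__ by hand on an ordinary method. In this sketch Greeter is a hypothetical class. The function stored in the class __dict__ is a plain function, and its __get__ is what produces the bound method that supplies self automatically.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello from {self.name}"


g = Greeter("Guido")
raw = Greeter.__dict__["greet"]   # the plain function, fetched without the protocol
bound = raw.__get__(g, Greeter)   # the same binding step that g.greet performs

raw_kind = type(raw).__name__
bound_kind = type(bound).__name__
```

The bound method remembers both the instance (bound.__self__) and the original function (bound.__func__), so calling bound() is equivalent to calling raw(g).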

In [68]:

desc_methods.__class__.__dict__

Out[68]:

mappingproxy({'__new__': <function function.__new__(*args, **kwargs)>,
              '__repr__': <slot wrapper '__repr__' of 'function' objects>,
              '__call__': <slot wrapper '__call__' of 'function' objects>,
              '__get__': <slot wrapper '__get__' of 'function' objects>,
              '__closure__': <member '__closure__' of 'function' objects>,
              '__doc__': <member '__doc__' of 'function' objects>,
              '__globals__': <member '__globals__' of 'function' objects>,
              '__module__': <member '__module__' of 'function' objects>,
              '__builtins__': <member '__builtins__' of 'function' objects>,
              '__code__': <attribute '__code__' of 'function' objects>,
              '__defaults__': <attribute '__defaults__' of 'function' objects>,
              '__kwdefaults__': <attribute '__kwdefaults__' of 'function' objects>,
              '__annotations__': <attribute '__annotations__' of 'function' objects>,
              '__dict__': <attribute '__dict__' of 'function' objects>,
              '__name__': <attribute '__name__' of 'function' objects>,
              '__qualname__': <attribute '__qualname__' of 'function' objects>})

Here endeth the notebook

I hope this little tour through descriptors has not only explained an important Python mechanism, but also encouraged you to be adventurous in using notebooks or the interactive interpreter to explore Python's lesser-known corners.

February 29, 2024
