Descriptors: The Backbone of Everything?

If you’re anything like me, every day you use countless extraordinary items that, due to their prevalence and reliability, have become mundane. And because of this banality, you never stop to think about what’s actually happening under the hood, so to speak.

For example, starting simply, you turn a key in a lock and the door opens. But inside that lock are half a dozen rods that only your key aligns perfectly, which is what stops would-be thieves from gaining access.

Another example would be when you’re on your way to work and stop at a traffic light. Someone has programmed this array of bulbs and wires to detect when cars are there, and whether a pedestrian has pushed the crossing button. Everything is timed so that everyone gets their turn. There are microcontrollers, inductive sensors, switches and buttons all over the place. And yet, for the end user, it’s boiled down to: “Green means go, red means stop, and yellow means speed up as long as you don’t spot any police.”

Cooperative Cogs
Elevators, wristwatches, automobiles… heck, go on YouTube and look up how they make tinned baked beans.

Not to mention the most obvious example: you’re almost certainly reading this very article on some kind of internet connected device. Millions or billions of bits of information just went hurtling under oceans and through optical cables and copper wires, and perhaps even through thin air, just to draw entertaining and informative pixels on your screen. And it (more or less) just works.

This post is about something whose complexity lies somewhere between locks and The Entire Internet. It’s a feature of Python that I (and probably many, many other developers) have used for a very long time, without ever bothering to figure out what was really going on.

If you’ve used something like Django or SQLAlchemy then you will probably have created a model or form class like this (Django example incoming):

class Person(Model):
    first_name = CharField()
    last_name = CharField()
    birth_date = DateField()

Then you know that once a person is loaded from the database, you can read and write their attributes like this:

>>> person = Person.objects.get(id=1)

>>> print(person.first_name)  # prints the first name
Unix
>>> person.birth_date = date(1970, 1, 1)  # set the birth date

Are you wondering now what I’m getting at? Everything looks normal, I know. But what if we look at the types of these attributes, e.g.:

>>> type(person.first_name)
<class 'str'>

Again, you’re probably wondering what I’m getting at; this all seems normal, right? Then I have a question for you: why is the type of person.first_name not CharField?

If we had used a different class for the attribute, we’d see something like:

>>> class MyAttribute():
...     pass
...
>>> class MyClass():
...     a = MyAttribute()
...
>>> c = MyClass()
>>> type(c.a)
<class '__main__.MyAttribute'>

So Django et al. are obviously doing something behind the scenes to give us that value. I have to admit this is something I’ve glossed over for about 13 years, and I just took this “magic” for granted. But I recently had reason to make use of these things, which are called descriptors, and thought I’d share what I discovered so they’re more widely known.

Descriptors

Now let’s have a look at the magic - luckily there’s not too much of it.

Descriptors are just normal classes that implement one or more special methods. We’ll cover three of them (a fourth, __delete__, handles del obj.attr and isn’t used in this post):

  • __get__: return the value for this property. Almost every descriptor you write will define it.
  • __set__: set the value for this property. This is optional.
  • __set_name__: receive the name the descriptor was given on the class. This is also optional.

We’re going to break down the features and virtues of all three methods, so that you know what you’re dealing with.

You Got This
There will not be a test, but we’re confident you can grok this stuff.

__get__ method

Let’s look at a descriptor in action. First, solely using the __get__ method, we’ll return a static value.

class StaticAttribute:
    def __get__(self, obj, objtype=None):
        return 5


class MyCoolClass:
    a = StaticAttribute()


mcc = MyCoolClass()
print(mcc.a)  # prints "5"

When we access the a attribute on mcc the __get__ method of StaticAttribute is called to return 5.

The arguments the __get__ method accepts are:

  • self: the descriptor instance; in this case, the StaticAttribute instance.
  • obj: the object to which the descriptor is attached; in our case it would be the MyCoolClass instance. This value will be None if you access the attribute directly on the class itself (by executing MyCoolClass.a, for example).
  • objtype: The class (type, not instance) to which the descriptor is attached.
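To make those three arguments concrete, here is a small sketch that simply hands back whatever __get__ received. ShowArgs and Holder are made-up names for illustration only:

```python
class ShowArgs:
    """Illustrative descriptor that reports the arguments __get__ received."""

    def __get__(self, obj, objtype=None):
        # obj is the instance (or None for class access); objtype is the owning class
        return (obj, objtype)


class Holder:
    attr = ShowArgs()


h = Holder()

inst_obj, inst_type = h.attr       # access through an instance
cls_obj, cls_type = Holder.attr    # access through the class itself

print(inst_obj is h, inst_type is Holder)     # True True
print(cls_obj is None, cls_type is Holder)    # True True
```

Because obj is None for class-level access, many descriptors start __get__ with a check for that case and return self.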

Unattached Descriptors

As a quick aside, descriptors that aren’t attached to a class/instance behave as you might expect a normal class to:

a = StaticAttribute()
print(a)  # prints something like "<__main__.StaticAttribute object at 0x106e18eb0>"

__set__ method

If the descriptor does not define a __set__ method, assigning to the attribute on an instance simply stores the new value on that instance, which then hides the descriptor. For example:

c = MyCoolClass()
c.a = 10  # the attribute gets replaced
print(c.a)  # now prints 10
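If you’re curious where that 10 actually ends up, the sketch below repeats the classes from above and pokes at the instance’s __dict__. Only that one instance is affected; the descriptor is still sitting on the class:

```python
class StaticAttribute:
    def __get__(self, obj, objtype=None):
        return 5


class MyCoolClass:
    a = StaticAttribute()


c = MyCoolClass()
c.a = 10                     # no __set__, so this lands in the instance __dict__

print(c.__dict__)            # {'a': 10}
print(c.a)                   # 10 (the instance value shadows the descriptor)
print(MyCoolClass().a)       # 5  (other instances still go through __get__)

del c.a                      # remove the instance value...
print(c.a)                   # 5  (...and the descriptor is visible again)
```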

We can prevent the attribute from being overwritten by implementing the __set__ method, which is then called whenever the attribute is assigned on an instance. For example, we can make our StaticAttribute read-only by implementing a __set__ method that does nothing.

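class StaticAttribute:
    def __get__(self, obj, objtype=None):
        return 5

    def __set__(self, obj, value):
        # deliberately ignore the new value, making the attribute read-only
        return


# re-declare the class so that it uses the new StaticAttribute
class MyCoolClass:
    a = StaticAttribute()


c = MyCoolClass()
c.a = 10  # __set__ is called, which does nothing
print(c.a)  # still prints 5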

The __set__ method accepts self and obj as arguments, the same as __get__. And as you might expect, value is the value that is being set.

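Here’s another basic example, a descriptor that only allows you to write a value once. (To keep it short, the value is stored on the descriptor itself, so it is shared by every instance of the class.)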

class WriteOnce:
    value_written = False
    value = None

    def __get__(self, obj, objtype=None):
        if not self.value_written:
            raise AttributeError("Value not written yet.")
        return self.value

    def __set__(self, obj, value):
        if self.value_written:
            raise AttributeError("Value already written.")
        self.value = value
        self.value_written = True


class Worm:
    # write-once, read many
    a = WriteOnce()
    b = WriteOnce()

w = Worm()

w.a = "a value"
print(w.a)  # print "a value"

# raise AttributeError
# due to it being set again
w.a = "another value"

# also raise AttributeError
# as value hasn't been set yet
print(w.b)
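One thing to watch with WriteOnce as written: value and value_written live on the descriptor object, and there is only one descriptor per class attribute, so a second Worm instance would find a already written. A minimal sketch of a per-instance variant follows, assuming it’s acceptable to keep the value in each instance’s __dict__. The name WriteOncePerInstance is mine, and it leans on __set_name__, which is covered in the next section:

```python
class WriteOncePerInstance:
    """Illustrative variant of WriteOnce that keeps one value per instance."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        try:
            return obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(f"{self.name} not written yet.") from None

    def __set__(self, obj, value):
        if self.name in obj.__dict__:
            raise AttributeError(f"{self.name} already written.")
        obj.__dict__[self.name] = value


class Worm:
    a = WriteOncePerInstance()


w1 = Worm()
w2 = Worm()
w1.a = "first"
w2.a = "second"    # fine: w2 has its own storage
print(w1.a, w2.a)  # first second
```

Because the descriptor defines __set__, Python consults it before the instance’s __dict__ on every read, so storing the value under the same name doesn’t bypass __get__.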

Note that __set__ is not called when you assign to the attribute on the class itself (rather than on an instance); the descriptor is simply replaced:

MyCoolClass.a = 10
print(MyCoolClass.a)  # prints 10

__set_name__ method

Rounding out the trifecta of methods is __set_name__, which is called to tell your descriptor what attribute name it’s been given on the class it’s attached to.

We’ll look at a real-world example soon, but first let’s just use it to add some basic logging around attribute access.

class LoggingAttribute:
    value = None

    def __get__(self, obj, objtype=None):
        print(f"Accessing attribute {self.name}")
        return self.value

    def __set__(self, obj, value):
        print(f"Setting attribute {self.name}")
        self.value = value

    def __set_name__(self, owner, name):
        self.name = name

class BasicClass:
    a = LoggingAttribute()
    b = LoggingAttribute()


bc = BasicClass()
bc.a = "some value"  # prints "Setting attribute a"

print(bc.b)  # prints "Accessing attribute b", then prints "None"
# (the value of b) on a new line

The __set_name__ method is called when the class is created (that is, as soon as the class body has finished executing), so the name is always available by the time you access the attribute (i.e. you don’t have to set it to a default value and then check that it’s not None).
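Here’s a quick sketch of that timing, using a throwaway Named descriptor (a made-up name). It also shows the one case to be aware of: a descriptor attached to a class after the class already exists is not notified automatically.

```python
class Named:
    def __set_name__(self, owner, name):
        # record both arguments so we can inspect them afterwards
        self.owner = owner
        self.name = name


class Example:
    field = Named()      # __set_name__ runs once this class body finishes


# no instance of Example has been created, yet the name is already known
d = Example.__dict__["field"]
print(d.name)                # field
print(d.owner is Example)    # True

# a descriptor attached after the class exists is not notified automatically
late = Named()
Example.late = late
print(hasattr(late, "name"))         # False

late.__set_name__(Example, "late")   # so call it by hand if you need it
print(late.name)                     # late
```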

Real World Descriptors

I started off this post by talking about Django and SQLAlchemy, as two examples of descriptor usage. You can’t get much more real-world than that! But here’s another example.

Let’s say you have some data in a database, REST service, or some other data store. You want to have an adapter/proxy class that wraps this received data so that only some of it is “publicly” visible. (Yes, I know, public/private accessors aren’t really a thing in Python.)

In our example we’ll go with an object from a REST service that’s in the form of a dictionary. This raw data we receive will contain a user_id, user_name and password_hash. The user_id is readable, the user_name is readable and writable, and the password_hash is neither readable nor writable.

Admittedly, we could achieve this kind of “protection” (in air quotes since nothing in Python is really private) by making use of the __getattr__ and __setattr__ methods in Python. Something like this:

class UserAdapter:
    def __init__(self, data):
        # data is dict from REST service
        self.data = data

    def __getattr__(self, name):
        if name in {"user_id", "user_name"}:
            return self.data[name]

        raise AttributeError(f"{name} is not readable.")

    def __setattr__(self, name, value):
        # special case for setting the underlying dict
        if name == "data":
            super().__setattr__(name, value)
            return

        if name == "user_name":
            self.data[name] = value
        else:
            raise AttributeError(f"{name} is not writable.")

And it would be used something like this:

# this comes from REST
raw_data = {
    "user_id": 1,
    "user_name": "billg",
    "password_hash": "5f4dcc3b5aa765d61d8327deb882cf99"
}

ua = UserAdapter(raw_data)
print(ua.user_id)   # print 1
print(ua.password_hash)  # raise AttributeError
ua.user_id = 2  # raise AttributeError

This approach works, but it’s not ideal:

  • Hacks in __setattr__: we already have one special case for data. If we decide that data should be renamed, we have to remember to change it in here as well. And this is a very simple class; you can see it getting worse as it gets more complex. Maybe we'd need to implement an is_cached attribute or something like that, which also would need to be handled.
  • We don't get autocomplete. Our IDE doesn't know what attributes are available; although most IDEs would be able to guess once we've used the attribute once, the word guess applies here, and the guesses usually are not that great. In fact, once an IDE sees a __getattr__ or __getattribute__ method, it won't complain about any attribute you try to access, as it assumes you're handling them all dynamically.
  • No type hints! Since there's only one attribute-getting method being called for each attribute, we can't annotate it with the right type for each attribute.
  • It just feels too "dynamic". We know Python is a dynamic language, but I think that sometimes this principle is taken too far. We could make it a bit better by having readable_fields and writable_fields as attributes that we refer to, but again we'd have to handle them specially in __setattr__.

At the end of the day, while these methods work, they don’t work properly or efficiently. They are, in most cases, a patch-job used to bypass a problem, as opposed to an approach dedicated to preventing the problem from arising in the first place.

Patched Jeans
Yes, some people do this on purpose, but we’re not aiming to make a fashion statement.

Using descriptors solves all these problems

Let’s redo this problem in a nicely type-hinted way, using descriptors. With what we’ve covered so far, you should be able to follow the RestFieldProxy class below. It reads and writes values in the data dictionary of the UserAdapter to which it is attached, using its own attribute name as the dictionary key.

We’ll also define an __init__ method that allows us to declare the field writable or non-writable:

import typing


class RestFieldProxy:
    name: str
    writable: bool

    def __init__(self, writable: bool = False) -> None:
        self.writable = writable

    def __get__(self, obj, objtype=None) -> typing.Any:
        if obj is None:
            # accessed on the class itself, so there is no data to read
            return self
        return obj.data[self.name]

    def __set__(self, obj, value) -> None:
        if not self.writable:
            raise AttributeError(f"{self.name} is not writable.")
        obj.data[self.name] = value

    def __set_name__(self, owner, name) -> None:
        self.name = name

Now let’s use it and re-implement UserAdapter:

class UserAdapter:
    data: dict

    user_id: int = RestFieldProxy()
    user_name: str = RestFieldProxy(writable=True)

    def __init__(self, data) -> None:
        # data is dict from REST service
        self.data = data

And we can use it the same way:

ua = UserAdapter(raw_data)
print(ua.user_id)   # print 1
print(ua.password_hash)  # raise AttributeError
ua.user_id = 2  # raise AttributeError

What have we gained by doing it this way?

  • It's more obvious, at a glance, what fields are available on UserAdapter. We can even see which ones are writable just as easily.
  • No hacks or special cases need to be applied in __getattr__ or __setattr__.
  • Someone using our code doesn't need to inspect the implementation of UserAdapter to work out how to access the fields.
  • Our IDE can now reliably autocomplete the fields, and can flag attempts to access attributes that were never declared.
  • Type hinting is now built-in.
  • Python automatically raises AttributeError for missing fields: since password_hash is never declared on the class, accessing it fails without any extra code from us.
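A caveat on the type-hinting point: a strict static checker may object to an annotation like user_id: int = RestFieldProxy(), because the assigned value is a RestFieldProxy rather than an int. One way around that, sketched below under the assumption that you’re happy to make the descriptor generic, is to let the descriptor carry the value type itself. TypedField is a hypothetical name, not something from Django or the standard library:

```python
from typing import Any, Generic, Optional, TypeVar, overload

T = TypeVar("T")


class TypedField(Generic[T]):
    """Hypothetical generic variant of RestFieldProxy."""

    name: str

    def __init__(self, writable: bool = False) -> None:
        self.writable = writable

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    @overload
    def __get__(self, obj: None, objtype: Optional[type] = None) -> "TypedField[T]": ...
    @overload
    def __get__(self, obj: object, objtype: Optional[type] = None) -> T: ...

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:
            return self          # class-level access returns the descriptor itself
        return obj.data[self.name]

    def __set__(self, obj: Any, value: T) -> None:
        if not self.writable:
            raise AttributeError(f"{self.name} is not writable.")
        obj.data[self.name] = value


class UserAdapter:
    user_id = TypedField[int]()
    user_name = TypedField[str](writable=True)

    def __init__(self, data: dict) -> None:
        # data is dict from REST service
        self.data = data


ua = UserAdapter({"user_id": 1, "user_name": "billg"})
ua.user_name = "bill"
print(ua.user_id, ua.user_name)   # 1 bill
```

The overloads tell the checker that access on an instance yields T, while access on the class yields the descriptor. At runtime the behaviour matches the non-generic version.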

This solution is more elegant, readable, and understandable, provided you know about descriptors – which you do, since you’ve just read this post!

Better Patches
Then again, when it comes to making fashion statements, putting in the time and effort to do it right is often worth it!

What else can descriptors do?

Once again, we can peek at other existing descriptor users (*cough* Django) to get some idea.

  • You can add validation options so that bad values are rejected when __set__ is called.
  • The descriptors could automatically update a dirty flag on the object, so you know that it needs to be saved.
  • When an object is serialized, you could include only the fields whose descriptors were created with, say, a serialize argument.
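As a rough illustration of the first two ideas, here is a sketch of a descriptor that rejects values of the wrong type and sets a dirty flag on the owning object. ValidatedField and the _dirty attribute are names I’ve invented for this example; Django’s real fields work differently in the details.

```python
class ValidatedField:
    """Hypothetical descriptor: type-checks values and marks the owner dirty."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)  # None until a value is set

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.name} must be {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        obj.__dict__[self.name] = value
        obj.__dict__["_dirty"] = True   # flag the object as needing a save


class Person:
    first_name = ValidatedField(str)
    age = ValidatedField(int)


p = Person()
print(getattr(p, "_dirty", False))   # False
p.first_name = "Unix"
print(p._dirty)                      # True

try:
    p.age = "forty"
except TypeError as exc:
    print(exc)                       # age must be int, got str
```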

There are plenty of options out there, and I think once you start using descriptors you can start limiting the hacks and workarounds that normally have to be implemented with __getattr__ and __setattr__.

Conclusion

Writing a conclusion is sometimes the hardest part of the post: writing something interesting without being an exact copy of the week before. In the end though I think it comes down to a simple formula:

  1. Use descriptors, they're awesome.
  2. I have a mailing list, sign up below to be notified when new posts go up.
  3. At Tera Shift, we write good clean code. Please contact us for advice on clean code, deployment and infrastructure or managing development teams.

About Tera Shift

Tera Shift Ltd is a software and data consultancy. We help companies with solutions for development, data services, analytics, project management, and more. Our services include:

  • Working with companies to build best-practice teams
  • System design and implementation
  • Data management, sourcing, ETL and storage
  • Bespoke development
  • Process automation

We can also advise on how custom solutions can help your business grow, by using your data in ways you hadn’t thought possible.

About the author

Ben Shaw (B. Eng) is the Director of Tera Shift Ltd. He has over 15 years’ experience in Software Engineering, across a range of industries. He has consulted for companies ranging in size from startups to major enterprises, including some of New Zealand’s largest household names.

Email ben@terashift.co.nz