If you’re anything like me, every day you use countless extraordinary items that, due to their prevalence and reliability, have become mundane. And because of this banality, you never stop to think about what’s actually happening under the hood, so to speak.
For example, starting simply: you turn a key in a lock and the door opens. But inside there are half a dozen pins that keep would-be thieves out, and your key lines them up perfectly.
Another example would be when you’re on your way to work and stop at a traffic light. Someone has programmed this array of bulbs and wires to detect when cars are waiting and whether a pedestrian has pushed the crossing button. Everything is timed so that everyone gets their turn. There are microcontrollers, inductive sensors, switches and buttons all over the place. And yet, for the end user, it’s boiled down to: “Green means go, red means stop, and yellow means speed up as long as you don’t spot any police.”
Not to mention the most obvious example: you’re almost certainly reading this very article on some kind of internet-connected device. Millions or billions of bits of information just went hurtling under oceans, through optical cables and copper wires, and perhaps even through thin air, just to draw entertaining and informative pixels on your screen. And it (more or less) just works.
This post is about something whose complexity lies somewhere between locks and The Entire Internet. It’s a feature of Python that I (and probably many, many other developers) have used for a very long time, without ever bothering to figure out what was really going on.
If you’ve used something like Django or SQLAlchemy, then you will probably have created a model or form class like this (Django example incoming):
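```python
from datetime import date  # used in the next snippet

from django.db.models import CharField, DateField, Model


class Person(Model):
    first_name = CharField(max_length=100)
    last_name = CharField(max_length=100)
    birth_date = DateField()
```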
Then you’ll also know that once a person is loaded from the database, you can read and write their attributes like this:
```python
>>> person = Person.objects.get(id=1)
>>> person.first_name  # the REPL echoes the first name
'Unix'
>>> person.birth_date = date(1970, 1, 1)  # set the birth date
```
Are you wondering now what I’m getting at? Everything looks normal, I know. But what if we look at the types of these attributes? For example:
```python
>>> type(person.first_name)
<class 'str'>
```
Again, you’re probably wondering what I’m getting at; this all seems normal, right? Then I have a question for you: why is the type of `person.first_name` not `CharField`?
If we use an ordinary class of our own for the attribute instead, we see something like this:
```python
>>> class MyAttribute:
...     pass
...
>>> class MyClass:
...     a = MyAttribute()
...
>>> c = MyClass()
>>> type(c.a)
<class '__main__.MyAttribute'>
```
So Django et al. are obviously doing something behind the scenes to give us that value. I have to admit this is something I glossed over for about 13 years, taking the “magic” for granted. But I recently had reason to use these things, which are called descriptors, and thought I’d share what I discovered so they’re more widely known.
Descriptors
Now let’s have a look at the magic; luckily, there’s not too much of it.
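Descriptors are just normal classes that implement one or more special methods. These are the three we’ll look at:
- `__get__`: return the value for this attribute. Almost every descriptor implements it, although strictly speaking an object counts as a descriptor if it defines any of `__get__`, `__set__` or `__delete__` (the last of which this post doesn’t cover).
- `__set__`: set the value for this attribute. This is optional.
- `__set_name__`: receive the name of the attribute the descriptor is assigned to. This is also optional.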
We’re going to break down the features and virtues of all three methods, so that you know what you’re dealing with.
`__get__` method
Let’s look at a descriptor in action. First, using only the `__get__` method, we’ll return a static value.
```python
class StaticAttribute:
    def __get__(self, obj, objtype=None):
        return 5


class MyCoolClass:
    a = StaticAttribute()


mcc = MyCoolClass()
print(mcc.a)  # prints "5"
```
When we access the `a` attribute on `mcc`, the `__get__` method of `StaticAttribute` is called to return `5`.
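To make the “magic” a little more concrete, here’s a rough sketch of what that lookup boils down to. It’s simplified: the real logic in `object.__getattribute__` also walks the class’s MRO and checks the instance’s `__dict__` and whether the descriptor defines `__set__`, but the final step is essentially an explicit call to `__get__`:

```python
class StaticAttribute:
    def __get__(self, obj, objtype=None):
        return 5


class MyCoolClass:
    a = StaticAttribute()


mcc = MyCoolClass()

# Roughly what Python does for mcc.a: find the descriptor object stored on
# the class, then call its __get__ with the instance and the instance's type.
descriptor = type(mcc).__dict__["a"]
print(descriptor.__get__(mcc, type(mcc)))  # 5
print(mcc.a)  # 5, the same result via normal attribute access
```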
The arguments the `__get__` method accepts are:
- `self`: the descriptor instance; in this case, the `StaticAttribute` instance.
- `obj`: the instance through which the attribute is accessed; in our case, the `MyCoolClass` instance. This value will be `None` if you access the attribute on the class itself (by evaluating `MyCoolClass.a`, for example).
- `objtype`: the class (type, not instance) through which the attribute is accessed; here, `MyCoolClass`.
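Here’s a small sketch showing those arguments in practice. `InspectingAttribute`, `Inspected` and `SubInspected` are made-up names for illustration. It also shows a common convention (the built-in `property` behaves this way) of returning the descriptor itself when `obj` is `None`, and that `objtype` is the class you went through, which can be a subclass:

```python
class InspectingAttribute:
    """Hypothetical descriptor that reports which arguments __get__ received."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Looked up on the class itself (e.g. Inspected.a): a common
            # convention is to hand back the descriptor object.
            return self
        # Looked up through an instance: obj is that instance, objtype its class
        return f"obj={type(obj).__name__} instance, objtype={objtype.__name__}"


class Inspected:
    a = InspectingAttribute()


class SubInspected(Inspected):
    pass


print(Inspected().a)     # obj=Inspected instance, objtype=Inspected
print(SubInspected().a)  # obj=SubInspected instance, objtype=SubInspected
print(Inspected.a)       # the InspectingAttribute object itself
```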
Unattached Descriptors
As a quick aside, a descriptor only works its magic when it’s stored as a class attribute. Anywhere else, it behaves like any other object:
```python
a = StaticAttribute()
print(a)  # prints something like "<__main__.StaticAttribute object at 0x106e18eb0>"
```
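One related gotcha: Python only invokes descriptor methods for attributes it finds on the class. If you store a descriptor on an instance instead, it’s just a regular value. A minimal sketch (`Holder` is a made-up class):

```python
class StaticAttribute:
    def __get__(self, obj, objtype=None):
        return 5


class Holder:
    pass


h = Holder()
h.a = StaticAttribute()  # stored in the instance's __dict__, not on the class
print(type(h.a).__name__)  # StaticAttribute: __get__ was never called
```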
`__set__` method
When a descriptor doesn’t define a `__set__` method, you can replace the attribute on an instance with a different value, as you normally would. For example:
```python
c = MyCoolClass()
c.a = 10  # the value is stored on the instance, hiding the descriptor
print(c.a)  # now prints 10
```
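What actually happens here is that `10` is stored in the instance’s `__dict__`, and for a descriptor without `__set__` (Python’s documentation calls this a “non-data descriptor”) the instance dictionary takes priority. The descriptor on the class is untouched, as this sketch shows:

```python
class StaticAttribute:
    def __get__(self, obj, objtype=None):
        return 5


class MyCoolClass:
    a = StaticAttribute()


c = MyCoolClass()
c.a = 10
print(vars(c))          # {'a': 10}: the value lives in the instance's __dict__
print(c.a)              # 10: the instance dict wins over a non-data descriptor
print(MyCoolClass().a)  # 5: other instances still go through __get__
del c.a
print(c.a)              # 5: with the shadowing entry gone, __get__ is used again
```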
We can prevent the attribute from being overwritten by implementing the `__set__` method, which is then called instead whenever the attribute is set. We can make our `StaticAttribute` read-only by implementing a `__set__` method that does nothing.
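```python
class StaticAttribute:
    def __get__(self, obj, objtype=None):
        return 5

    def __set__(self, obj, value):
        pass  # ignore the new value, so the attribute stays read-only


class MyCoolClass:
    a = StaticAttribute()  # re-create the class so it uses the new descriptor


c = MyCoolClass()
c.a = 10  # __set__ is called, which does nothing
print(c.a)  # still prints 5
```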
The `__set__` method accepts `self` and `obj` as arguments, the same as `__get__`. And as you might expect, `value` is the value that is being set.
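Silently ignoring the write is one option; below is a sketch of the other common choice, raising `AttributeError` (`ReadOnlyAttribute` is a made-up name). It also shows why defining `__set__` matters beyond intercepting assignments: a descriptor that defines it (a “data descriptor”) takes priority over the instance’s `__dict__`, even if a value is smuggled in there directly:

```python
class ReadOnlyAttribute:
    def __get__(self, obj, objtype=None):
        return 5

    def __set__(self, obj, value):
        # Refuse the write loudly instead of ignoring it
        raise AttributeError("a is read-only")


class MyCoolClass:
    a = ReadOnlyAttribute()


c = MyCoolClass()
try:
    c.a = 10
except AttributeError as exc:
    print(exc)  # a is read-only

# Writing straight into the instance __dict__ doesn't help: because the
# descriptor defines __set__, it still wins when a is read.
c.__dict__["a"] = 10
print(c.a)  # 5
```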
Here’s another basic example, a descriptor that only allows you to write a value once:
```python
class WriteOnce:
    value_written = False
    value = None

    def __get__(self, obj, objtype=None):
        if not self.value_written:
            raise AttributeError("Value not written yet.")
        return self.value

    def __set__(self, obj, value):
        if self.value_written:
            raise AttributeError("Value already written.")
        self.value = value
        self.value_written = True


class Worm:
    # write-once, read many
    a = WriteOnce()
    b = WriteOnce()


w = Worm()
w.a = "a value"
print(w.a)  # prints "a value"

try:
    w.a = "another value"  # raises AttributeError, as a has already been set
except AttributeError as exc:
    print(exc)  # prints "Value already written."

try:
    print(w.b)  # raises AttributeError, as b hasn't been set yet
except AttributeError as exc:
    print(exc)  # prints "Value not written yet."
```
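One caveat with `WriteOnce` as written: the descriptor object is created once, when `Worm` is defined, and is shared by every `Worm` instance. Its `value` and `value_written` are therefore shared too, so after `w.a = "a value"` a brand new `Worm()` would also report `"a value"` for `a` and refuse a new write. Here’s a sketch of one fix, keeping a separate value per instance in a `weakref.WeakKeyDictionary`. It assumes instances are hashable and weak-referenceable, which instances of ordinary classes are:

```python
from weakref import WeakKeyDictionary


class WriteOnce:
    def __init__(self):
        # One entry per owning instance; weak keys let a Worm be garbage
        # collected even though the descriptor refers to it.
        self.values = WeakKeyDictionary()

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if obj not in self.values:
            raise AttributeError("Value not written yet.")
        return self.values[obj]

    def __set__(self, obj, value):
        if obj in self.values:
            raise AttributeError("Value already written.")
        self.values[obj] = value


class Worm:
    a = WriteOnce()


first = Worm()
second = Worm()
first.a = "a value"
second.a = "another value"  # allowed: second has its own entry
print(first.a, second.a)    # a value another value
```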
Note that `__set__` is not called when you assign to the attribute on the class itself (rather than on an instance); the assignment simply replaces the descriptor on the class:
```python
MyCoolClass.a = 10
print(MyCoolClass.a)  # prints 10
```
`__set_name__` method
Rounding out the trifecta of methods is `__set_name__`, which is called to tell your descriptor the attribute name it’s been given on the class it’s attached to (`owner` is that class, and `name` is the attribute name).
We’ll look at a real-world example soon, but first let’s just use it to add some basic logging around attribute access.
```python
class LoggingAttribute:
    value = None

    def __get__(self, obj, objtype=None):
        print(f"Accessing attribute {self.name}")
        return self.value

    def __set__(self, obj, value):
        print(f"Setting attribute {self.name}")
        self.value = value

    def __set_name__(self, owner, name):
        self.name = name


class BasicClass:
    a = LoggingAttribute()
    b = LoggingAttribute()


bc = BasicClass()
bc.a = "some value"  # prints "Setting attribute a"
print(bc.b)  # prints "Accessing attribute b", then prints "None"
# (the value of b) on a new line
```
The `__set_name__` method is called automatically when the owning class is created, so by the time you access the attribute, the name is always available (i.e. you don’t have to set it to a default value and then check that it’s not `None`).
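Knowing the name also helps with a problem `LoggingAttribute` shares with `WriteOnce`: it stores `value` on the descriptor, so setting `bc.a` on one `BasicClass` instance changes `a` for every instance. A common pattern is to store the value in the instance’s own `__dict__` under the descriptor’s name. Because this descriptor defines `__set__`, Python still routes reads through `__get__` rather than reading that `__dict__` entry directly. A sketch, assuming the owner class has a regular `__dict__` (no `__slots__`):

```python
class LoggingAttribute:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        print(f"Accessing attribute {self.name}")
        # Per-instance storage; None if never set (like the original default)
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        print(f"Setting attribute {self.name}")
        obj.__dict__[self.name] = value


class BasicClass:
    a = LoggingAttribute()
    b = LoggingAttribute()


first = BasicClass()
second = BasicClass()
first.a = "some value"  # Setting attribute a
print(first.a)          # Accessing attribute a, then: some value
print(second.a)         # Accessing attribute a, then: None
```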
Real World Descriptors
I started off this post by talking about Django and SQLAlchemy as two examples of descriptor usage. You can’t get much more real-world than that! But here’s another example.
Let’s say you have some data in a database, REST service, or some other data store. You want to have an adapter/proxy class that wraps this received data so that only some of it is “publicly” visible. (Yes, I know, public/private accessors aren’t really a thing in Python.)
In our example we’ll go with an object from a REST service that’s in the form of a dictionary. The raw data we receive will contain a `user_id`, `user_name` and `password_hash`. The `user_id` is readable, the `user_name` is readable and writable, and the `password_hash` is neither readable nor writable.
Admittedly, we could achieve this kind of “protection” (in air quotes, since nothing in Python is really private) by making use of Python’s `__getattr__` and `__setattr__` methods. Something like this:
```python
class UserAdapter:
    def __init__(self, data):
        # data is dict from REST service
        self.data = data

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name in {"user_id", "user_name"}:
            return self.data[name]
        raise AttributeError(f"{name} is not readable.")

    def __setattr__(self, name, value):
        # special case for setting the underlying dict
        if name == "data":
            super().__setattr__(name, value)
            return
        if name == "user_name":
            self.data[name] = value
        else:
            raise AttributeError(f"{name} is not writable.")
```
And it would be used something like this:
```python
# this comes from REST
raw_data = {
    "user_id": 1,
    "user_name": "billg",
    "password_hash": "5f4dcc3b5aa765d61d8327deb882cf99",
}

ua = UserAdapter(raw_data)
print(ua.user_id)  # prints 1
print(ua.password_hash)  # raises AttributeError
ua.user_id = 2  # raises AttributeError
```
This approach works, but it’s not ideal:
- Hacks in `__setattr__`: we already have one special case for `data`. If we decide that `data` should be renamed, we have to remember to change it in here as well. And this is a very simple class; you can see it getting worse as it gets more complex. Maybe we’d need to implement an `is_cached` attribute or something like that, which would also need to be handled.
- We don’t get autocomplete. Our IDE doesn’t know what attributes are available. Most IDEs can guess once we’ve used an attribute once, but the key word there is “guess”, and the guesses usually aren’t that great. In fact, once an IDE sees a `__getattr__` or `__getattribute__` method, it won’t complain about any attribute you try to access, as it assumes you’re handling them all dynamically.
- No type hints! Since a single `__getattr__` method returns every attribute, we can’t annotate each attribute with its own type.
- It just feels too “dynamic”. We know Python is a dynamic language, but I think that sometimes this principle is taken too far. We could make it a bit better by having `readable_fields` and `writable_fields` as attributes that we refer to, but again we’d have to handle them specially in `__setattr__`.
At the end of the day, while these methods work, they’re not a good fit for the job. In most cases they’re a patch-job used to work around a problem, rather than an approach that prevents the problem from arising in the first place.
Using descriptors solves all these problems
Let’s redo this in a nicely type-hinted way, using descriptors. You should be able to follow the `RestFieldProxy` class below. It reads and writes values in the `data` dictionary of the `UserAdapter` it’s attached to, using its own attribute name (received through `__set_name__`) as the key.
We’ll also define an `__init__` method that allows us to declare the field writable or non-writable:
```python
import typing


class RestFieldProxy:
    name: str
    writable: bool

    def __init__(self, writable: bool = False) -> None:
        self.writable = writable

    def __get__(self, obj, objtype=None) -> typing.Any:
        if obj is None:
            # accessed on the class (e.g. UserAdapter.user_id): return the descriptor
            return self
        return obj.data[self.name]

    def __set__(self, obj, value) -> None:
        if not self.writable:
            raise AttributeError(f"{self.name} is not writable.")
        obj.data[self.name] = value

    def __set_name__(self, owner, name) -> None:
        # the attribute name doubles as the key into the data dict
        self.name = name
```
Now let’s use it to re-implement `UserAdapter`:
```python
class UserAdapter:
    data: dict
    user_id: int = RestFieldProxy()
    user_name: str = RestFieldProxy(writable=True)

    def __init__(self, data) -> None:
        # data is dict from REST service
        self.data = data
```
And we can use it the same way:
```python
ua = UserAdapter(raw_data)
print(ua.user_id)  # prints 1
print(ua.password_hash)  # raises AttributeError
ua.user_id = 2  # raises AttributeError
```
What have we gained by doing it this way?
- It’s more obvious, at a glance, what fields are available on `UserAdapter`. We can even see which ones are writable just as easily.
- No hacks or special cases need to be applied in `__getattr__` or `__setattr__`.
- Someone using our code doesn’t need to inspect the implementation of `UserAdapter` to work out how to access the fields.
- Our IDE can now reliably autocomplete attributes, and flag attributes that aren’t defined on the class.
- Type hinting is now built-in.
- Python raises `AttributeError` for undefined fields on its own: since `password_hash` isn’t defined on the class, it appears nowhere in the definition, and accessing it fails like any other missing attribute.
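On the type-hinting point, one hedge: strict type checkers such as mypy may object to `user_id: int = RestFieldProxy()`, since the object assigned isn’t an `int`. One way to get per-field types that checkers understand is to make the descriptor generic and overload `__get__`, a pattern also used in the standard library’s type stubs (for `functools.cached_property`, for example). A sketch of that approach, not the only way to do it:

```python
from typing import Any, Dict, Generic, Optional, TypeVar, overload

T = TypeVar("T")


class RestFieldProxy(Generic[T]):
    def __init__(self, writable: bool = False) -> None:
        self.writable = writable

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    # Accessed on the class (obj is None): the descriptor itself comes back.
    @overload
    def __get__(self, obj: None, objtype: Optional[type] = None) -> "RestFieldProxy[T]": ...

    # Accessed on an instance: the field value, typed as T.
    @overload
    def __get__(self, obj: object, objtype: Optional[type] = None) -> T: ...

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:
            return self
        return obj.data[self.name]

    def __set__(self, obj: Any, value: T) -> None:
        if not self.writable:
            raise AttributeError(f"{self.name} is not writable.")
        obj.data[self.name] = value


class UserAdapter:
    # The type parameter replaces the "user_id: int" style annotation
    user_id = RestFieldProxy[int]()
    user_name = RestFieldProxy[str](writable=True)

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data


ua = UserAdapter({"user_id": 1, "user_name": "billg"})
ua.user_name = "bill"
print(ua.user_id, ua.user_name)  # 1 bill
try:
    ua.user_id = 2
except AttributeError as exc:
    print(exc)  # user_id is not writable.
```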
This solution is more elegant, readable, and understandable, provided you know about descriptors, which you now do, since you’ve just read this post!
What else can descriptors do?
Once again, we can peek at other existing descriptor users (*cough* Django) to get some idea.
- You can add validation options so that bad values are rejected when `__set__` is called.
- The descriptors could automatically update a `dirty` flag on the object, so you know that it needs to be saved.
- Maybe when an object is serialized, you could serialize only the fields declared with a `serialize` argument.
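As a rough sketch of the first two ideas combined, here’s a descriptor that validates values in `__set__` and sets a `dirty` flag on the object whenever a field changes. `ValidatedField`, `Product` and the validator functions are hypothetical, and this isn’t how Django implements its fields:

```python
from typing import Any, Callable, Optional


class ValidatedField:
    """Hypothetical descriptor: validates values and marks the owner dirty."""

    def __init__(self, validator: Callable[[Any], bool]) -> None:
        self.validator = validator

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:
            return self
        try:
            return obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(f"{self.name} has not been set") from None

    def __set__(self, obj: Any, value: Any) -> None:
        # Validation: reject bad values before anything is stored
        if not self.validator(value):
            raise ValueError(f"invalid value for {self.name}: {value!r}")
        obj.__dict__[self.name] = value
        # Dirty tracking: remember that the object has unsaved changes
        obj.dirty = True


class Product:
    name = ValidatedField(lambda v: isinstance(v, str) and v != "")
    price = ValidatedField(lambda v: isinstance(v, (int, float)) and v >= 0)

    def __init__(self, name: str, price: float) -> None:
        self.name = name
        self.price = price
        self.dirty = False  # nothing unsaved right after construction


p = Product("Widget", 9.99)
print(p.dirty)  # False
p.price = 12.5
print(p.dirty)  # True
try:
    p.price = -1
except ValueError as exc:
    print(exc)  # invalid value for price: -1
```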
There are plenty of options out there, and I think once you start using descriptors you can cut down on the hacks and workarounds that normally have to be implemented with `__getattr__` and `__setattr__`.
Conclusion
Writing a conclusion is sometimes the hardest part of the post: writing something interesting without it being an exact copy of the week before. In the end, though, I think it comes down to a simple formula:
- Use descriptors; they’re awesome.
- I have a mailing list; sign up below to be notified when new posts go up.
- At Tera Shift, we write good clean code. Please contact us for advice on clean code, deployment and infrastructure, or managing development teams.