It’s said that 93% of communication is non-verbal. This is probably just an urban legend, but when we’re talking with someone in person, there’s obviously information being transmitted beyond the words they use; we also pick up extra “clues” about their intent from tone of voice, posture and many other indicators.
When it comes to writing code, we can imply certain extra “clues” by using the right variable types. If done right, the next person to come along and read your code can infer extra information about your intent, by looking at the types being used – and if you want extra reasons to do this, remember that that next person could be you!
Types: Simple vs Complex
Sometimes the choice of variable data type can be simple and obvious. If you want to store a count of items, you’d probably want to assign an `int`. For storing some fractional value, assign a `float` or `Decimal`. Lists of Unicode characters should be stored in strings (`str`), and for raw data, the simplest way to store variables is by using `bytes`.
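As a quick sketch of those simple choices (the variable names and values here are made up purely for illustration), note how `float` and `Decimal` differ once you start doing arithmetic:

```python
from decimal import Decimal

item_count = 3                    # int: a whole-number count
average_rating = 4.5              # float: an approximate fractional value
price = Decimal("19.99")          # Decimal: an exact decimal value, e.g. money
title = "Dune"                    # str: Unicode text
payload = title.encode("utf-8")   # bytes: raw data

# float arithmetic is approximate; Decimal arithmetic on decimal strings is exact.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

That difference is one reason a reader who sees `Decimal` can guess that exactness matters for that value.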
When bringing collection types into use, things get more complicated. Python includes a range of collection types, each of which is most useful in a certain subset of situations. The most commonly used ones are:
- `list`
- `set`
- `tuple` and `NamedTuple`
- `dict`
The use case for `dict` (dictionary) data types is quite clear-cut, so I won’t write about them in this post. When it comes to the other three, their similarity can be confusing, especially for new developers or those coming from other languages. With this vast array (pun slightly intended) of types to choose from, which should you pick? And why should you care?
The “it just works” problem
The “problem” that we have is that there are similar interfaces for accessing members of collections. For example, let’s define and print a collection of values using a `list`, `tuple` and a `set`. First a `list`:
my_list = [1, 2, 3]
for n in my_list:
    print(n)  # prints 1, 2, 3
Then a `tuple`:
my_tuple = (1, 2, 3)
for n in my_tuple:
    print(n)  # prints 1, 2, 3
Finally, a `set`:
my_set = {1, 2, 3}
for n in my_set:
    print(n)  # prints 1, 2, 3, or maybe 3, 2, 1, etc
The last example is not exactly the same: sets are unordered and thus you could get the numbers printed in any order. The methods and interface for iterating, however, are all the same.
Furthermore, `list`s can do the jobs of, or be made to behave like, the other two data types. So why should we settle for the limitations of `tuple`s and `set`s?
What’s the difference?
Often `list`s are the go-to for storing collections of items, and those coming from JavaScript-land will recognise the syntax for their creation. `tuple`s share a lot of properties with `list`s, and are fairly recognisable to devs coming from C# or Java.
Let’s compare `list`s and `tuple`s:
`list`s | `tuple`s |
---|---|
Mutable; can add and remove items at will | Immutable; once created, that's it! |
Sortable; items can be rearranged | Unsortable; can't be reordered once created |
Indexable; can access any item by index | Also indexable |
Omni-typable; can store items of any type | Also omni-typable |
Repeatable; can store the same item or value twice | Also repeatable |
So `tuple`s are basically immutable `list`s? Yes and no – we’ll come back to them soon.
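To make the comparison concrete, here’s a minimal sketch of what happens when you try to change each type. The variable names are only for illustration:

```python
my_list = [3, 1, 2]
my_tuple = (3, 1, 2)

my_list.append(4)  # lists can grow...
my_list.sort()     # ...and be reordered in place

errors = []
try:
    my_tuple[0] = 99    # tuples don't support item assignment
except TypeError as exc:
    errors.append(type(exc).__name__)

try:
    my_tuple.append(4)  # tuples have no append() method
except AttributeError as exc:
    errors.append(type(exc).__name__)

# sorted() leaves the tuple untouched and returns a new list instead.
sorted_copy = sorted(my_tuple)
```

So “unsortable” really means “can’t be sorted in place”: you can always build a sorted copy, but the original `tuple` stays exactly as it was created.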
As for `set`s?
- They're mutable!
- They have no order, so they can't be rearranged.
- You can't access items by index, since they have no order.
- You can store items of any hashable type (so numbers, strings and tuples are fine, but lists and dicts are not).
- But you can't store the same item/value more than once.
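Here’s a small sketch of those properties in action, using throwaway values. One caveat it also shows: members of a `set` need to be hashable, so a `list` can’t be stored inside one.

```python
my_set = {1, 2, 2, 3}  # the duplicate 2 is silently dropped
my_set.add(4)          # sets are mutable
my_set.add(4)          # adding an existing value changes nothing

errors = []
try:
    my_set[0]          # no order means no indexing
except TypeError:
    errors.append("not subscriptable")

mixed = {1, "two", (3, 4)}  # mixed types are fine if they're hashable
try:
    {[1, 2]}           # a list is unhashable, so this fails
except TypeError:
    errors.append("unhashable")
```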
So `set`s are basically `list`s with no order? No. They’re similar in the same way as a fish and an aeroplane – both of them move and tend to be shiny, but that’s as far as the similarities go.
Please note that the following recommendations are my own opinions backed by my 15 years in the industry. You can still use any of the collection types as you see fit, or just keep using `list`s for everything.
Why use `tuple`s?
As we mentioned, `tuple`s can store different types – but my rule of thumb is to use a `tuple` to keep related values together, especially when you want to make sure they won’t change between the time they’re created and the time they’re used.
For example, suppose there’s a function that fetches a user’s first and last names from the database, given their user ID.
users_name = get_user_by_id(1)
# users_name could be something like ("Firstname", "Lastname").
Here, we want to indicate the values are related – first and last name definitely are – so we group them in a `tuple`. And given that we wouldn’t want someone to change our carefully computed result, having immutability is a great feature. Besides, it wouldn’t make sense to append another name at the end of our tuple; name changes are a sufficiently rare event that it makes more sense to limit any such actions to completely replacing the tuple, rather than making adjustments in situ. (Although there are exceptions…)
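As a rough sketch of how that plays out for the caller: the version of `get_user_by_id` below is a hypothetical stand-in backed by a hard-coded dictionary rather than a real database, and the name in it is just sample data.

```python
def get_user_by_id(user_id):
    """Hypothetical stand-in for a real database lookup."""
    fake_table = {1: ("Ada", "Lovelace")}
    return fake_table[user_id]


users_name = get_user_by_id(1)

# Unpacking only works because the tuple has exactly two elements.
first_name, last_name = users_name
full_name = " ".join(users_name)
```

The fixed size is part of the contract here: if the function started returning three values, the unpacking line would fail loudly instead of quietly doing the wrong thing.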
`tuple`s are also great for passing related arguments to a method, especially if those arguments need to cascade down through other methods. If you add or remove an argument, just update the `tuple`; no need to change all the method signatures. (Although for that, you should consider using a `NamedTuple`. I think they rock, and wrote a whole post about them already.)
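Here’s a minimal sketch of that cascading-arguments idea using `typing.NamedTuple`. The `PageRequest` type and the two functions are invented for this example; only `NamedTuple` itself comes from the standard library.

```python
from typing import NamedTuple


class PageRequest(NamedTuple):
    """Hypothetical bundle of related arguments."""
    user_id: int
    page: int
    page_size: int = 20  # a field added later, with a default


def load_rows(request: PageRequest) -> str:
    # The innermost function reads the fields by name.
    offset = (request.page - 1) * request.page_size
    return f"user={request.user_id} offset={offset} limit={request.page_size}"


def fetch_page(request: PageRequest) -> str:
    # The middle layer just passes the bundle along, untouched.
    return load_rows(request)


result = fetch_page(PageRequest(user_id=1, page=3))
```

Adding `page_size` only required touching the `NamedTuple` and the function that actually uses it; the signature of `fetch_page` stayed the same.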
Why use `set`s?
Do you like saving one line of code, every so often? Then use `set`s!
A code sample is worth 1,000 words, so with a `list`:
books = []
if book not in books:
    books.append(book)
And behold the `set`!
books = set() # an empty set can't be created by using {} as
# this would be interpreted as a dict
books.add(book) # look ma, no checking for existence!
There you go, one line saved. `set`s are demonstrably superior.
If you’re looking for a more practical benefit, use `set`s when order doesn’t matter, and uniqueness does. Let’s go back to the `books` example. Obviously you don’t want to recommend the same book twice, so enforcing uniqueness with a `set` makes that little check built-in; and while you might not care about the order in which they’re read, having them in a `list` would imply such an order, even if one isn’t meant to be there.
Most importantly, though, using `set`s emphasizes to the code’s reader that order doesn’t matter, and uniqueness does.
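To sketch the book recommendations idea with some sample titles (the data and variable names are illustrative only), duplicates simply disappear, and set arithmetic comes along for free:

```python
recommended = set()
for book in ["Dune", "Emma", "Dune", "Ulysses", "Emma"]:
    recommended.add(book)  # repeated titles are ignored

# Set difference: everything recommended that hasn't been read yet.
already_read = {"Emma"}
to_read = recommended - already_read
```

Doing the same with a `list` would need an explicit membership check on every append, plus a loop or comprehension for the difference.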
That brings me back to `list`s.
Aren’t lists less restrictive and therefore better?
You could make an argument for this. I can represent a `tuple` as a `list` and get mutability. We can add some checking for duplicates when adding to a list and get `set` behaviour. So why not use `list`s everywhere?
Well, let’s turn that question on its head: why not use `tuple`s and `set`s where they’re meant to be used? Code-wise, there are almost never any actual advantages to using a list that’s been adjusted to act like a tuple or a set, as opposed to just, you know, using a tuple or a set. The most common “advantage” is that it gives the developer two fewer things to remember, which… doesn’t really seem like an advantage, so much as it does laziness.
Conversely, using one of these “restrictive” data types can tell the reader more about the data and allow them to make inferences. When I see a `list` I can infer:
- This data's order might be important.
- There might be an unknown number of these types of data.
- This set of data can be added to later.
- If I see the same value in here twice, that's not a red flag. Maybe just an orange one.
Upon seeing a `tuple`, I know:
- This tuple represents a "thing", which is a composite of the data it contains.
- The number of elements is important – the data might not make sense with more or fewer elements.
- The immutability is important. Perhaps because it's the result of some operation, or some underlying function will break if changed after the `tuple`'s been created.
And finally, a `set` tells me:
- These elements are unique.
- Their order doesn't matter.
- They are related in some way, but together they don't represent one big composite value.
The big advantage here is that there’s no longer any need to leave comments on what you were thinking when you wrote these methods; your code speaks for itself. There’s also no need to interpret the adjustments that were made to the `list`s and try to figure out what the writer was thinking.
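One place where this shows up nicely is a function signature. The sketch below is my own illustration, not something from a real codebase: `summarise` and its parameters are hypothetical, and it assumes Python 3.9 or newer for the built-in generic syntax.

```python
from __future__ import annotations


def summarise(
    reading_queue: list[str],   # ordered, may grow, repeats allowed
    author: tuple[str, str],    # exactly (first name, last name)
    genres: set[str],           # unique, order irrelevant
) -> str:
    first_name, last_name = author
    return (
        f"{len(reading_queue)} books by {first_name} {last_name} "
        f"in {len(genres)} genres"
    )


summary = summarise(
    ["Emma", "Persuasion"],
    ("Jane", "Austen"),
    {"romance", "satire", "romance"},
)
```

Even without the comments, the three parameter types already hint at how each argument is expected to behave.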
Admittedly, these are just rules of thumb. There are always exceptions; sometimes you’ll run into situations that fall outside the box. Let’s say that you need to store data in a way that’s indexable and mutable, but non-sortable. This rare set of attributes means that you can’t use `tuple`s or `set`s, which means that there’s no choice but to use a limited `list`.
The best way to figure out whether a developer is overusing `list`s is to see whether the other two data types are used at all in their code. If you see a `list` that’s acting like a `set` or `tuple`, in a Python app that makes use of `set`s and `tuple`s elsewhere in the code, there might be a reason for that, and you might want to dig deeper with the original developer and see what that reason is. On the other hand, if you see a program that’s full of adjusted `list`s and no `tuple`s or `set`s, it’s likely that the developer just didn’t want to use those data types for whatever reason.
OK, boo, down with lists?
No, not quite. Just use the right type for the job. Yes, you can get away with using `list`s all the time, if you want to be confusing. But you’ll end up surprising your fellow devs (or yourself), and speaking as a dev, I hate surprises.
Instead, think about what your types are saying about the data being represented. You’ll be able to tell future readers a story before they even get to looking at what the code is actually “doing”.