Listing the Problems of Lists

It’s said that 93% of communication is non-verbal. This is probably just an urban legend, but when we’re talking with someone in person, there’s obviously information being transmitted beyond the words they use; we also pick up extra “clues” about their intent from tone of voice, posture and many other indicators.

When it comes to writing code, we can imply certain extra “clues” by using the right variable types. If done right, the next person to come along and read your code can infer extra information about your intent, by looking at the types being used – and if you want extra reasons to do this, remember that that next person could be you!

[Image: Computer Magnifier] Navigating a mishmash of code can be tricky if you don’t leave a train of thought for yourself.

Types: Simple vs Complex

Sometimes the choice of variable data type can be simple and obvious. If you want to store a count of items, you’d probably want to assign an int. For storing some fractional value, assign a float or Decimal. Sequences of Unicode characters should be stored in strings (str), and for raw data, the simplest way to store variables is by using bytes.
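
As a quick illustration of those simple choices, here is a minimal sketch. The variable names and values are made up for the example; the float-versus-Decimal comparison at the end shows one reason you might reach for Decimal when exactness matters.

```python
from decimal import Decimal

item_count = 3                  # a count of items: int
average_rating = 4.5            # a fractional value: float
price = Decimal("19.99")        # a fractional value that must be exact: Decimal
title = "An Example Title"      # Unicode text: str
raw_payload = b"\x00\x01\x02"   # raw data: bytes

# Binary floats can pick up rounding error; Decimals built from strings don't.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.3"))   # True
```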

When bringing collection types into use, things get more complicated. Python includes a range of collection types, each of which is most useful in a certain subset of situations. The most commonly used ones are:

  • list
  • set
  • tuple and NamedTuple
  • dict

The use case for dict (dictionary) data types is quite clear-cut, so I won’t write about them in this post. When it comes to the other three, their similarity can be confusing, especially for new developers or those coming from other languages. With this vast array (pun slightly intended) of types to choose from, which should you pick? And why should you care?

The “it just works” problem

The “problem” that we have is that there are similar interfaces for accessing members of collections. For example, let’s define and print a collection of values using a list, tuple and a set. First a list:

my_list = [1, 2, 3]

for n in my_list:
    print(n)  # prints 1, 2, 3

Then a tuple:

my_tuple = (1, 2, 3)

for n in my_tuple:
    print(n)  # prints 1, 2, 3

Finally, a set:

my_set = {1, 2, 3}

for n in my_set:
    print(n)  # prints 1, 2, 3, or maybe 3, 2, 1, etc

The last example is not exactly the same: sets are unordered and thus you could get the numbers printed in any order. The methods and interface for iterating, however, are all the same.

Furthermore, lists can do the jobs of, or be made to behave like, the other two data types. So why should we settle for the limitations of tuples and sets?

What’s the difference?

[Image: Triplet Chocolates] One’s just as good as another, right?

Often lists are the go-to for storing collections of items, and those coming from JavaScript-land will recognise the syntax for their creation. tuples share a lot of properties with lists, and are fairly recognisable to devs coming from C# or Java.

Let’s compare lists and tuples:

lists | tuples
Mutable; can add and remove items at will | Immutable; once created, that's it!
Sortable; items can be rearranged | Unsortable; can't be reordered once created
Indexable; can access any item by index | Also indexable
Omni-typable; can store items of any type | Also omni-typable
Repeatable; can store the same item or value twice | Also repeatable
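
To make a few of those comparison points concrete, here is a small sketch (the variable names are just for illustration). Note that sorted() does accept a tuple, but it hands back a new list and leaves the tuple itself untouched.

```python
my_list = [3, 1, 2]
my_tuple = (3, 1, 2)

my_list.append(4)   # fine: lists are mutable
my_list.sort()      # fine: sorts in place

try:
    my_tuple[0] = 99            # tuples don't support item assignment
except TypeError as error:
    tuple_error = error         # keep the error around so we can look at it

# Tuples have no append() or sort() methods at all.
has_append = hasattr(my_tuple, "append")

# sorted() builds a new list; the original tuple is unchanged.
sorted_copy = sorted(my_tuple)

print(my_list)       # [1, 2, 3, 4]
print(my_tuple)      # (3, 1, 2)
print(sorted_copy)   # [1, 2, 3]
print(my_tuple[1])   # 1  (indexing works for both)
print(has_append)    # False
```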

So tuples are basically immutable lists? Yes and no – we’ll come back to them soon.

As for sets?

  • They're mutable!
  • They have no order, so they can't be rearranged.
  • You can't access items by index, since they have no order.
  • You can store items of any hashable type (so strings, numbers and tuples are fine, but lists and dicts aren't).
  • But you can't store the same item/value more than once.

So sets are basically lists with no order? No. They’re similar in the same way as a fish and an aeroplane — both of them move and tend to be shiny, but that’s as far as the similarities go.
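
Here is a short sketch of how differently a set behaves in practice (names are illustrative only). One caveat worth knowing: set members have to be hashable, so trying to add a list raises a TypeError.

```python
my_set = {1, 2, 3}

my_set.add(4)   # sets are mutable
my_set.add(2)   # 2 is already present, so nothing changes
print(len(my_set))   # 4

# No order means no indexing.
try:
    my_set[0]
except TypeError:
    indexing_works = False
else:
    indexing_works = True

# Members must be hashable, so a list can't go in a set.
try:
    my_set.add([5, 6])
except TypeError:
    list_accepted = False
else:
    list_accepted = True

# Sets also bring their own operations, which lists don't have.
evens = {2, 4, 6}
print(sorted(my_set & evens))   # [2, 4]  (intersection)
print(sorted(my_set - evens))   # [1, 3]  (difference)
```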

Please note that the following recommendations are my own opinions backed by my 15 years in the industry. You can still use any of the collection types as you see fit, or just keep using lists for everything.

Why use tuples?

As we mentioned, tuples can store different types – but my rule of thumb is to use a tuple to keep related values together, especially when you want to make sure they won’t change between the time they’re created and the time they’re used.

For example, suppose there’s a function that fetches a user’s first and last names from the database, given their user ID.

users_name = get_user_by_id(1)
# users_name could be something like ("Firstname", "Lastname").

Here, we want to indicate the values are related – first and last name definitely are – so we group them in a tuple. And given that we wouldn’t want someone to change our carefully computed result, having immutability is a great feature. Besides, it wouldn’t make sense to append another name at the end of our tuple; name changes are a sufficiently rare event that it makes more sense to limit any such actions to completely replacing the tuple, rather than making adjustments in situ. (Although there are exceptions…)
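
A minimal sketch of what that could look like. The dictionary standing in for the database and the body of get_user_by_id are assumptions made up for this example, and the tuple[str, str] annotation assumes Python 3.9 or later.

```python
# Hypothetical stand-in for a database table: user ID -> (first name, last name).
_FAKE_USERS = {1: ("Firstname", "Lastname")}


def get_user_by_id(user_id: int) -> tuple[str, str]:
    """Return the user's (first name, last name) as a tuple."""
    return _FAKE_USERS[user_id]


users_name = get_user_by_id(1)

# Unpacking reads naturally because the tuple has a fixed, known shape.
first_name, last_name = users_name
print(f"{first_name} {last_name}")   # Firstname Lastname

# A name change means replacing the whole tuple, not editing it in place.
users_name = (first_name, "Newlastname")
print(users_name)   # ('Firstname', 'Newlastname')
```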

tuples are also great for passing related arguments to a method, especially if those arguments need to cascade down through other methods. If you add or remove an argument, just update the tuple; no need to change all the method signatures. (Although for that, you should consider using a NamedTuple. I think they rock, and wrote a whole post about them already.)
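
To sketch the cascading-arguments idea: PageRequest and the three functions below are hypothetical names invented for this example (and the list[int] annotation assumes Python 3.9+). The point is that adding a field to the NamedTuple wouldn't require touching the signatures of the functions that merely pass it along.

```python
from typing import NamedTuple


class PageRequest(NamedTuple):
    """Hypothetical bundle of related arguments."""
    user_id: int
    page: int
    page_size: int = 20


def fetch_rows(request: PageRequest) -> list[int]:
    # Pretend row IDs are just consecutive integers.
    start = (request.page - 1) * request.page_size
    return list(range(start, start + request.page_size))


def build_report(request: PageRequest) -> str:
    rows = fetch_rows(request)
    return f"user {request.user_id}: rows {rows[0]}-{rows[-1]}"


def handle(request: PageRequest) -> str:
    # Only passes the request down; its signature never needs to change.
    return build_report(request)


print(handle(PageRequest(user_id=1, page=2)))   # user 1: rows 20-39
```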

Why use sets?

Do you like saving one line of code, every so often? Then use sets!

A code sample is worth 1,000 words, so with a list:

books = []
book = "Example Title"  # placeholder value for illustration

if book not in books:
    books.append(book)

And behold the set!

# An empty set can't be created with {}, as that would be interpreted as a dict.
books = set()
book = "Example Title"  # placeholder value for illustration

books.add(book)  # look ma, no checking for existence!

There you go, one line saved. sets are demonstrably superior.

If you’re looking for a more practical benefit, use sets when order doesn’t matter, and uniqueness does. Let’s go back to the books example. Obviously you don’t want to recommend the same book twice, so enforcing uniqueness with a set makes that little check built-in; and while you might not care about the order in which they’re read, having them in a list would imply such an order, even if one isn’t meant to be there.

Most importantly, though, using sets emphasises to the code’s reader that order doesn’t matter, and uniqueness does.
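
Here is the books example sketched out a little further. The two lists of recommendations and the titles in them are invented for illustration; the set quietly drops the repeats.

```python
# Hypothetical recommendations from two people, with some overlap.
recommended_by_alice = ["Book A", "Book B", "Book A"]
recommended_by_bob = ["Book B", "Book C"]

books = set()
for book in recommended_by_alice + recommended_by_bob:
    books.add(book)   # duplicates are ignored automatically

print(len(books))          # 3
print(sorted(books))       # ['Book A', 'Book B', 'Book C']
print("Book C" in books)   # True
```

If you do need a particular order for display, sorted() gives you a list at the point where order starts to matter.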

That brings me back to lists.

Aren’t lists less restrictive and therefore better?

[Image: Jumping] Ain’t nothing holding me down!

You could make an argument for this. We can represent a tuple as a list and get mutability. We can add some checking for duplicates when adding to a list and get set behaviour. So why not use lists everywhere?

Well, let’s turn that question on its head: why not use tuples and sets where they’re meant to be used? Code-wise, there are almost never any actual advantages to using a list that’s been adjusted to act like a tuple or a set, as opposed to just, you know, using a tuple or a set. The most common “advantage” is that it gives the developer two fewer things to remember, which… doesn’t really seem like an advantage so much as laziness.

Conversely, using one of these “restrictive” data types can tell the reader more about the data and allow them to make inferences. When I see a list I can infer:

  • This data's order might be important.
  • There might be an unknown number of these types of data.
  • This set of data can be added to later.
  • If I see the same value in here twice, that's not a red flag. Maybe just an orange one.

Upon seeing a tuple, I know:

  • This tuple represents a "thing", which is a composite of the data it contains.
  • The number of elements is important – the data might not make sense with more or fewer elements.
  • The immutability is important. Perhaps because it's the result of some operation, or some underlying function will break if changed after the tuple's been created.

And finally, a set tells me:

  • These elements are unique.
  • Their order doesn't matter.
  • They are related in some way, but together they don't represent one big composite value.

The big advantage here is that there’s no longer any need to leave comments on what you were thinking when you wrote these methods; your code speaks for itself. There’s also no need to interpret the adjustments that were made to the lists and try to figure out what the writer was thinking.
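
One way to let the types do the talking is in function signatures. The three functions below are hypothetical, written only to show how the annotation alone hints at the shape of the data (the built-in generic syntax assumes Python 3.9+).

```python
def first_to_read(books: list[str]) -> str:
    """A list: order matters here, so 'first' is meaningful."""
    return books[0]


def full_name(name: tuple[str, str]) -> str:
    """A tuple: exactly two related values, first name then last name."""
    first, last = name
    return f"{first} {last}"


def already_recommended(book: str, recommended: set[str]) -> bool:
    """A set: unique titles, order irrelevant, we only ask 'is it in there?'."""
    return book in recommended


print(first_to_read(["Book B", "Book A"]))              # Book B
print(full_name(("Firstname", "Lastname")))             # Firstname Lastname
print(already_recommended("Book A", {"Book A", "Book C"}))   # True
```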

Admittedly, these are just rules of thumb. There are always exceptions; sometimes you’ll run into situations that fall outside the box. Let’s say that you need to store data in a way that’s indexable and mutable, but non-sortable. This rare set of attributes means that you can’t use tuples or sets, which means that there’s no choice but to use a limited list.

The best way to figure out whether a developer is overusing lists is to see whether the other two data types are used at all in their code. If you see a list that’s acting like a set or tuple, in a Python app that makes use of sets and tuples elsewhere in the code, there might be a reason for that, and you might want to dig deeper with the original developer and see what that reason is. On the other hand, if you see a program that’s full of adjusted lists and no tuples or sets, it’s likely that the developer just didn’t want to use those data types for whatever reason.

OK, boo, down with lists?

No, not quite. Just use the right type for the job. Yes, you can get away with using lists all the time, if you want to be confusing. But you’ll end up surprising your fellow devs (or yourself), and speaking as a dev, I hate surprises.

Instead, think about what your types are saying about the data being represented. You’ll be able to tell future readers a story before they even get to looking at what the code is actually “doing”.

About the author

Ben Shaw (B. Eng) is the Director of Tera Shift Ltd. He has over 15 years’ experience in Software Engineering, across a range of industries. He has consulted for companies ranging in size from startups to major enterprises, including some of New Zealand’s largest household names.

Email ben@terashift.co.nz