Two tips for more discoverable Python

When working on a new codebase, a learning curve is inevitable. However, its steepness varies greatly. Here are two dead-simple ways to write discoverable Python code that smooths this curve.

Type hint. Always.

Let's consider the following example function:

def do_things(things):
  # 100+ line function with statements like the following
  first_thing = things["first"]
  second_thing = things.get("second")

Soon enough, someone needs to call do_things. However, there is a critical question whose answer takes far too long to find:

What is things?

things is a dict. Or is it? All the code really requires is an object that supports subscripting and has a .get() method. So will any such object do? Type hints to the rescue! Also referred to as type annotations in Python, they simply indicate to fellow engineers which data types are compatible. Here’s the first enhancement:

from typing import Dict

def do_things(things: Dict):
  # 100+ line function with statements like the following
  first_thing = things["first"]
  second_thing = things.get("second")

An important thing to note here is that, used in this way, type hinting is only about helping fellow engineers (and the IDEs that offer type-based auto-complete) understand a function, including its arguments and return type (not shown). Because Python is dynamically typed, the interpreter won’t care if you’ve written code that deviates from the included type hints. Nothing is checked when the function is called, even if you eventually hit a runtime error when, for example, calling .lower() on a variable whose value is an int.

For static type checking before the code runs, you’d need something like mypy. For runtime type checking, there is Pydantic.
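
To make the last two paragraphs concrete, here is a minimal sketch. The helper describe_first and the sample values are made up for illustration and are not part of the original example. It shows the interpreter accepting a call that contradicts the hints and failing only once .lower() is reached:

```python
from typing import Dict


def describe_first(things: Dict[str, str]) -> str:
    """Hypothetical helper: return the required "first" entry, lower-cased."""
    return things["first"].lower()


# Matches the hints: works as expected.
ok = describe_first({"first": "Laundry"})

# Contradicts the hints (the value is an int), yet Python accepts the call.
# The failure only surfaces when .lower() is reached inside the function.
try:
    describe_first({"first": 42})
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
```

Running a static checker such as mypy over a file containing this code should flag the second call before anything executes. The exact wording of the report depends on the checker and its version.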


For class attributes and function arguments, favor data classes over dicts

Let’s revisit the above example:

from typing import Dict

def do_things(things: Dict):
  # 100+ line function with statements like the following
  first_thing = things["first"]
  second_thing = things.get("second")

I argue that an equally or more critical question arises, even after knowing that things is a dict:

In the things dict, which entries are required, and which are optional?

To answer this question with the function in its current form, a couple of options exist.

First option: Scan the function body

We can look for the keys that are directly subscripted with ["some_key"] (required) and the keys that are retrieved via .get("some_key"). The latter defaults to None when a key is not present, so those keys are optional, right? Well, maybe. If a value is passed to another function called by this one, and that function treats it as required, then the “optional” key is required after all. The glaring problem with this approach is that it defeats the purpose of writing functions: the abstraction a function offers over granular operations is lost once callers must read its implementation details.
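
Here is a rough sketch of how an “optional” key can turn out to be required. The helper send_reminder and the function body are hypothetical, invented only to illustrate the point:

```python
from typing import Dict


def send_reminder(recipient: str) -> str:
    """Hypothetical downstream helper that assumes a real string."""
    return "Reminder sent to " + recipient.upper()


def do_things(things: Dict) -> str:
    first_thing = things["first"]        # visibly required
    second_thing = things.get("second")  # looks optional: None if missing
    # ...many lines later, the "optional" value is passed along unguarded
    return send_reminder(second_thing)


# Works when both keys are present.
result = do_things({"first": "laundry", "second": "bob"})

# Fails deep inside send_reminder when the "optional" key is left out.
try:
    do_things({"first": "laundry"})
    failure = None
except AttributeError as exc:
    failure = exc
```

Nothing in the signature of do_things hints at this. The only way to find out is to read both function bodies.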

Second option: Add a docstring outlining the shape of the dict

For example:

from typing import Dict

def do_things(things: Dict):
  """
  This function does some things.

  :param things: A dict as follows:
                 {
                   "first_thing": "some first thing",
                   "second_thing": "some optional second thing",
                 }
  """

Though this approach gives some insight into what the function expects of its argument, the docstring is not executable code, so it can drift completely out of sync with the implementation. For example, if the function body contains an unguarded reference to things["second_thing"], then you’ll have been misled.
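
A minimal sketch of that drift follows. The function body is hypothetical and exists only to show a docstring and an implementation disagreeing:

```python
from typing import Dict


def do_things(things: Dict) -> str:
    """
    This function does some things.

    :param things: A dict as follows:
                   {
                     "first_thing": "some first thing",
                     "second_thing": "some optional second thing",
                   }
    """
    # Hypothetical body: the docstring calls "second_thing" optional,
    # but this line subscripts it without a guard.
    return things["first_thing"] + " then " + things["second_thing"]


# Trusting the docstring and omitting the "optional" key raises KeyError.
try:
    do_things({"first_thing": "laundry"})
    drift_error = None
except KeyError as exc:
    drift_error = exc
```

The interpreter never reads the docstring, so nothing warns the author when the two drift apart.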

There is a straightforward solution: use data classes. You can use Python’s built-in dataclasses module (available since Python 3.7) or the classes offered by attrs. Here is the above example revisited with the built-in dataclass:

from typing import Optional
from dataclasses import dataclass

@dataclass
class ThingsToDo:
  first_thing: str
  second_thing: Optional[str] = None


def do_things(things: ThingsToDo):
  """
  Does the things specified
  :param things: Object encapsulating the things to do
  """
  # Do the things

The wonderfully helpful result of using this pattern is that another engineer (including your future self) can quickly answer the following questions:

  1. What is things?
  2. In things, what is required and what is optional?
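
As a quick sketch of how those two questions get answered in practice, the snippet below repeats the ThingsToDo class from above and adds a hypothetical helper, required_field_names, which is not part of the original example:

```python
from dataclasses import MISSING, dataclass, fields
from typing import List, Optional


@dataclass
class ThingsToDo:
    first_thing: str
    second_thing: Optional[str] = None


def required_field_names(cls) -> List[str]:
    """Hypothetical helper: names of the fields that have no default."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


# The optional field can simply be left out; it falls back to None.
minimal = ThingsToDo(first_thing="laundry")

# Leaving out the required field fails immediately, at construction time.
try:
    ThingsToDo()
    missing_error = None
except TypeError as exc:
    missing_error = exc
```

Note that a dataclass on its own still does not validate types at runtime: ThingsToDo(first_thing=42) is accepted by the interpreter. What it does enforce is which fields must be supplied, and that is exactly the second question.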

Conclusion

Writing discoverable code will save you and fellow engineers countless hours of hunting down implementation details. Because these are "operational" approaches that enable IDE auto-complete and static type checking, their value can be unlocked incrementally: first by smoothing the learning curve, and eventually by automating quality checks.