Dataclasses are a great way to eliminate repetitive boilerplate when designing simple, data-driven classes in Python. The @dataclass decorator generates several of the special methods that most Python classes end up defining by hand. This article highlights the features of dataclasses that I find most important; it is not exhaustive. See the official documentation linked under References below for the full module reference.

Why Are Dataclasses Useful?

Dataclasses provide a way to create a more batteries-included kind of class in Python. Glance over almost any project and you will find code that looks like this:

class Employee:
    def __init__(self, name: str, position: str, email: str = "default@gmail.com"):
        self.name = name
        self.position = position
        self.email = email

    def __repr__(self):
        # !r quotes the string values, so the output reads like a constructor call
        return "{}(name={!r}, position={!r}, email={!r})".format(
            self.__class__.__name__, self.name, self.position, self.email
        )

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.position, self.email) == (other.name, other.position, other.email)
        else:
            return NotImplemented

If you have spent some time with Python, you know that these three methods (__init__, __repr__, and __eq__) show up in almost every class.

  • __init__ initializes a new object of the class (it is often loosely called the constructor) and defines which arguments are needed to create one.
  • __repr__ provides a readable, unambiguous string representation of the object for programmers working with it.
  • __eq__ defines what object1 == object2 means for objects of the class.

Dataclasses seek to eliminate the manual implementation of methods like the ones above, and more. They generate sensible default implementations of these methods for you, while still leaving you free to provide your own where needed.

Basic Usage

Let’s see how using dataclasses can slim down our class example from above. First, import dataclass from the dataclasses module:

from dataclasses import dataclass

Then add the @dataclass decorator above our Employee class declaration:

@dataclass
class Employee:
    name: str
    position: str
    email: str

Now, go ahead and test some of the class’s functionality…

emp = Employee("dan", "swe", "foo@gmail.com")
print(emp)
# Employee(name='dan', position='swe', email='foo@gmail.com')

emp2 = Employee("dan", "swe", "foo@gmail.com")
print(emp == emp2)
# True

As the output shows, the dataclass decorator generated a working __init__, __repr__, and __eq__ for our class without us writing any of them.
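To make that concrete, here is a small sketch that checks which special methods ended up on the class. It redefines the Employee class from above so that it runs on its own; looking in the class `__dict__` is just one convenient way to inspect this, not something the dataclasses module requires.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    position: str
    email: str


# Special methods present on the class even though we wrote none of them.
generated = [m for m in ("__init__", "__repr__", "__eq__") if m in Employee.__dict__]
print(generated)
# ['__init__', '__repr__', '__eq__']

# The generated __eq__ only compares objects of the same class,
# so comparing against a plain tuple evaluates to False.
emp = Employee("dan", "swe", "foo@gmail.com")
same_as_tuple = emp == ("dan", "swe", "foo@gmail.com")
print(same_as_tuple)
# False
```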

Setting Default Instance Variables

Setting defaults with dataclasses comes with a few things to be aware of.

1.) Instance variables with default values must come after those without a default value

So this is invalid:

@dataclass
class Employee:
    name: str = "foo"
    position: str
    email: str

It raises TypeError: non-default argument 'position' follows default argument (the exact wording can vary slightly between Python versions).

But this is valid:

@dataclass
class Employee:
    position: str
    email: str
    name: str = "foo"
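The ordering rule can be checked directly. The sketch below wraps the invalid definition in a try/except so the script keeps running; BadEmployee is a hypothetical name used only for this demonstration. Note that the error is raised when the class is defined, not when an object is created.

```python
from dataclasses import dataclass

error_message = None

try:
    @dataclass
    class BadEmployee:  # hypothetical class, exists only to trigger the error
        name: str = "foo"   # field with a default...
        position: str       # ...followed by a field without one
        email: str
except TypeError as exc:
    # Raised by the decorator at class-definition time.
    error_message = str(exc)

print(error_message)  # the message names the offending field, 'position'
```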

2.) If you want to further configure the way dataclass handles your default values, use the field function.

Take a look at this example:

from dataclasses import dataclass, field

@dataclass
class Employee:
    email: str
    position: str = field(default="swe", init=False, repr=False, compare=False)
    name: str = "foo"

Here we can see a few adjustments were made:

  • The position variable has a default value of “swe”
  • The position variable will not be a parameter of the generated __init__ method
  • The position variable will not be included in the __repr__ output
  • The position variable will not be considered in comparisons (e.g. employee1 == employee2)
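Each of these adjustments can be observed with a short script. This is a sketch using the same class definition; the example.com address and the "manager" value are made-up test data.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    email: str
    position: str = field(default="swe", init=False, repr=False, compare=False)
    name: str = "foo"


e1 = Employee("a@example.com")
e2 = Employee("a@example.com")

print(e1.position)
# swe

# position is hidden from the generated __repr__.
print(e1)
# Employee(email='a@example.com', name='foo')

# position is ignored by ==, so changing it does not affect equality.
e2.position = "manager"
print(e1 == e2)
# True

# position is not accepted by the generated __init__.
init_rejected_position = False
try:
    Employee("a@example.com", position="manager")
except TypeError:
    init_rejected_position = True
print(init_rejected_position)
# True
```

Because position is left out of __init__, the second positional argument is name: Employee("a@example.com", "dan") sets name to "dan".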

If we need a mutable default value (e.g. a list or dictionary), use the default_factory parameter of field. It takes a zero-argument callable that is called to build a fresh value for each new object:

@dataclass
class Employee:
    position: str
    email: str
    name: str = "foo"
    teams: list[str] = field(default_factory=list)

e1 = Employee("bar", "foobar@gmail.com")
print(e1.teams)
# []

By using default_factory, every Employee gets its own new list instead of sharing one list object across all instances.
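The sketch below shows both halves of that: what happens if a plain list is used as the default, and that default_factory gives each object an independent list. BadEmployee and the "platform" team name are hypothetical, and the list[str] annotation assumes Python 3.9 or newer, as in the example above.

```python
from dataclasses import dataclass, field

mutable_default_error = None

try:
    @dataclass
    class BadEmployee:  # hypothetical class, exists only to trigger the error
        teams: list[str] = []  # a bare mutable default is rejected
except ValueError as exc:
    mutable_default_error = str(exc)

print(mutable_default_error)  # the message suggests using default_factory


@dataclass
class Employee:
    position: str
    email: str
    name: str = "foo"
    teams: list[str] = field(default_factory=list)


e1 = Employee("bar", "foobar@gmail.com")
e2 = Employee("bar", "foobar@gmail.com")

# Each instance received its own list, so appending to one leaves the other empty.
e1.teams.append("platform")
print(e1.teams)
# ['platform']
print(e2.teams)
# []
```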

Those are just a few ways to adjust the behavior of default values with the field function.

If no special configuration is needed, simply define your default values as you normally would (e.g. the name variable above).

Decorator Configuration

We can take dataclass configuration well beyond our instance variables. We can directly change the way the class behaves by passing a few parameters to the decorator.

See below:

@dataclass(init=False, repr=False, eq=False)
class Employee:
    position: str
    email: str
    name: str = "foo"

We provided a few parameters here to disable the default generation of our __init__, __repr__, and __eq__ special methods. Those parameters can be useful if you want to explicitly define your own versions of those methods.
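As an illustration, here is a minimal sketch that disables only the generated __repr__ and supplies a hand-written one, while still relying on the generated __init__ and __eq__. The angle-bracket format of the string is an arbitrary choice for this example.

```python
from dataclasses import dataclass


@dataclass(repr=False)
class Employee:
    position: str
    email: str
    name: str = "foo"

    # Our own representation, used because repr=False skips the generated one.
    def __repr__(self):
        return f"<Employee {self.name} ({self.position})>"


e1 = Employee("swe", "foo@gmail.com")
print(e1)
# <Employee foo (swe)>

# __init__ and __eq__ are still generated by the decorator.
print(e1 == Employee("swe", "foo@gmail.com"))
# True
```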

Another particularly useful parameter is frozen, which lets us create objects whose fields cannot be reassigned after creation.

@dataclass(frozen=True)
class Employee:
    position: str
    email: str
    name: str = "foo"

Any attempt to assign to a field of an Employee object after creation will raise a FrozenInstanceError.

e1 = Employee("bar", "foobar@gmail.com")
e1.name = "bar"
# dataclasses.FrozenInstanceError: cannot assign to field 'name'

This can be useful if you want to ensure that objects of a class are not changed after initialization. The implementation details are nicely taken care of by the dataclasses module.
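If you do need a changed version of a frozen object, one option is to build a new object instead of modifying the old one. The sketch below catches the error and then uses dataclasses.replace, a helper from the same module that this article does not otherwise cover, to create a modified copy.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Employee:
    position: str
    email: str
    name: str = "foo"


e1 = Employee("bar", "foobar@gmail.com")

assignment_failed = False
try:
    e1.name = "bar"
except FrozenInstanceError:
    # Field assignment on a frozen instance is rejected.
    assignment_failed = True
print(assignment_failed)
# True

# replace() returns a new Employee with the given fields changed;
# the original object is left untouched.
e2 = replace(e1, name="bar")
print(e1.name)
# foo
print(e2.name)
# bar
```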

Drawbacks

There are some minor drawbacks to dataclasses if they are used carelessly.

Some of these drawbacks include:

  • Codebase clutter - Turning classes into dataclasses unnecessarily spreads generated, implicit behavior (initialization, representation, comparison) across the codebase, which can make it harder to tell what a class actually does.
  • Minor performance hit - The decorator generates methods for you when the class is defined. If you never use them, that work is wasted, although the cost is likely unnoticeable.

Ultimately, if you use dataclasses as they were intended to be used and know what they entail, you are unlikely to run into problems using them in your codebase.

When To Use

Use dataclasses when:

  • You need a simple class with minimal configuration to represent data.

Don’t use dataclasses when:

  • You need to create a complex object and plan on having lots of encapsulation and private variables.
  • Your object doesn’t represent data and instead handles mostly behavior (e.g. FileParser, ContextManager)

References

https://docs.python.org/3/library/dataclasses.html