24 DataClass
24.1 Intro
from dataclasses import dataclass, field, asdict, astuple

@dataclass(frozen=True, order=True)
class Comment:
    id: int
    text: str = ""
    replies: list[int] = field(default_factory=list, repr=False, compare=False)
import attr

# auto_attribs=True makes attrs collect the annotated attributes as fields
@attr.s(auto_attribs=True, frozen=True, order=True, slots=True)
class AttrComment:
    id: int = 0
    text: str = ""
comment_1 = Comment(1, "I just subscribed!")
comment_2 = Comment(2, "Hi there")
# comment_1.id = 3  # not allowed: frozen=True makes instances immutable
print(comment_1)
#> Comment(id=1, text='I just subscribed!')
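As a minimal sketch of what frozen=True and order=True do in practice, the block below defines a small stand-in class so that it runs on its own. The name Note is hypothetical and only mirrors the Comment class above. Assigning to a field should raise dataclasses.FrozenInstanceError, and instances compare like tuples of their compare-enabled fields.

```python
from dataclasses import dataclass, field, FrozenInstanceError

# Hypothetical stand-in that mirrors the Comment class above.
@dataclass(frozen=True, order=True)
class Note:
    id: int
    text: str = ""
    replies: list[int] = field(default_factory=list, repr=False, compare=False)

first = Note(1, "I just subscribed!")
second = Note(2, "Hi there")

# frozen=True: assigning to a field is rejected.
try:
    first.id = 3
    mutated = True
except FrozenInstanceError:
    mutated = False

# order=True: instances are compared as (id, text) tuples.
ordered = sorted([second, first])

# compare=False: replies is ignored when testing equality.
same = Note(1, "x", replies=[9]) == Note(1, "x", replies=[])
```

Here mutated stays False, ordered puts first before second, and same is True because replies takes no part in comparisons.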
To Dict or Tuple
asdict(comment_1)
#> {'id': 1, 'text': 'I just subscribed!', 'replies': []}
astuple(comment_1)
#> (1, 'I just subscribed!', [])
import dataclasses

# replace() returns a new instance with the given fields changed
copy = dataclasses.replace(comment_1, id=3)
print(copy)
#> Comment(id=3, text='I just subscribed!')
import inspect
from pprint import pprint

# List the functions defined on the class, including the generated methods
pprint(inspect.getmembers(Comment, inspect.isfunction))
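To make the inspect call above concrete, here is a small self-contained sketch that collects the names of the functions found on a dataclass. Point is a hypothetical example class, not part of the original text. The exact set of names can vary between Python versions, so the sketch only looks for a few methods that the decorator is documented to generate.

```python
import inspect
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Point:  # hypothetical example class
    x: int
    y: int = 0

# Names of all plain functions reachable on the class.
generated = {name for name, _ in inspect.getmembers(Point, inspect.isfunction)}

# A few methods the decorator is expected to add.
expected = {"__init__", "__repr__", "__eq__", "__lt__"}
has_expected = expected <= generated
```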
24.1.1 Inheritance
from dataclasses import dataclass

# Define the base dataclass
@dataclass
class Person:
    name: str
    age: int

# Define a derived dataclass that inherits from Person
@dataclass
class Employee(Person):
    employee_id: int
    department: str

# Create an instance of the derived class
employee = Employee(name="John Doe", age=30, employee_id=1234, department="Engineering")

# Display the inherited and new fields
print(employee)
#> Employee(name='John Doe', age=30, employee_id=1234, department='Engineering')
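One thing the example above does not run into is field ordering. Inherited fields come first in the generated __init__, so a base-class default followed by a required field in the subclass is rejected when the class is created. The sketch below illustrates this. The names BasePerson, BrokenEmployee, and SafeEmployee, and the default values, are assumptions added for illustration.

```python
from dataclasses import dataclass

@dataclass
class BasePerson:  # hypothetical variant of Person with a default
    name: str
    age: int = 0

# A required field after an inherited default fails at class creation.
try:
    @dataclass
    class BrokenEmployee(BasePerson):
        employee_id: int  # no default, but follows age=0
    error = None
except TypeError as exc:
    error = str(exc)

# One workaround: give the new field a default too (placeholder value).
@dataclass
class SafeEmployee(BasePerson):
    employee_id: int = -1

employee = SafeEmployee("John Doe", 30, 1234)
```

On Python 3.10 and later, declaring the subclass with @dataclass(kw_only=True) is another option, at the cost of keyword-only arguments for its fields.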
24.1.2 Extract Each Component to a List
# List of Comment instances
comments = [comment_1, comment_2]
24.1.2.1 List Comprehension
# Extract the 'id' and 'text' properties into separate lists
ids = [comment.id for comment in comments]
ids
#> [1, 2]
texts = [comment.text for comment in comments]
texts
#> ['I just subscribed!', 'Hi there']
24.1.2.2 Zip with Unpacking
[(comment.id, comment.text) for comment in comments]
#> [(1, 'I just subscribed!'), (2, 'Hi there')]
# Using zip with unpacking
ids, texts = zip(*[(comment.id, comment.text) for comment in comments])
# Convert to lists if needed (zip yields tuples)
ids = list(ids)
texts = list(texts)
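A caveat with the zip approach: when the input list is empty, zip(*[]) yields nothing, so unpacking into two names raises a ValueError. The sketch below wraps the pattern in a helper that handles that case. Item and split_columns are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Item:  # hypothetical stand-in for Comment
    id: int
    text: str

def split_columns(items):
    """Return (ids, texts) as two lists; empty input gives two empty lists."""
    if not items:
        # zip(*[]) yields nothing, so unpacking into two names would fail.
        return [], []
    ids, texts = zip(*[(item.id, item.text) for item in items])
    return list(ids), list(texts)

ids, texts = split_columns([Item(1, "I just subscribed!"), Item(2, "Hi there")])
empty_ids, empty_texts = split_columns([])
```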
24.1.2.3 👋 To Data Frame
import pandas as pd
from dataclasses import asdict

comments
#> [Comment(id=1, text='I just subscribed!'), Comment(id=2, text='Hi there')]
# Convert to a DataFrame
df = pd.DataFrame([asdict(comment) for comment in comments])
df
#>    id                text replies
#> 0   1  I just subscribed!      []
#> 1   2            Hi there      []
24.2 Default Argument
24.2.1 Default Factory
The default_factory argument of the field() function in Python’s dataclasses module provides a default value for a field whose type is mutable, such as a list, dictionary, or set. This is particularly useful because using mutable default arguments (like a list or dictionary) directly in a function or class definition can lead to unintended behavior.
24.2.1.1 Why Use default_factory?
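A mutable default defined directly on a class is created once and shared by every instance, which is rarely what you want. The dataclass decorator guards against this: it rejects defaults such as list, dict, and set with a ValueError when the class is defined. For example: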
from dataclasses import dataclass

@dataclass
class MyClass:
    my_list: list = []
#> ValueError: mutable default <class 'list'> for field my_list is not allowed: use default_factory

# The class definition fails, so the lines below cannot run.
# If the shared default were allowed, obj1 and obj2 would share the same list:
# obj1 = MyClass()
# obj2 = MyClass()
# obj1.my_list.append(1)
# print(obj2.my_list)  # would print [1]
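The sharing problem itself is easy to reproduce with a plain class, where no such check exists. The following sketch uses a hypothetical PlainClass (not a dataclass) to show that a list defined in the class body is a single object seen by every instance.

```python
class PlainClass:
    # Class attribute: one list object shared by all instances.
    my_list = []

obj1 = PlainClass()
obj2 = PlainClass()

# Appending through obj1 mutates the shared class-level list.
obj1.my_list.append(1)

shared = obj2.my_list                        # the same list, now [1]
same_object = obj1.my_list is obj2.my_list   # True
```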
In contrast, default_factory ensures that each instance of the class gets its own independent mutable object:
24.2.1.2 Example Using default_factory
from dataclasses import dataclass, field

@dataclass
class MyClass:
    my_list: list = field(default_factory=list)

# Now, each instance gets its own list
obj1 = MyClass()
obj2 = MyClass()
obj1.my_list.append(1)
print(obj2.my_list)  # obj1 and obj2 have independent lists
#> []
How It Works:
- default_factory=list: This tells Python to call list() to create a new empty list each time a new instance of MyClass is created.
- default_factory=dict: Similarly, this would create a new dictionary for each instance.
- default_factory=lambda: {"key": "value"}: You can also use a lambda function to generate a default value if you need something more complex.
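The three variants listed above can be sketched together in one class. Settings and its field names are hypothetical and chosen only for illustration. Each factory is called once per new instance, so changes made through one object should not appear in another.

```python
from dataclasses import dataclass, field

@dataclass
class Settings:  # hypothetical example class
    tags: list = field(default_factory=list)
    counts: dict = field(default_factory=dict)
    options: dict = field(default_factory=lambda: {"key": "value"})

a = Settings()
b = Settings()

# Mutate only the first instance.
a.tags.append("x")
a.counts["seen"] = 1
a.options["key"] = "changed"

# The second instance still holds its own fresh defaults.
b_state = (b.tags, b.counts, b.options)
```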
When to Use It:
You should use default_factory whenever a field in a data class needs a default value of a mutable type. This ensures that each instance of the class gets its own independent mutable object, avoiding the unintended sharing of state between instances.
Summary:
- default_factory provides a way to specify a factory function that returns a default value for a field.
- It is especially useful for fields that need a mutable default value (like lists or dictionaries) to ensure each instance of the class gets its own unique object.
- This helps prevent bugs that occur due to shared mutable defaults across instances of a class.
24.3 Example
24.3.1 Ex2: Create Simple (Recist)
You can convert the Recist class to a dataclass in Python by using the dataclasses module, which simplifies the creation of classes by automatically generating special methods like __init__, __repr__, and __eq__. Here’s how you can do it:
from dataclasses import dataclass, field

@dataclass
class Recist:
    category: str
    category_full: str = field(init=False)

    # No annotation, so this is a plain class attribute, not a dataclass field
    _category_dict = {
        "PR": "Partial Response (PR)",
        "CR": "Complete Response (CR)",
        "PD": "Progressive Disease (PD)",
        "SD": "Stable Disease (SD)"
    }

    def __post_init__(self):
        # Set the full category name based on the provided short category
        if self.category in self._category_dict:
            self.category_full = self._category_dict[self.category]
        else:
            raise ValueError(f"Unknown category: {self.category}")
Explanation:
- @dataclass Decorator: This decorator is used to create a data class, which automatically generates the __init__ method and other utility methods based on the fields you define.
- Fields:
  - category: This is the input field where you pass the short form of the RECIST category.
  - category_full: This field is automatically computed based on the category and does not need to be initialized by the user. It is marked with field(init=False) to exclude it from the generated __init__ method.
- __post_init__ Method: This special method is automatically called after the __init__ method. It’s used here to set category_full based on the provided category value. If the category is not in the _category_dict, an exception is raised.
# Creating an instance of the Recist class
recist_instance = Recist(category="PR")
print(recist_instance.category)
#> PR
print(recist_instance.category_full)
#> Partial Response (PR)
# Handling an invalid category
try:
    invalid_instance = Recist(category="XX")
except ValueError as e:
    print(e)
#> Unknown category: XX
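The effect of field(init=False) described in the explanation can be checked directly. The sketch below uses a trimmed-down, hypothetical Status class rather than Recist so that it runs on its own. The derived field is still listed by dataclasses.fields(), but it is not a parameter of the generated __init__, so passing it as an argument should raise a TypeError.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Status:  # hypothetical, simplified analogue of Recist
    code: str
    label: str = field(init=False)

    def __post_init__(self):
        # Derive the init=False field from the one the caller supplied.
        self.label = f"Status {self.code}"

status = Status("PR")

# Only fields with init=True become __init__ parameters.
init_fields = [f.name for f in fields(Status) if f.init]
all_fields = [f.name for f in fields(Status)]

# Supplying the derived field explicitly is rejected.
try:
    Status(code="PR", label="custom")
    accepted = True
except TypeError:
    accepted = False
```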