How I Cleaned Up Messy Legacy Python Code Using dataclass and typing
dataclass and typing didn’t just clean the code — they helped me understand it again.

The code worked — but it was fragile, hard to read, and painful to extend. Modern Python tools changed everything.
If you’ve ever inherited a legacy Python codebase, you know the feeling.
It’s like walking into a room where someone started five puzzles, mixed the pieces together, and left you a sticky note that says: “Good luck.”
That’s what happened to me recently. I was handed a large legacy Python project — no documentation, no structure, and certainly no type hints. It was a jungle of nested dictionaries, magic strings, and functions that returned who-knows-what.
But here’s the twist: I didn’t rewrite everything. Instead, I systematically refactored critical parts using Python’s dataclass and typing modules, and the difference was night and day.
Let me walk you through how I turned chaos into clarity.
The Legacy Nightmare: Dictionaries Everywhere
The project I inherited was essentially a data processing pipeline. Each stage passed huge nested dictionaries to the next. Here’s the kind of code I was dealing with:
def process_user(data):
    user_id = data['id']
    name = data.get('name', '')
    email = data.get('email', '')
    # Do some more processing...
At first glance, it’s simple. But as the project grew, tracking which keys existed where became a nightmare. Some functions expected 'id', others expected 'user_id'. And of course, none of it was validated.
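To make the mismatch concrete, here’s a contrived sketch (the function names are hypothetical, not from the actual project) of how two pipeline stages can silently disagree about a key until one of them blows up:

```python
def create_report(data):
    # This stage writes the key as 'user_id'...
    return {'user_id': data['id'], 'total': 0}

def process_report(data):
    # ...but this stage reads it back as 'id'.
    return data['id']

report = create_report({'id': 42, 'name': 'Ada'})
try:
    process_report(report)
except KeyError as exc:
    print(f"Stage mismatch: missing key {exc}")
```

Nothing in the code, the IDE, or the tests flags this until the wrong dictionary actually flows through at runtime.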
Step 1: Introducing @dataclass
The first thing I did was define explicit data models using dataclass. It’s a feature from Python 3.7+ that lets you create classes with far less boilerplate.
Here’s what the user data looked like after refactoring:
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str
Now instead of blindly passing dictionaries, I worked with real objects:
def process_user(user: User):
    print(f"Processing user {user.id} with email {user.email}")
This small change did three things:
- Made the code more self-documenting
- Reduced key errors and typos
- Gave my IDE something to autocomplete and validate
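Those wins show up immediately at the call site. A quick sketch using the User model above:

```python
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str

user = User(id=1, name="Ada", email="ada@example.com")

# dataclass generates __init__, __repr__, and __eq__ for free:
print(user)  # User(id=1, name='Ada', email='ada@example.com')
assert user == User(1, "Ada", "ada@example.com")

# A typo like user.emial is now an immediate AttributeError,
# instead of a silent .get() default hiding the bug.
```

The generated __repr__ alone is worth it: log lines and tracebacks show real field names instead of an anonymous dict dump.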
Step 2: Adding Static Typing
Next up: typing.
Python’s dynamic nature is great — until it isn’t. When you’re dealing with lots of legacy functions and passing ambiguous data around, static typing is a lifesaver.
With Python’s typing module, I added types to every function signature:
from typing import List

def send_emails(users: List[User]) -> None:
    for user in users:
        # IDE now knows user is a User instance
        print(f"Sending email to {user.email}")
I even used Optional, Union, and Dict where necessary. Here’s a slightly more complex example:
from typing import Optional

@dataclass
class Address:
    street: str
    city: str
    zipcode: Optional[str] = None
This meant that if someone accidentally passed None to a required field, a type checker would catch it at development time, not in production.
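One caveat worth knowing: dataclass itself doesn’t enforce types at runtime, so the early catch comes from running mypy (or your IDE’s checker), not from Python. A minimal sketch of both sides, using the Address model above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Address:
    street: str
    city: str
    zipcode: Optional[str] = None

# Fine: zipcode is Optional and defaults to None.
home = Address(street="12 Main St", city="Springfield")
assert home.zipcode is None

# mypy flags this call (street must be str, not None),
# but Python itself will happily run it -- dataclasses
# don't check types at runtime.
bad = Address(street=None, city="Springfield")  # type: ignore[arg-type]
```

That’s exactly why wiring mypy into the workflow matters: the annotations are only a contract if something checks them.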
Step 3: Replacing Magic Dictionaries with Models
Many parts of the code looked like this:
def save_order(order_data):
    db.insert('orders', {
        'id': order_data['order_id'],
        'user': order_data['user_id'],
        'amount': order_data['amount'],
    })
No type safety. No validation. Just brittle string-based chaos.
I replaced these with structured models:
@dataclass
class Order:
    order_id: int
    user_id: int
    amount: float

def save_order(order: Order):
    db.insert('orders', {
        'id': order.order_id,
        'user': order.user_id,
        'amount': order.amount,
    })
Now I could serialize and deserialize with confidence using tools like dataclasses.asdict() or even pydantic if needed.
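For instance, asdict() converts a dataclass instance (nested dataclasses included) back into plain dictionaries, which is handy right at the serialization boundary:

```python
from dataclasses import dataclass, asdict

@dataclass
class Order:
    order_id: int
    user_id: int
    amount: float

order = Order(order_id=101, user_id=7, amount=49.99)

# asdict recursively produces plain dicts, ready for json.dumps
# or a db layer that expects dictionaries:
payload = asdict(order)
print(payload)  # {'order_id': 101, 'user_id': 7, 'amount': 49.99}
```

So the typed model lives everywhere inside the application, and the dict only appears at the edge where some external system demands one.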
The Hidden Benefits
Here’s what I didn’t expect:
- New devs onboarded faster because models acted like documentation
- Debugging became easier, since tracebacks referenced real attributes
- Unit tests got cleaner, thanks to type-aware mocks and fixtures
- Errors were caught early, often by my IDE or a pre-commit hook running mypy
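If you want the same safety net, a typical .pre-commit-config.yaml entry looks something like this (this uses the official mirrors-mypy hook; pin rev to whatever mypy release you actually use):

```yaml
repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0  # pin to your mypy version
    hooks:
      - id: mypy
```

With that in place, a commit that breaks a type contract fails before it ever reaches review.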
Refactoring wasn’t about rewriting everything. It was about building trust in the codebase — step by step.
When You Should (and Shouldn’t) Do This
Use dataclass and typing when:
- Your data has a clear structure
- You’re dealing with nested dictionaries
- You want maintainable, self-documenting code
- You care about catching bugs early
Avoid it when:
- You’re prototyping something throwaway
- Your data is unstructured and varies wildly
- You don’t want to commit to strict typing just yet
Final Thoughts
Refactoring legacy code is like untangling headphones — frustrating at first, but deeply satisfying when you finally get it right.
Using dataclass and typing didn’t magically fix everything, but it gave me a way to systematically clean up and rebuild trust in the code. My team now works faster, catches bugs earlier, and feels more confident navigating the codebase.
If you’re staring down a similar mess, start small. Define one data model. Add one type hint. Run mypy. Then do it again.
Clean code isn’t a destination. It’s a habit.
Liked this article?
Follow me for more real-world Python tips, refactoring stories, and clean code practices.
Let’s make legacy code a little less scary.
