How I Cleaned Up Messy Legacy Python Code Using dataclass and typing

dataclass and typing didn’t just clean the code — they helped me understand it again.


The code worked — but it was fragile, hard to read, and painful to extend. Modern Python tools changed everything.


If you’ve ever inherited a legacy Python codebase, you know the feeling.

It’s like walking into a room where someone started five puzzles, mixed the pieces together, and left you a sticky note that says: “Good luck.”

That’s what happened to me recently. I was handed a large legacy Python project — no documentation, no structure, and certainly no type hints. It was a jungle of nested dictionaries, magic strings, and functions that returned who-knows-what.

But here’s the twist: I didn’t rewrite everything. Instead, I systematically refactored critical parts using Python’s dataclass and typing modules — and the difference was night and day.

Let me walk you through how I turned chaos into clarity.


The Legacy Nightmare: Dictionaries Everywhere

The project I inherited was essentially a data processing pipeline. Each stage passed huge nested dictionaries to the next. Here’s the kind of code I was dealing with:

def process_user(data): 
    user_id = data['id'] 
    name = data.get('name', '') 
    email = data.get('email', '') 
    # Do some more processing...

At first glance, it’s simple. But as the project grew, tracking which keys existed where became a nightmare. Some functions expected 'id', others expected 'user_id'. And of course, none of it was validated.
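To make that concrete, here are two functions in the style of what I found (the names are illustrative, not from the real codebase):

def charge_user(data):
    user_id = data['user_id']  # this stage expects 'user_id'...
    print(f"Charging user {user_id}")

def notify_user(data):
    user_id = data['id']  # ...while this one expects 'id'
    print(f"Notifying user {user_id}")

Pass the same dictionary through both and one of them raises a KeyError at runtime, with nothing warning you beforehand.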

Step 1: Introducing @dataclass

The first thing I did was define explicit data models using dataclass. Available since Python 3.7, the @dataclass decorator auto-generates methods like __init__, __repr__, and __eq__, so you can declare structured classes with almost no boilerplate.

Here’s what the user data looked like after refactoring:

from dataclasses import dataclass 
 
@dataclass 
class User: 
    id: int 
    name: str 
    email: str

Now instead of blindly passing dictionaries, I worked with real objects:

def process_user(user: User): 
    print(f"Processing user {user.id} with email {user.email}")

This small change did three things:

  • Made the code more self-documenting
  • Reduced key errors and typos
  • Gave my IDE something to autocomplete and validate
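Bridging the legacy dict-shaped data into the new model happened once, at the pipeline boundary. Here's a minimal sketch of that conversion; the 'id'/'user_id' fallback is my own convention, not something dataclasses provide:

raw = {'user_id': 7, 'name': 'Ada', 'email': 'ada@example.com'}

user = User(
    id=raw.get('id', raw.get('user_id')),  # absorb the legacy 'id'/'user_id' split in one place
    name=raw.get('name', ''),
    email=raw.get('email', ''),
)
process_user(user)

Once every entry point went through the model, the question of which key a given stage expected simply disappeared.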

Step 2: Adding Static Typing

Next up: typing.

Python’s dynamic nature is great — until it isn’t. When you’re dealing with lots of legacy functions and passing ambiguous data around, static typing is a lifesaver.

With Python’s typing module, I added types to every function signature:

from typing import List 
 
def send_emails(users: List[User]) -> None: 
    for user in users: 
        # IDE now knows user is a User instance 
        print(f"Sending email to {user.email}")

I even used Optional, Union, and Dict where necessary. Here’s a slightly more complex example:

from typing import Optional 
 
@dataclass 
class Address: 
    street: str 
    city: str 
    zipcode: Optional[str] = None

This meant that if someone accidentally passed None to a field that wasn't marked Optional, mypy or my IDE would flag it at development time, not in production. (Dataclasses themselves don't validate at runtime; the static checker is what catches this.)
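Here's a quick sketch of the kind of mistake this caught, using the Address model above; the lookup function is illustrative, not from the real codebase:

from typing import Dict, Optional, Union

# A Union/Dict signature in the same style I used across the codebase
def lookup(settings: Dict[str, Union[int, str]]) -> Optional[str]:
    value = settings.get('region')
    return value if isinstance(value, str) else None

# mypy flags the next line with something like:
#   error: Argument "street" to "Address" has incompatible type "None"; expected "str"
# Plain Python would run it without complaint.
broken = Address(street=None, city='Springfield')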

Step 3: Replacing Magic Dictionaries with Models

Many parts of the code looked like this:

def save_order(order_data):
    # 'db' is the project's existing database helper (details not shown)
    db.insert('orders', {
        'id': order_data['order_id'],
        'user': order_data['user_id'],
        'amount': order_data['amount'],
    })

No type safety. No validation. Just brittle string-based chaos.

I replaced these with structured models:

@dataclass 
class Order: 
    order_id: int 
    user_id: int 
    amount: float 
 
def save_order(order: Order): 
    db.insert('orders', { 
        'id': order.order_id, 
        'user': order.user_id, 
        'amount': order.amount, 
    })

Now I could serialize and deserialize with confidence using tools like dataclasses.asdict() or even pydantic if needed.
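For example, round-tripping an Order through a plain dict is now a one-liner in each direction:

from dataclasses import asdict

order = Order(order_id=1, user_id=42, amount=99.50)
row = asdict(order)       # {'order_id': 1, 'user_id': 42, 'amount': 99.5}
restored = Order(**row)   # rebuild the model from a plain dict
assert restored == order  # dataclasses generate __eq__ for free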

The Hidden Benefits

Here’s what I didn’t expect:

  • New devs onboarded faster because models acted like documentation
  • Debugging became easier, since tracebacks referenced real attributes
  • Unit tests got cleaner, thanks to type-aware mocks and fixtures
  • Errors were caught early, often by my IDE or pre-commit hook running mypy

Refactoring wasn’t about rewriting everything. It was about building trust in the codebase — step by step.

When You Should (and Shouldn’t) Do This

Use dataclass and typing when:

  • Your data has a clear structure
  • You’re dealing with nested dictionaries
  • You want maintainable, self-documenting code
  • You care about catching bugs early

Avoid it when:

  • You’re prototyping something throwaway
  • Your data is unstructured and varies wildly
  • You don’t want to commit to strict typing just yet

Final Thoughts

Refactoring legacy code is like untangling headphones — frustrating at first, but deeply satisfying when you finally get it right.

Using dataclass and typing didn’t magically fix everything, but it gave me a way to systematically clean up and rebuild trust in the code. My team now works faster, catches bugs earlier, and feels more confident navigating the codebase.

If you’re staring down a similar mess, start small. Define one data model. Add one type hint. Run mypy. Then do it again.

Clean code isn’t a destination. It’s a habit.


Liked this article?
Follow me for more real-world Python tips, refactoring stories, and clean code practices.

Let’s make legacy code a little less scary.
