I Let AI Rewrite My Entire Python Project — Here’s What Really Happened
Curious if AI can actually refactor your Python code better than you? I ran a real experiment — and the results were anything but expected.

Would you trust an AI to refactor thousands of lines of your codebase?
It started as a joke.
One late night, frustrated with legacy code and drowning in utils.py chaos, I thought: What if I just handed this entire project to an AI and let it clean up the mess?
A few prompt-engineered commands later, I was watching GPT-4o reorganize my year-old Python codebase like a robotic Marie Kondo. But what started as an experiment quickly spiraled into an eye-opening (and sometimes painful) lesson in how far AI has come — and where it still falls short.
So, what actually happens when you let AI rewrite your entire Python project?
Here’s what I learned — the good, the bad, and the surprisingly helpful.
The Setup: One Messy Python Project
Before we dive into the results, here’s what I gave the AI to work with:
- Project type: A medium-sized automation tool with ~15 Python files
- Tech stack: Python 3.11, requests, pydantic, some custom decorators and CLI logic
- Main issues: Inconsistent code style, duplicated logic, deeply nested if-else statements, and way too many one-letter variable names
I zipped the codebase, fed the files to GPT-4o (in chunks), and gave it a mission:
“Refactor this project for clarity, maintainability, and modern Python best practices.”
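Feeding a project to a chat model in chunks usually means grouping whole files under some size budget. A minimal sketch of that idea, assuming a simple character budget rather than real token counting; the function name `chunk_source_files` and the 12,000-character default are hypothetical choices, not details from this experiment:

```python
from pathlib import Path


def chunk_source_files(root: str, max_chars: int = 12_000) -> list[str]:
    """Group a project's .py files into prompt-sized chunks (hypothetical helper).

    Each chunk holds one or more whole files, each prefixed with a header
    comment naming its path. A file larger than max_chars becomes its own
    oversized chunk instead of being split mid-file.
    """
    root_path = Path(root)
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for path in sorted(root_path.rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        block = f"# File: {path.relative_to(root_path)}\n{source}\n"
        # Close the current chunk if adding this file would exceed the budget.
        if current and current_len + len(block) > max_chars:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(block)
        current_len += len(block)
    if current:
        chunks.append("".join(current))
    return chunks
```

Keeping files whole means the model always sees complete functions and imports, at the cost of occasionally exceeding the budget for a very large module.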
Phase 1: The AI Becomes a PEP8 Fanatic
The first thing GPT-4o did? Fix everything that broke PEP8.
- Renamed variables (x → user_response)
- Reformatted long lines to 79 characters (like it was 1991)
- Organized imports and killed off unused ones
- Replaced tabs with spaces (thankfully)
This was actually helpful. It cleaned up all the boring, mechanical tasks I usually outsource to tools like black or ruff. But it went further than just formatting: it also renamed several key functions to be more descriptive.
AI is amazing at the "linter-plus" layer of cleanup.
Phase 2: Function Explosion
Next, GPT-4o started slicing and dicing my long functions.
“This function does too many things. Let’s break it into five helper functions.”
Sounds great in theory. But here’s what happened:
# Original
def process_user_data(user_id):
    data = get_data(user_id)
    if not data:
        return None
    transformed = transform_data(data)
    save_to_db(transformed)

# AI-Refactored
def process_user_data(user_id):
    data = fetch_user_data(user_id)
    if is_data_empty(data):
        return None
    transformed = transform_user_data(data)
    store_transformed_data(transformed)
The logic remained the same, but suddenly I had five new functions — and a new layer of indirection. Navigating the project became… annoying.
Clean? Yes. Maintainable? Questionable. Readability was sacrificed in favor of “one job per function” dogma. AI took SOLID principles very seriously.
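Whether a refactor like this really kept the logic the same is something you can check rather than eyeball: run the old and new versions on the same inputs and compare both return values and side effects. A minimal sketch, in which every helper body, the FAKE_USERS data, and the in-memory DB list are hypothetical stand-ins for the project's real code, and the two versions are renamed process_original and process_refactored so they can coexist in one file:

```python
# Hypothetical in-memory stand-ins for the project's real data layer.
FAKE_USERS = {1: {"name": " Ada "}, 2: {}}
DB: list[dict] = []


def get_data(user_id):
    return FAKE_USERS.get(user_id)


def transform_data(data):
    return {key: value.strip() for key, value in data.items()}


def save_to_db(record):
    DB.append(record)


# The AI's helpers, modeled here as thin wrappers over the same behavior.
def fetch_user_data(user_id):
    return get_data(user_id)


def is_data_empty(data):
    return not data


def transform_user_data(data):
    return transform_data(data)


def store_transformed_data(record):
    save_to_db(record)


def process_original(user_id):
    data = get_data(user_id)
    if not data:
        return None
    save_to_db(transform_data(data))


def process_refactored(user_id):
    data = fetch_user_data(user_id)
    if is_data_empty(data):
        return None
    store_transformed_data(transform_user_data(data))


def run(process, user_ids):
    # Start from an empty DB so each version's writes can be compared.
    DB.clear()
    results = [process(user_id) for user_id in user_ids]
    return results, list(DB)


# Same return values and same DB writes, including missing and empty users.
assert run(process_original, [1, 2, 3]) == run(process_refactored, [1, 2, 3])
```

On a real codebase this is a characterization test: pin down current behavior first, then let the AI rearrange the code underneath it.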
Phase 3: Docstring Overload
AI really wants you to know what your code is doing.
Every single function now had a docstring — whether it needed one or not.
def get_status(self):
    """Returns the current status."""
    return self.status
I get it. Documentation is good. But GPT-4o went full intern-mode, explaining the obvious.
Useful for onboarding new devs. But as a solo dev? It added noise.
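One way to see the line between noise and useful documentation is a docstring that records something the signature can't. A sketch with a hypothetical Job class: get_status() needs nothing, while retry() documents a failure mode a caller could otherwise miss.

```python
from collections.abc import Callable


# Hypothetical example class; task returns True when the work succeeds.
class Job:
    def __init__(self, task: Callable[[], bool]) -> None:
        self.task = task
        self.status = "pending"

    def get_status(self) -> str:
        # No docstring needed: the name and return type say it all.
        return self.status

    def retry(self, max_attempts: int = 3) -> bool:
        """Run the task until it succeeds or max_attempts is used up.

        Returns True on success. On exhaustion, sets status to "failed"
        and returns False without raising, so callers must check the
        return value instead of assuming the job eventually ran.
        """
        for _ in range(max_attempts):
            if self.task():
                self.status = "done"
                return True
        self.status = "failed"
        return False
```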
Phase 4: Type Hints, Everywhere
Every function now looked like a signature from a TypeScript file.
def get_user(name: str, age: int) -> dict: ...
Even private methods got the full type treatment. And yes, it migrated my dict returns to TypedDict and eventually to pydantic.BaseModel.
I loved this part. Static typing helped surface bugs I didn’t even know were lurking in edge cases.
AI didn’t just add type hints — it leaned into the type-first mindset. Suddenly, I was catching bad inputs before they hit runtime.
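A minimal sketch of the dict-to-TypedDict step, reusing the get_user signature above with a hypothetical User shape. TypedDict is enforced only by static tools such as mypy or pyright; moving on to pydantic.BaseModel, as the AI eventually did, adds validation at runtime as well.

```python
from typing import TypedDict


# Hypothetical shape of the dict that get_user returns.
class User(TypedDict):
    name: str
    age: int


def get_user(name: str, age: int) -> User:
    return {"name": name, "age": age}


# A static checker such as mypy or pyright would flag both lines below,
# even though plain Python would run them without complaint:
#   get_user("Ada", "36")          # age is a str, not an int
#   user: User = {"name": "Ada"}   # missing required key "age"
```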
Phase 5: The Weird Stuff
Here’s where things got funky:
- Replaced some for loops with map and lambda, even when it hurt readability
- Tried to implement a custom LoggerFactory that added unnecessary complexity
- Renamed my CLI file from main.py to entrypoint.py... why?
- Rewrote perfectly good list comprehensions into verbose for loops for clarity
AI had strong opinions — and not all of them were good.
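The first and last of those bullets pull in opposite directions. A small hypothetical example shows that the three styles behave identically, which is exactly why the choice should come down to readability:

```python
# Hypothetical data: get the lowercase names of active users.
users = [
    {"name": "Ada", "active": True},
    {"name": "Bob", "active": False},
    {"name": "Cy", "active": True},
]

# 1. List comprehension: filter and transform in one left-to-right line.
names_comprehension = [u["name"].lower() for u in users if u["active"]]

# 2. map/lambda style: the filter has to be nested inside the map.
names_map = list(
    map(lambda u: u["name"].lower(), filter(lambda u: u["active"], users))
)

# 3. Explicit for loop: more lines, same result.
names_loop = []
for u in users:
    if u["active"]:
        names_loop.append(u["name"].lower())
```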
What Surprised Me the Most
Here’s what I didn’t expect going into this:
- AI is better at architectural suggestions than you think.
It recommended breaking one monolithic module into three logical domains: core/, utils/, and services/. I implemented it, and the project actually became more navigable.
- It's not just a code transformer. It's a code editor.
GPT-4o doesn't just apply static rules. It makes value judgments. Some of them were smart. Others were… enthusiastic.
- The biggest improvements came from small changes.
Adding enums instead of magic strings. Introducing constants. Making error messages human-readable. AI nailed the polish.
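A minimal sketch of that enums-and-constants polish, with a hypothetical job-status field and retry limit (the names and values are illustrative, not from the project):

```python
from enum import Enum

# Before (hypothetical): typo-prone magic strings scattered through the code.
#   if job["status"] == "compelted": ...   # silently never matches

MAX_RETRIES = 3  # one named constant instead of a bare 3 in several places


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def describe(status: Status) -> str:
    # Human-readable messages keyed off the enum, not raw strings.
    if status is Status.COMPLETED:
        return "Job finished successfully."
    if status is Status.FAILED:
        return f"Job failed after {MAX_RETRIES} attempts."
    return f"Job is {status.value}."
```

A misspelled member name now fails loudly with an AttributeError, and Status("compelted") raises ValueError, instead of a comparison quietly evaluating to False.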
Should You Let AI Rewrite Your Project?
It depends on what you’re looking for.
When it helps:
- You’ve inherited legacy code and need a fresh start
- You want to enforce consistent style and typing
- You’re trying to modernize to Python 3.11+ features
- You want a second pair of (robotic) eyes to spot anti-patterns
When it hurts:
- You have highly custom logic or domain-specific constraints
- You care deeply about naming conventions or personal style
- You’re in a rush — AI rewrites often need human review
- You hate reading docstrings for functions like get_status()
Final Thoughts: AI Is a Brutally Honest Code Reviewer
Letting AI rewrite my Python project felt like handing it to a brutally honest senior engineer who doesn’t care about your feelings. It exposed flaws I’d been ignoring for months. But it also overstepped, refactoring for the sake of refactoring.
The result?
My project is now more consistent, more testable, and (mostly) cleaner.
But it also feels less mine.
AI won’t replace developers.
But it will challenge the way we write, review, and think about code.
And that alone makes the experiment worth it.
If You’re Curious, Try This Yourself
Pick a file from your project. Drop it into ChatGPT with this prompt:
“Refactor this Python code for readability and maintainability. Use modern best practices and explain your changes.”
Then review what it suggests.
You might just learn something new — even from your own code.