HACKER Q&A
📣 throwaway_am

How to deal with a messy codebase?


I recently joined a startup as a data scientist developer. I'm taking over a project that began a few months ago, but the main developer has left. The project is exciting, the team is great, and I really like the company.

The challenge is the messy codebase I have to work with. It's a mix of a somewhat organized library, random scripts, lots of Jupyter notebooks, and more. Nothing is automated; you have to run a notebook manually to generate training data!

Many developers (following the typical stereotype about devs) might suggest starting over from scratch, but I know that's not a good idea. It's a bad move technically, since there's a lot of implicit knowledge hidden in the code, and it's disrespectful to the work the team has already done. However, building on the existing work is tough because the code is so disorganized.

I'm unsure how to handle this. I don't want to insist on a complete redo, but I also can't be productive with the current state of the project. I thought about making small improvements and tackling technical issues, but it's hard to explain the value of this work, especially in my first few months.

What do you think? How would you approach this problem?


  👤 nh23423fefe Accepted Answer ✓
i can't tell exactly what your issue is.

is it the manual workflow? write scripts or use UI automation to get the entire workflow running without you having to touch the machine.
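
for the notebook case, a minimal sketch of "write scripts": wrap the notebook in a small Python script that shells out to `jupyter nbconvert --execute`, so nobody has to open Jupyter and click run. this assumes Jupyter and nbconvert are installed; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_nbconvert_command(notebook: Path, output_dir: Path) -> list[str]:
    """Build the CLI call that executes a notebook from top to bottom."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",               # keep the result as an .ipynb
        "--execute",                      # actually run every cell
        "--output-dir", str(output_dir),  # write the executed copy here
        str(notebook),
    ]


def run_notebook(notebook: Path, output_dir: Path) -> None:
    """Execute the notebook; raises CalledProcessError if the command fails."""
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_nbconvert_command(notebook, output_dir), check=True)
```

you'd call it as something like `run_notebook(Path("notebooks/make_training_data.ipynb"), Path("build/executed"))`, where both paths are placeholders for whatever your project actually uses. once that works from a terminal, it can go into cron, CI, or a build target.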

code is messy and spread out? move the body of every function/method/routine into a central library, and leave a stub in its place that just calls the library. then refactor/consolidate the library and refactor/delete the stubs as you go along.
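
a rough sketch of what one of those stubs could look like in Python. everything here is hypothetical (the function names, and the idea that the library lives in something like `mylib/cleaning.py`); the deprecation warning is an optional extra, not something you need.

```python
import warnings


# --- central library (imagine this lives in a hypothetical mylib/cleaning.py) ---
def normalize_column_name(name: str) -> str:
    """Canonical implementation, moved here unchanged from the old script."""
    return name.strip().lower().replace(" ", "_")


# --- old location (imagine a hypothetical scripts/prep.py) ---
def clean_name(name: str) -> str:
    """Stub left behind so existing callers and notebooks keep working."""
    warnings.warn(
        "clean_name has moved; call normalize_column_name from the library",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this stub
    )
    return normalize_column_name(name)
```

old call sites keep working while the warning tells you which ones still need migrating. when nothing triggers it anymore, the stub can go.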

the main point is to never break the code, and never assume you can fix it in one big bang. you want to make lots of little changes that don't alter the behavior of the code but do improve developer experience.
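
one way to check that a little change really didn't alter behavior is a snapshot (characterization) test: record what the old code outputs for some sample inputs, then compare after every refactor. a minimal sketch, assuming the outputs are JSON-serializable; `legacy_score` and `refactored_score` are stand-ins for your real code.

```python
import json
from pathlib import Path


def legacy_score(values):
    """Stand-in for a messy routine you are afraid to touch."""
    total = 0
    for v in values:
        if v > 0:
            total += v * 2
    return total


def refactored_score(values):
    """Cleaner rewrite that is supposed to behave identically."""
    return sum(2 * v for v in values if v > 0)


def record_or_compare(func, inputs, snapshot: Path) -> bool:
    """First run records the outputs; later runs return True only if they still match."""
    outputs = [func(x) for x in inputs]
    if not snapshot.exists():
        snapshot.write_text(json.dumps(outputs))
        return True
    return json.loads(snapshot.read_text()) == outputs
```

run it once against the old function to create the snapshot, then against the rewrite. a `False` means the refactor changed behavior for at least one sample input.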

the goal is to get to a spot where you could just 'make' from the root dir if you wanted.
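
if a Makefile isn't an option, the same idea can be sketched in plain Python: rebuild a target only when it's missing or older than its inputs. this is a toy stand-in for make, not a replacement for it, and the names are made up.

```python
from pathlib import Path
from typing import Callable, Sequence


def is_stale(target: Path, sources: Sequence[Path]) -> bool:
    """Mirror make's rule: rebuild if the target is missing or older than any source."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


def build(target: Path, sources: Sequence[Path], recipe: Callable[[], None]) -> bool:
    """Run the recipe only when the target is stale; return True if it ran."""
    if is_stale(target, sources):
        recipe()
        return True
    return False
```

each pipeline step (raw data -> training data -> model) becomes one `build(...)` call in a single script at the root, and running that script plays the role of `make`.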