Assess the Damage

We’ve all been there, dealing with “that codebase”: a ball of mud with no discernible architecture. Maybe it was a prototype we created, a minimum viable product we inherited from another team, or a project that started off with good intentions but was rushed into production by the business. Whatever the case, we have a mess on our hands that needs to be cleaned up and made maintainable.

Start by stepping back to assess the damage. It might help to sketch a rough diagram of the current architecture as a reference for what connects to what. Document the consumable interface, whether it’s library functions or API endpoints, and write out explicitly how it works today. Then write integration tests that prove the interface behaves as documented.
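As a minimal sketch, assuming the consumable interface is a plain library function, the tests below pin down what the code does today rather than what it ought to do (often called characterization tests). `legacy_order_total` and its coupon rule are hypothetical stand-ins for whatever the real codebase exposes; for API endpoints the same idea applies, with requests sent to a running instance instead of direct function calls.

```python
import unittest


def legacy_order_total(items, coupon=None):
    # Hypothetical legacy function standing in for the real public interface.
    # Its quirks (a flat 10% coupon, rounding at the end) are what we pin down.
    total = 0
    for price, qty in items:
        total += price * qty
    if coupon == "SAVE10":
        total = total * 0.9
    return round(total, 2)


class LegacyOrderTotalTest(unittest.TestCase):
    """Record today's behavior before changing anything."""

    def test_sums_price_times_quantity(self):
        self.assertEqual(legacy_order_total([(2.50, 2), (1.00, 3)]), 8.0)

    def test_coupon_applies_ten_percent_discount(self):
        self.assertEqual(legacy_order_total([(10.00, 1)], coupon="SAVE10"), 9.0)

    def test_unknown_coupon_is_ignored(self):
        self.assertEqual(legacy_order_total([(10.00, 1)], coupon="BOGUS"), 10.0)

    def test_empty_order_is_zero(self):
        self.assertEqual(legacy_order_total([]), 0)
```

Run it with a test runner such as `python -m unittest`. If a test fails against the untouched code, the documentation was wrong, not the code; fix the test to match reality before refactoring.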

Draw the Boundaries

Now the redesign can begin. Using the reference diagram, sort the code into categories:

business logic
display and presentation code, such as views and controllers
data handling, such as models and repositories
adapters and external service integrations
supplementary code (a good candidate to split out into libraries if it is generic and non-trivial)

Within each category, sort the code by domain. How the code is ultimately split depends on the chosen architecture, but categorizing it first makes the later reorganization easier. Organize the codebase into cohesive modules. Each module should expose one well-documented public API for other modules to call; beyond that API, the module should be treated as a black box.
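One way to express "one public API, otherwise a black box" in Python is sketched below, assuming a hypothetical `inventory` domain. The `Protocol` is the documented contract other modules depend on, the factory is the only sanctioned way to obtain an implementation, the leading underscore marks internals as off-limits, and `__all__` limits what a wildcard import exposes. Python does not enforce privacy, so this is a convention backed by code review rather than a hard boundary.

```python
from typing import Protocol

__all__ = ["InventoryService", "make_inventory_service"]


class InventoryService(Protocol):
    """Public API of the (hypothetical) inventory module."""

    def stock_level(self, sku: str) -> int:
        """Return units on hand for a SKU, or 0 if the SKU is unknown."""
        ...

    def reserve(self, sku: str, quantity: int) -> bool:
        """Reserve units if enough are on hand; return whether it succeeded."""
        ...


class _InMemoryInventory:
    # Internal implementation: other modules should never import this directly.
    def __init__(self, initial_stock: dict[str, int]) -> None:
        self._stock = dict(initial_stock)

    def stock_level(self, sku: str) -> int:
        return self._stock.get(sku, 0)

    def reserve(self, sku: str, quantity: int) -> bool:
        # Reject non-positive quantities and requests larger than the stock on hand.
        if quantity <= 0 or self._stock.get(sku, 0) < quantity:
            return False
        self._stock[sku] -= quantity
        return True


def make_inventory_service(initial_stock: dict[str, int]) -> InventoryService:
    # Factory: callers get the interface, never the concrete class.
    return _InMemoryInventory(initial_stock)
```

Because callers only see `InventoryService`, the in-memory implementation could later be swapped for a database-backed one without touching any other module.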

Knocking Down the Walls

In some ways, refactoring is a lot like remodeling. We have been repainting the rooms of our house. We didn’t start by stripping every wall; instead, each room was closed off, stripped, painted, and then refurnished. In the same vein, a refactor should not tear down every wall at once. Even if the whole codebase could be taken offline during the process, focusing on one part at a time keeps the work organized. Pick one domain or functional piece to work on at a time.

The most straightforward way to refactor is to create a new module or class and start moving code over. But since a lack of tests is what got us into this mess, we do not want to repeat the same mistake. First, write unit tests for the new module’s public interface. Next, copy the relevant code into the module and refactor it to fit that interface. Then start unplugging calls to the old code and plugging them into the new module, checking that the integration tests still pass after each change. Rinse and repeat. Martin Fowler described this approach as the “strangler fig application,” now commonly called the strangler pattern: you strangle off small parts of the old code until nothing depends on it.
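A minimal sketch of the strangler approach, assuming a hypothetical shipping-cost calculation being moved out of the old code. The facade is the single entry point the rest of the codebase calls; the new module is switched on only after its unit tests pass, so the old and new paths can coexist during the migration. The names `legacy_shipping_cost`, `ShippingCalculator`, and `ShippingFacade` are illustrative and not from any library.

```python
def legacy_shipping_cost(weight_kg, express):
    # Old code: the logic we want to retire.
    cost = 5.0
    if weight_kg > 1:
        cost += (weight_kg - 1) * 2.0
    if express:
        cost *= 2
    return cost


class ShippingCalculator:
    """New module: same behavior behind a small, unit-tested interface."""

    BASE_COST = 5.0
    PER_EXTRA_KG = 2.0
    EXPRESS_MULTIPLIER = 2

    def cost(self, weight_kg: float, express: bool = False) -> float:
        # The first kilogram is covered by the base cost.
        extra_kg = max(weight_kg - 1, 0)
        cost = self.BASE_COST + extra_kg * self.PER_EXTRA_KG
        return cost * self.EXPRESS_MULTIPLIER if express else cost


class ShippingFacade:
    """Entry point the rest of the codebase calls during the migration."""

    def __init__(self, use_new_calculator: bool) -> None:
        # Flip this once the new module is trusted; roll back by flipping it off.
        self._use_new = use_new_calculator
        self._calculator = ShippingCalculator()

    def shipping_cost(self, weight_kg: float, express: bool = False) -> float:
        if self._use_new:
            return self._calculator.cost(weight_kg, express)
        return legacy_shipping_cost(weight_kg, express)
```

Before flipping the flag, it can help to run both paths on the same inputs and compare the results. Once every call site goes through the facade and the new path has proven itself, the facade can call `ShippingCalculator` directly and `legacy_shipping_cost` can be deleted.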

At some point you will find that you have to pull changes back out or reach back into modules you thought were done. This is exactly why it is important to have both project-wide integration tests and per-module unit tests. My advice is to keep a running list of the tasks that need to be done and tackle them one at a time, opening a review for each task. Each change should be atomic and leave the codebase in a working state. Continuous integration jobs can help enforce this and ensure that local state, such as a dirty local database, isn’t causing false positives.
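To guard against the dirty-database problem, each test can build its state from scratch. The sketch below assumes a hypothetical `users` table and repository functions, and uses Python’s built-in `sqlite3` with an in-memory database that is created fresh in `setUp` and closed in `tearDown`, so no test depends on leftovers from an earlier test or from a developer’s machine. A real project would substitute its own schema and database; a CI job running the same suite on a clean runner adds a second layer of protection.

```python
import sqlite3
import unittest

SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"


def add_user(conn: sqlite3.Connection, email: str) -> int:
    # Hypothetical repository function under test.
    cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()
    return cur.lastrowid


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


class UserRepositoryTest(unittest.TestCase):
    def setUp(self) -> None:
        # Fresh, empty database for every test: no shared or leftover state.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(SCHEMA)

    def tearDown(self) -> None:
        self.conn.close()

    def test_add_user_inserts_one_row(self):
        add_user(self.conn, "a@example.com")
        self.assertEqual(count_users(self.conn), 1)

    def test_starts_empty(self):
        # Would fail if rows leaked in from another test or a local database.
        self.assertEqual(count_users(self.conn), 0)

    def test_duplicate_email_is_rejected(self):
        add_user(self.conn, "a@example.com")
        with self.assertRaises(sqlite3.IntegrityError):
            add_user(self.conn, "a@example.com")
```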

Refactoring can seem daunting, especially for an entire project, but with careful planning even a very messy codebase can be refactored. Even in a large codebase, you can knock down small walls one at a time.