Code archaeology

30 September 2020

I often describe the work of software development as being archaeological. To people who aren’t software developers, this might sound strange.

Software can live for a long time, often with developers changing it continuously throughout. For example, the last system I worked on must have been at least 15 years old, possibly closer to 20. Most of the original developers had moved on to other companies before I started working there. In an environment like that, a lot of complexity builds up over the years.

In theory, you could just look at how the code works now and figure out the changes you need to make to get the results you want. In practice, it’s often easier to look at the history of changes and use that to learn about the system. You often look at a piece of code that does something odd, dig through the history to find out why it was written that way, and discover that the developer was working around some weirdness in another part of the system, one that would have tripped you up if you hadn’t known about it. More generally, understanding how previous developers understood the system can give you clues about the changes you can make now. There are a lot of Chesterton’s Fences in software: things that look pointless until you learn why they were put there.
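For readers who do write software, here is a rough sketch of what that digging can look like, assuming the history lives in Git (the post does not say which version control system was involved). The helper names and the file path in the usage note are made up for illustration. The Git options themselves (`blame -L`, `log -L`, and `log -S`) are standard.

```python
import subprocess


def blame_command(path, start, end):
    """Who last changed each of lines start..end in `path`?"""
    return ["git", "blame", "-L", f"{start},{end}", "--", path]


def line_history_command(path, start, end):
    """Every commit that touched lines start..end of `path`, with diffs."""
    return ["git", "log", f"-L{start},{end}:{path}"]


def pickaxe_command(text):
    """Commits that changed how many times `text` appears in the code."""
    return ["git", "log", f"-S{text}", "--oneline"]


def run(command, repo="."):
    """Run one of the commands above inside a Git checkout and return its output."""
    result = subprocess.run(
        command, cwd=repo, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run(line_history_command("billing/invoice.py", 40, 55))` would print the history of those sixteen lines in a hypothetical file, which is usually the quickest way to find the commit that introduced the odd-looking code.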

That investigation can feel like archaeology. If you are lucky, the developer will have given a clear explanation of why they made each change, rewritten the surrounding code so that everything is as simple as possible, and updated the documentation. But you’re usually not lucky. So you end up looking at commit messages that say “changes” or “updates” or “make it work this time”, and trying to piece together what was going on in their heads while they did that work. And there are layers upon layers to dig through.