Another thought... if I'm tasked with maintaining or adding features to existing code, should I feel responsible for writing tests for the existing codebase?
What you touch will be on you. So writing tests for your own changes, and testing the parts of the legacy code you rely on as you go, is a good way to maintain your own coverage in case someone else breaks something in the legacy code, or an update to a library or something goes sideways.
Since you'll be doing a good job of mapping it out, below is something else to consider. The nice thing is LLMs can help a lot with writing tests.
Reliability can be achieved in a number of ways.
Where I've come across an unknown spreadsheet gone wild, or a code base whose jedi knights have long since vacated, I try to piece together the story of how the project started and how it got to where it is today.
If I was coming in cold, I'd focus on putting tests around the most critical, most complex, and most unknown parts first.
If refactoring was a goal, for me that means the new code has to run the same as the old code. One way to do that is to see if the existing code can become a service that can be compared against the new service.
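A rough sketch of what I mean, with made-up names: replay a corpus of recorded inputs through both implementations and diff the results.

```python
import json

def legacy_price(order: dict) -> float:
    # Stand-in for the old code path (could just as well be an HTTP call to the old service).
    return order["qty"] * order["unit_price"]

def new_price(order: dict) -> float:
    # Stand-in for the rewritten code path.
    return order["qty"] * order["unit_price"]

def find_mismatches(corpus_path: str) -> list[dict]:
    """Replay recorded inputs through both implementations and collect disagreements."""
    mismatches = []
    with open(corpus_path) as f:
        for line in f:
            order = json.loads(line)
            old, new = legacy_price(order), new_price(order)
            if old != new:
                mismatches.append({"input": order, "old": old, "new": new})
    return mismatches

if __name__ == "__main__":
    diffs = find_mismatches("recorded_orders.jsonl")  # one JSON object per line
    print(f"{len(diffs)} mismatches in the corpus")
```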
LLMs have been really fascinating for me to apply in this regard because I've had to map and tear apart so many processes, systems, integrations and varying interpretations and memories.
AI can even explain what most code is doing no matter how convoluted it is. From there, things like test-driven development, or test-driven legacy code stabilization, are much more doable and beneficial, not just for the legacy code, but for keeping the AI behaving if it's helping with coding.
Do you trust yourself enough to not break existing code while maintaining or adding features to it? What would be the consequence of you screwing up? Where would your screw up get noticed? Is it a scary person or group of people who will chase you down to fix it?
I've worked long enough to not trust myself at all anymore, regardless of whether it's old code or new. Then I evaluate according to the above. Usually I land firmly in "yeah, better test it" territory.
From my real life experience with legacy code:
- There is usually a "test" folder, but what the tests cover looks nothing like the actual code; they often don't even compile, and if they do, they fail, and if they don't, that's because every failing test is disabled.
- Refactoring? What refactoring? Usually, all you see is an ugly hack, usually noticeable because the style is different from the rest of the code base. The old code is generally commented out if it is in the way, left alone as dead code if it isn't.
- Writing tests yourself? It would require you to put the test framework back in order first, and there's no budget for that. Something as simple as reformatting the function you are working with is already a luxury.
- Sometimes, some refactoring is done, but that's only to add new bugs, so as to keep things exciting.
Still, as surprising as it may seem, I actually like working with legacy code. It is a challenge, understanding what it is doing (without reading the documentation of course, as if it exists, it is full of lies) and trying to leave it in a better state than it was before in a reasonable amount of time. It is great at teaching you all the antipatterns too, very useful if you end up in a greenfield project later on and you don't want it to end up like all the legacy code you have worked on before. If people actually use it, it will, but you can at least delay the inevitable.
Write tests:
1. for new additions to the code,
2. when making major overhauls to sections of it, or
3. when a bug is discovered that the old tests did not catch.
Those are the 3 cases I would say you should add a test. #3 is most important, in my experience.
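For case #3, the shape is usually a tiny regression test that pins the bug down before you fix it; something like this sketch (the billing module and function are made up):

```python
import unittest
from billing import apply_refund  # hypothetical module under repair

class RefundRegressionTest(unittest.TestCase):
    def test_refund_larger_than_order_clamps_to_zero(self):
        # Reproduces the reported bug: the total used to go negative here.
        self.assertEqual(apply_refund(order_total=50.0, refund=80.0), 0.0)

if __name__ == "__main__":
    unittest.main()
```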
I recently talk therapied some legacy code through Claude Code back to life, but along the way put some frameworks in place for the codebase itself, and overall with how I think about approaching existing implementations, which helped a ton to open up a path forward for modernization.
It ultimately reminded me there's always a ton more to learn and think about, and it's sometimes quickest to learn from others' experiences when considering paths forward, so the book is a helpful share, thanks.
All code is eventually legacy code in one way or another, whether it's the code directly, or the infrastructure level. Today's code might be the best one writes today... looking back in 5-10 years... one might wonder if best practice was best practice.
I remember when Martin Fowler first published the book "Refactoring" - I was so relieved at the time because somebody with some clout that executives might actually listen to had not only identified what was wrong with software but identified a way to fix it and even gave it a name! Boy was I wrong - now you have to be careful how and when you use the term "refactor" because the people who ought to be supportive hear "wasting time".
Programmers had a hand in this, sadly: https://martinfowler.com/bliki/RefactoringMalapropism.html
Nowadays people throw the term "refactoring" around quite loosely, usually meaning "rewrite" when that was not at all the original meaning of the term!
A little bit like how "vibe coding" quickly went from one definition to another even though people should have known better.
The proliferation of technically clueless managers of tech projects and their "control systems" such as scrum has exacerbated the problem.
I met some doctors recently that made some rhyming complaints when non-medical mba management took over a practice.
>But the code examples are in Java and C++ and I do python/JavaScript/ruby/
The problem with real legacy code is that sometimes it's not even in those languages. It's VB.NET, COBOL, AS400, BASIC, FORTRAN... and these may not have a chance to "wrap around your class", or "think about all the ORM code that you use". None! I use none of that! And I can't even call any tests, because I can't extend a nonexistent class; there are no objects in here!
The author also says:
>You need feedback. Automated feedback is the best. Thus, this is the first thing you need to do: write the tests.
I don't need automated feedback. I need to untangle this deep business layer that everything is wrapped around in a 40-year-old codebase, with practically no documentation, while having to modify one of the core systems of the company. Sometimes I can't even trust the feedback the program outputs. In this kind of scenario, where the code is so messy and limited by technical factors, the best approach I have found is to debug. And that's it. Follow the trail, find the "seam", and then start your kingdom in the little space you can. Because if you tell your boss that you are implementing unit tests in a 40-year-old codebase, the first question he is gonna hit you with is "Why?", and the article doesn't give any compelling argument to answer this.
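To make the "seam" idea concrete, here's a rough sketch with invented names: pull the decision logic out of the tangled procedure so a small slice can be exercised without the database or the rest of the system.

```python
# All names (compute_discount, fetch_account, DISCOUNT_TABLE) are made up.

def compute_discount(customer_tier: str, order_total: float,
                     discount_table: dict[str, float]) -> float:
    """Pure decision logic pulled out of the legacy procedure; easy to test."""
    rate = discount_table.get(customer_tier, 0.0)
    return round(order_total * rate, 2)

def process_order_legacy(customer_id: int, order_total: float) -> float:
    # Legacy-shaped entry point: still does I/O, but now delegates the
    # interesting logic to the seam above.
    from legacy_db import fetch_account, DISCOUNT_TABLE  # hypothetical legacy module
    account = fetch_account(customer_id)
    return order_total - compute_discount(account.tier, order_total, DISCOUNT_TABLE)

# The seam can be exercised without touching the database:
assert compute_discount("gold", 100.0, {"gold": 0.1}) == 10.0
```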
*laughs in VBA*
(If you've never heard of JScript, be thankful. It was Microsoft's very-slightly-modified ECMAScript variant.)
Is "[w]hen code is not tested, how do you know you didn’t break anything?" not a compelling argument to your boss?
If the choice is thought to be between:
* delivering value directly to the customer to justify a company's existence
* or adding tests to things that already work (shore up) in an effort to make more correct changes in the future
Will anyone be surprised how often it's the former that management will go for?
I've found the appetite for this type of testability/observability improvement work increases proportionally with the number of support calls being made from customers complaining about the current feature set being unstable and buggy. This work is less palatable when customers really are just looking for that next new feature you promised instead and everything else is a-ok. The exception being things like orbital navigation systems etc..
I don't think I have ever asked permission to make what I thought was the correct change.
One customer required lengthy qualification processes when changing the software. But “configuration changes” had a lower bar, and somehow these “macros” counted as config. So eventually all the interesting business logic and RPC processing ended up in a giant macro file.
https://esolangs.org/wiki/BANCStar
https://github.com/jloughry/BANCStar
Relatedly, a long time ago I discovered that Zenographics Mirage—a vector graphics/illustration program originally designed to run on like, VAXen, with both a text terminal and a graphics terminal with a digitizing tablet (similar to this setup seen on Reading Rainbow: https://www.youtube.com/watch?v=b_zYaIxb6dY&t=965s), but later ported to PCs—ran on a sort of bytecode. There were a number of sample scripts shipped with the package that allowed you to automate things in Mirage's command language. Some of them had statements like DO 62,2,32472,32476,0 and comments that read "Don't worry about this, this is just Mirage assembly language." Intrigued, I discovered in the manual a feature you could enable called Op Code Monitor that flashed similar numbers whenever you entered a command. It was mentioned but not documented in detail, nor what the numbers meant, but from that and the scripts I could make some pretty good guesses. I figured out how to make Mirage prompt for a point and store it in a register; and with that I made a command to draw a rectangle that could be rotated. A rectangle in Mirage was defined by its corner points, so when you attempted to rotate it it just rotated the corner points and drew an axis-aligned rectangle with the new corner points. My command accepted two corner points and drew a polyline, so that when you rotated it, the whole rectangle rotated.
I agree with the parent comment that it is useful to follow the "trail" through the code. It can be a big effort just to figure out how the pieces are connected. Figuring out the data structures and files is another important thing. Also, write documentation as you go; this will help others understand the big picture. If you can just jump in and start writing meaningful unit tests, your legacy system is kinda trivial :-)
Overall, there are people who view testing as a useful tool and people who view testing as an ideology. This book falls into the latter category.
It is strange, actually, how much value we place on any information that sits between two pieces of cardboard.
I'm gonna preach for the church of "away with unit tests": so what? You don't want to test code. Code is not what the user cares about.
You want to test behavior. And for this you don't need to extend classes. You should not have to rewrite any code from your application. Like you say: write tests at the seam. Automate user inputs and checking outputs.
A good test suite should not need to be rewritten during refactoring. It should even allow you to change language or replace the whole thing with an off-the-shelf solution you did not write. If a unit test suite does not allow that, unit tests are an impediment. And I don't care about the "test pyramid"; like the IRL ones, it is just a relic from an age when launching all your tests in parallel was unfathomable.
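Concretely, a behaviour-level test at the seam can look like this sketch (report_tool and the fixture are made up): drive the program through its public entry point and assert on the output, so nothing inside needs rewriting when the implementation changes.

```python
import subprocess

def test_monthly_report_totals():
    # Drive the tool exactly as a user would; no knowledge of its internals.
    result = subprocess.run(
        ["report_tool", "--month", "2024-01", "--input", "fixtures/jan.csv"],
        capture_output=True, text=True, check=True,
    )
    assert "TOTAL: 12,340.00" in result.stdout
```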
100% agreed.
> I'm gonna preach for the church of "away with unit tests"
I disagree with the semantics here.
From the article: "In short, your test is not unit if it doesn’t run fast or it talks to the Infrastructure (e.g. a database, the network)"
This says _nothing_ about what part of the app structure you test. It's not always a class-method test. Outside-in, behaviour-centric tests that are not close-coupled to structure can also be unit tests. And most of the time, it's a better kind of unit test.
Kent Beck said many times: "tests should be coupled to the behavior of code and decoupled from the structure of code."
"test behavior" was the original intent of unit tests. The idea that unit tests are only close-coupled class-method tests, or that testing 2 collaborating classes from the same app at once counts as an "integration test" is a latter-day misconception. A dumbing down.
- Anything with millennial-looking cartoon characters in cartoons is bad.
- Anything with furry characters is good.
I'm not aware of the formal Design Pattern name, but a google search yielded this blog post on the subject: https://medium.com/@sahayneeta72/parallel-run-strategy-7ff64...
Coincidentally this is the same strategy github employed when verifying libgit2: https://github.blog/engineering/engineering-principles/move-...
This can be a superpower.
I once had to replace a PostgreSQL-interfacing layer of a large+complex+important system, and most of the original tests had been lost.
It's the kind of change that could be an existential risk for a company that needs the system to be available and correct.
So I built a comprehensive test suite for the existing PostgreSQL-interfacing layer, tested it, made a drop-in compatible API using the new way to talk with PostgreSQL, tested that, and... the entire huge system simply worked with the replacement, without a lot of headache/defects/dataloss/ulcers.
> Before you change code, you should have tests in place. But to put tests in place, you have to change code.
Keep reading. You don't always have to do it in the heroic way the article described in this section. Modularity and backward-compatible interfaces, FTW, when you can. The "Sprout" and "Wrap" techniques that the article describes later.
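A rough sketch of the "Sprout" idea, with made-up names: the new behaviour grows in a fresh, tested function, and the untested legacy function only gains a single call to it.

```python
def deduplicate_entries(entries: list[dict]) -> list[dict]:
    """Sprouted function: small, pure, and covered by its own unit tests."""
    seen, result = set(), []
    for entry in entries:
        key = (entry["id"], entry["date"])
        if key not in seen:
            seen.add(key)
            result.append(entry)
    return result

def post_entries_legacy(entries):
    # ...hundreds of untested lines left exactly as they were...
    entries = deduplicate_entries(entries)  # the only edit to the legacy function
    # ...more untested lines...
    return entries
```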
Also, one complementary general technique that the article didn't get into is that sometimes you can run two implementations in parallel, and check them against each other. There are various ways you can do this, involving which implementation really takes effect, and what you do if they don't agree. (Related: voting configurations in critical systems.)
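One minimal way to do that, sketched with made-up names: keep returning the old result so behaviour doesn't change, run the new implementation in its shadow, and log disagreements for review.

```python
import logging

log = logging.getLogger("parallel_run")

def shadowed_lookup(key, old_lookup, new_lookup):
    """The old implementation still takes effect; the new one runs in its shadow."""
    old_result = old_lookup(key)
    try:
        new_result = new_lookup(key)
        if new_result != old_result:
            log.warning("mismatch for %r: old=%r new=%r", key, old_result, new_result)
    except Exception:
        log.exception("new implementation failed for %r", key)
    return old_result  # callers keep getting the trusted answer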
> Characterization tests
The test suite not only let me validate the implementation, but the exercise of developing the test suite also helped me understand exactly what I should be implementing in the first place.
Sometimes I had to look at the implementation, to tentatively figure out the semantics, then testing validated that also.
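The characterization tests themselves were roughly of this shape (names and fixture are made up): the expected values are captured from the existing implementation rather than a spec, so the test pins down what the code actually does.

```python
import json
import unittest
from statements import format_statement  # hypothetical legacy module

class StatementCharacterizationTest(unittest.TestCase):
    def test_matches_recorded_behaviour(self):
        # Expected outputs were captured from the running legacy code, not from a spec.
        with open("fixtures/statement_cases.json") as f:
            cases = json.load(f)  # [{"input": ..., "recorded_output": ...}, ...]
        for case in cases:
            self.assertEqual(format_statement(case["input"]), case["recorded_output"])

if __name__ == "__main__":
    unittest.main()
```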
> Use scratch refactoring to get familiar with the code
In reasonable cultures and with reasonable management, or if you don't tell anyone. Otherwise, it's the same kind of risk as the old "throwaway prototype" problem: someone with authority says "looks good; I told the customer/CEO we just need to make these few more changes, and ship it". Or, in popular Agile/Scrum processes, "I thought you already completed the refactor last sprint; why are you doing it again".
However, the big problem with legacy code is always political. And the only book I've seen ever try to address that is "Kill It with Fire".
It's one of the few books that talks about the need to keep your team motivated and plugged into the political system given that they are working on the thing that's super important but also almost invisible. You need to be constantly selling what is going on in front of the important eyeballs.
Unfortunately, a lot of the anecdotes have an undercurrent of "Until the system has actively exploded into a million pieces such that somebody with real power has their ass on the line, don't bother."
People often think they are doing it 'properly' now by starting simple, but as they learn more and add functionality, they end up with the same complex mess they wanted to avoid.
And to echo what others are saying I have spent the last month experiencing for the first time how not simple throwing a meaningful test suite together is for a ginormous legacy codebase. I was glad the author briefly seemed to acknowledge that but still... the pain...
They said the Playwright UI was supposed to make it easy... Just plug it into an LLM they said... It should just take a few days, right?
> You have to deal with Legacy Code every day.
Legacy code is code you didn't write. It used to be that code you didn't write was written by past team members. These days, we have legacy-code-generators that write code you didn't write at a much higher rate than your company can fire team members.
I've made huge successful refactors using nothing but carefulness and manual testing, maybe just a very little bit of unit testing :)
Of course the goal is to reach an end state with automated tests, but insisting on 100% automated tests up-front can actually prevent the refactoring from even starting.
I like how https://gwern.net and https://en.wikipedia.org allow you to hover over terms and show you a quick glance of a term you might not know, and then you can fan out to create an understanding for the basis needed to understand the material in front of you. I feel like some sort of windowing system where I can pin and move references around and then kinda have the llm explain or reference things might be of use to me, as in help me navigate the unknown. Is there a "study companion" application of the sort?
I find that this is easier to do with a language like Java than, say, Python; I need as much information as I can get in order to understand the decisions that led to the current implementation.
Sometimes when I want to change some piece of legacy code, I need to know the rationale behind the code written here, although keeping the interfaces the same and changing the implementation allows me to go quite far.
However, most of the content in the last half of the book consists of naming and describing what seemed like obvious strategies for refactoring and rewriting code. I would squint at the introduction to a new term, try to parse its definition, look at the code example, and when it clicked I would think "well that just seems like what you would naturally choose to do in that situation, no?" Then the rest of the chapter describing this pattern became redundant.
It didn't occur to me that trying to put the terms themselves to memory would be particularly useful, and so it became a slog to get through all of the content that I'm not sure was worth it. Curious if that was the experience of anyone else.
It's harder if they can't read the older examples, but I can google for a more modern example as well. It gives nomenclature and examples.