I advocated for just having a script for each, even if they were 80% alike to handle the variations... Another developer created a massive set of database tables and coded abstractions for flexible configuration driven imports.
My solution was done in a couple of days... The other dev spent months on a solution that didn't work for half the imports, and nobody could follow it. When they left the company the next year, the imports under the complex solution were switched to scripts and the beast was abandoned entirely.
“Prefer duplication over the wrong abstraction”
https://www.poodr.com/ https://www.youtube.com/watch?v=PJjHfa5yxlU
You develop a sense for when the time is right over the years, by maintaining over engineered pieces of shit, many written by yourself.
To beginners it seems like coming up with the idea and building it is the difficult part; it isn't, not even close. The only difficult parts worth mentioning are keeping complexity on a tight leash and maintaining conceptual integrity.
I.e. it worked because it smashed the broad statement and forced a discussion about particulars. Now who was right about those, I have no idea, since I wasn't even present.
Trying to explain why a little duplication is preferable to bad abstractions, and specifically preferable to tightly coupling two unrelated systems together because they happened to sort-of interact with the same resource, was endless and tiring and - ultimately - often futile.
Unfortunately, Terraform's module system is extremely lacking, and in many ways you're totally right: if your module just replicates all the provider arguments, it feels wrong.
> writing Terraform configuration became absolute tyranny.
I totally agree with deduplication, but only once the need is shown. Otherwise it's too easy, and I've seen people try to use this argument to justify slop many times.
If you keep having to make edits in two independent systems every time you want to make one change, something is wrong. If you’re leaving footguns around because changing one thing affects two or more systems, but you aren’t at liberty to change them both in production, that’s also something wrong.
As it's only a draft piece at the moment I'll lay out some of the talking points:
- All software design and structure decisions have trade-offs (no value without some kind of cost, we're really shifting what or where the cost is to a place we find acceptable)
- 'Don't Repeat Yourself' is taught as good engineering practice; why you should think about repeating yourself, and why you shouldn't accept social-proof or appeal-to-authority arguments without solid experience behind them
- There is a difference between things that are actually the same (or should be, for consistency, such as domain facts or knowledge) versus things that happen to be the same at the time of creation but are only that way by coincidence
- Effective change almost always (if not always) comes from actual, specific use-cases; a reusable component not derived from these cases cannot reflect them
- Re-usable components themselves are not necessarily deployed or actually used, so by definition can't drive their own growth
- If they are deployed, it's N+1 things to maintain, and if you can't maintain N how are you going to maintain N+1?
- The costs of creation and ongoing maintenance - quite simply there's a cost to doing it and doing it well, and if it costs more to develop than the value gained then it's a net loss
- Components/modules that live alongside their use cases get tested naturally and stay grounded in specific use-cases; extracting them removes the opportunity for those organic use cases to shape them
- What happens when we re-use components to allow easy upgrades but then pin those for stability? You still have to update N places. The best case scenario might be you have to update N places but the work to do that is minimised for each element of N
- Creating an abstraction without enough variety of uses, in both location and kind, adds nothing (an abstraction with a single use-case is essentially a layer that adds no value)
- Inherent contradictions in software design principles - you're taught to 'avoid coupling', but any shared component is by definition coupled. The value of duplication is that it supports independent growth or change
- The cost of service templates and/or builders (simple templated text or entire builder-type tools that need to be maintained and used just to bootstrap something) - these almost never work for you after creation to support updates
- The cost of fast up-front creation (if you're doing this a lot, maybe you have a different problem) over supporting long-term maintenance
- The value of friction - some friction that makes you question whether a 'New thing' is even needed is arguably good as a screening/design decision analysis step; having to do work to make shared things should help to identify if it's worth doing as the costs of that should be apparent; this frames friction as a way of avoiding doing things that look easy or cost-free but aren't in the long term
- As a project lives longer, any fixed up-front creation time diminishes to a minuscule fraction of the overall time spent
- Continuous, long-term drift detection (and update assistance) is more powerful and useful than a one-time upfront bootstrap saving for any project with a long-enough lifetime
For my money, this is the key point that people miss.
A test I like to use for whether two things are actually or just incidentally related is to think about “if I repeat this, and then change one but not the other, what breaks?”
Often the answer is that something will break. If I repeat how a compound id “<foo>-<bar>” is constructed when I insert the key and lookup, if I change the insert to “<foo>::<bar>” but not the lookup, then I’m not going to be able to find anything. If I have some complicated domain logic I duplicate, and fix a bug in one place but not the other, then I’ve still got a bug but now probably harder to track down. In these cases the duplication has introduced risk. And I need to weigh that risk against the cost of introducing an abstraction.
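To make the compound-id case concrete, here's a minimal Python sketch (the store, insert, and lookup names are invented for illustration) of what happens when the duplicated key format drifts:

```python
# Hypothetical in-memory store keyed by a compound id.
store = {}

def insert(foo, bar, value):
    # Key format was changed here to "<foo>::<bar>"...
    store[f"{foo}::{bar}"] = value

def lookup(foo, bar):
    # ...but the duplicated format here still builds "<foo>-<bar>",
    # so every lookup now silently misses.
    return store.get(f"{foo}-{bar}")

insert("user", "42", "alice")
print(lookup("user", "42"))  # None
```

Nothing crashes; the data just becomes unreachable, which is exactly the quiet kind of breakage the test in the comment above is probing for.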
If I have a unit test `insert(id=1234); item = fetch(id=1234); assert item is not nil`, if I change one id but not the other, the test will fail.
But if I have two separate unit tests, and both happen to use the same id 1234, if I change one but not the other, absolutely nothing breaks. They aren’t actually related, they’re just incidentally the same.
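A sketch of that contrast, using made-up insert/fetch helpers over a dict:

```python
# Minimal illustrative store; insert/fetch are stand-ins.
store = {}

def insert(id, value):
    store[id] = value

def fetch(id):
    return store.get(id)

# Actually related: both 1234s name the same record.
# Change one without the other and the assertion fails.
insert(id=1234, value="widget")
assert fetch(id=1234) is not None

# Incidentally the same: two independent tests happen to pick 1234.
# Renaming the id in either one breaks nothing in the other.
def test_a():
    insert(id=1234, value="a")
    assert fetch(id=1234) == "a"

def test_b():
    insert(id=1234, value="b")  # any id works; 1234 is coincidence
    assert fetch(id=1234) == "b"

test_a()
test_b()
```

Extracting the shared 1234 into a common constant would couple the two tests for no correctness benefit.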
I really like this question as a way of figuring out whether things merely happen to look the same or actually need to be the same for correctness. It also feels like a question you can answer concretely, without being led down the path of 'well, we might need this as a common component in the future'.
I also think you can frame it as a same value or same identity type question.
https://www2.lawrence.edu/fast/ryckmant/On%20Sense%20and%20R...
And I think it's easy to see small companies lean on the duplication because it's too easy to screw up abstractions without more engineering heads involved to get it right sometimes.
That is basically the core tenet of "Write Everything Twice" (WET)
It's not easy to deduplicate after a few years have passed, and one copy had a bugfix, another got a refactoring improvement, and a third copy got a language modernization.
With poor abstractions, at least you can readily find all the places the abstraction is used and improve them. Whereas copy-paste-modified code can be hard to even find.
With duplicated messes you may be looking at years before a logical point of attack across the stack is even available, because the team keeps duplicating and producing duplicated effort on an ongoing basis. Every issue, every hotfix, every customer request, every semi-complete update, every deviation adds pressure to produce, with duplication as the quickest and possibly only method. And each copy-and-paste exercise deposits its own geological layers of nuance that often have rippling effects…
The necessary abstractions often haven't even been conceived of, however immaturely. Domain understanding is buried under layers of incidental complexity. Superstition around troublesome components takes over decision making. And a few years of plugging the same dams with the same fingers drains and scares off proper IT talent. Up-front savings transmute into tech debt, with every incentive for every actor at every point to make the collective situation worse by repeating the same short-term reasoning.
Learning to abstract and modularize properly is the underlying issue. Learn to express yourself in maintainable fashion, then Don’t Repeat Yourself.
https://invisible.college/toomim/toomim-linked-editing.pdf
> Abstractions can be costly, and it is often in a programmer’s best interest to leave code duplicated instead. Specifically, we have identified the following general costs of abstraction that lead programmers to duplicate code (supported by a literature survey, programmer interviews, and our own analysis). These costs apply to any abstraction mechanism based on named, parameterized definitions and uses, regardless of the language.
> 1. *Too much work to create.* In order to create a new programming abstraction from duplicated code, the programmer has to analyze the clones’ similarities and differences, research their uses in the context of the program, and design a name and sequence of named parameters that account for present and future instantiations and represent a meaningful “design concept” in the system. This research and reasoning is thought-intensive and time-consuming.
> 2. *Too much overhead after creation.* Each new programming abstraction adds textual and cognitive overhead: the abstraction’s interface must be declared, maintained, and kept consistent, and the program logic (now decoupled) must be traced through additional interfaces and locations to be understood and managed. In a case study, Balazinska et. al reported that the removal of clones from the JDK source code actually increased its overall size [4].
> 3. *Too hard to change.* It is hard to modify the structure of highly-abstracted code. Doing so requires changing abstraction definitions and all of their uses, and often necessitates re-ordering inheritance hierarchies and other restructuring, requiring a new round of testing to ensure correctness. Programmers may duplicate code instead of restructuring existing abstractions, or in order to reduce the risk of restructuring in the future.
> 4. *Too hard to understand.* Some instances of duplicated code are particularly difficult to abstract cleanly, e.g. because they have a complex set of differences to parameterize or do not represent a clear design concept in the system. Furthermore, abstractions themselves are cognitively difficult. To quote Green & Blackwell: “Thinking in abstract terms is difficult: it comes late in children, it comes late to adults as they learn a new domain of knowledge, and it comes late within any given discipline.” [20]
> 5. *Impossible to express.* A language might not support direct abstraction of some types of clones: for instance those differing only by types (float vs. double) or keywords (if vs. while) in Java. Or, organizational issues may prevent refactoring: the code may be fragile, “frozen”, private, performance-critical, affect a standardized interface, or introduce illegal binary couplings between modules [41].
> Programmers are stuck between a rock and hard place. Traditional abstractions can be too costly, causing rational programmers to duplicate code instead—but such code is viscous and prone to inconsistencies. Programmers need a flexible, lightweight tool to complement their other options.
If you don't have comprehensive test automation, then you have to consider whether you can manually test all the places the code is used. If it is used in multiple products at your company--and you aren't even familiar with some of those products--then you can't manually test all of them. Under such circumstances it may be preferable for each team to have duplicate copies of some code. Not ideal, but practical.
It's unfortunate that so many people end up parroting fanciful ideas without fully appreciating the different contexts around software development.
Of course that's true of both sides of this discussion too.
I really value DRY, but of course I have seen cases where a little duplication is preferable. Lately I've seen a steady stream of these "duplication is ok" posts, and I worry that newer programmers will use it to justify copy-paste-modifying 20-30-line blocks of code without even trying to create an appropriate abstraction.
The reality of software is, as you suggest, that there are many good rules of thumb, but also lots of exceptions, and judgment is required in applying them.
It results in a brittle nightmare because you can no longer change any of it: the responsibility of the refactored functions is simply "whatever the original code was doing before it was de-duplicated", and they don't represent anything logical.
Then, if two places that had "duplicated" code before the refactoring need to start doing different things, the common functions get new options/parameters to cover the different use cases, until those get so huge that they start needing to get broken up too, and then the process repeats until you have a zillion functions called "process_foo" and "execute_bar", and nothing makes sense any more.
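A hedged sketch of how that flag accretion tends to look in practice (process_order and all its parameters are invented for illustration):

```python
# Hypothetical "shared" helper whose only real responsibility is
# "whatever its call sites used to do", so every new difference
# between those call sites becomes another flag.
def process_order(order, *, skip_tax=False, legacy_rounding=False,
                  notify=True, bulk_mode=False):
    total = sum(item["price"] for item in order["items"])
    if not skip_tax:
        total *= 1.2            # flag bolted on for call site A
    if legacy_rounding:
        total = int(total)      # flag bolted on for call site B
    if notify and not bulk_mode:
        pass                    # flags now interact; nobody remembers why
    return total

order = {"items": [{"price": 10.0}, {"price": 5.0}]}
untaxed = process_order(order, skip_tax=True)
```

Each flag made sense as a local patch, but the function no longer has a name-able job, which is the "process_foo" endpoint described above.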
I've since become allergic to any sort of refactoring that feels like this kind of compression. All code needs to justify its existence, and it has to have an obvious name. It can't just be "do this common subset of what these 2 other places need to do". It's common sense, obviously, but I still have to explain it to people in code review. The tendency to want to "compress" your code seems to be strong, especially in more junior engineers.
But nobody really teaches the distinction between two passages that happen to have an identical implementation vs two passages that represent an identical concept, so they start aggressively DRY'ing up the former even though the practice is only really suited for the latter subset of them.
As you note, when you blindly de-duplicate code that's only identical by happenstance (which is a lot), it's only a matter of time before the concepts making them distinct in the first place start applying pressure for differentiation again and you end up with that nasty spaghetti splatter.
Even identical implementations might make more sense duplicated once you factor in organizational coupling: different business groups have their own change-management cycles and requirements.
Data, which is more important than code imho, is duplicated constantly. Why can't code have some duplication?
That said, I have found other areas of tech where duplication is very costly. If you are doing something like building a game, avoiding use of abstractions like prefabs and scriptable objects will turn into a monster situation. Failure to identify ways to manage common kinds of complexity across the project will result in rapid failure. I think this is what actually makes game dev so hard. You have to come up with some concrete domain model & tool chain pretty quickly that is also reasonably normalized or no one can collaborate effectively. The art quality will fall through the basement level if a designer has to touch the same NPC in 100+ places every time they iterate a concept.
I think the JS developers could take a lesson from the Go proverb. I often write something from scratch to avoid a dependency because of the overhead of maintaining dependencies (or dealing with dependencies that cease to be maintained). If I only need a half dozen lines of code, I'm not going to import a dependency with a couple hundred lines of code, including lots of features I don't need.
The "rule of three" helps avoid premature abstractions. Put the code directly in your project instead of in a library the first time. The second time, copy what you need. And the third time, figure out what's common between all the uses, and then build the abstraction that fits all of them. This avoids over-optimizing on a single use case and refactoring/deprecating APIs that are already in use.
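The progression might look something like this sketch (the report functions and the `active`/`limit` shape are made up for illustration):

```python
# Rule-of-three progression, compressed into one file.

# Use 1: just write it inline where it's needed.
def report_a(rows):
    return [r for r in rows if r["active"]][:10]

# Use 2: copy it and tweak; resist abstracting yet.
def report_b(rows):
    return [r for r in rows if r["active"]][:25]

# Use 3: the shared shape is now clear from real call sites,
# so extract the abstraction that fits all three.
def top_active(rows, limit):
    return [r for r in rows if r["active"]][:limit]

def report_c(rows):
    return top_active(rows, 50)
```

Only at the third use do you know which parts genuinely vary (the limit) and which are fixed (the filter), so the parameter list is derived from evidence rather than guesses.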
[1]: https://go-proverbs.github.io/ [2]: https://en.wikipedia.org/wiki/Rule_of_three_(computer_progra...