I'm trying to think of a good example, maybe something like a pointer window based function (off the top of my head)
(This isn't real code. Don't get hung up on it)
func DedupeStrings(ss []string) []string {
	if len(ss) < 2 {
		return ss
	}
	sort.Strings(ss) // sorts in place; assumes "sort" is imported
	u := 1 // index of end of "uniques" set
	for i := 1; i < len(ss); i++ {
		// Consume until new value
		if ss[i] == ss[i-1] {
			continue
		}
		// Put new value in 'uniques' set
		ss[u] = ss[i]
		u++
	}
	// Final state: all unique items are positioned
	// left of 'u' index
	return ss[:u]
}
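For illustration, a quick call (this assumes "fmt" is imported); note the result comes back sorted rather than in first-occurrence order:
fmt.Println(DedupeStrings([]string{"b", "a", "b"})) // prints [a b]: sorted, duplicates removed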
People will quibble, but:
- I'm not convinced you could change the variable names without harming clarity. Would a name like uniquesEndIndex really be any clearer? It adds noise to the code and still doesn't satisfy a confused reader.
- I don't want to use function calls for documentation, e.g. putInUniques(). I'm doing it this way because I want it to run really quickly.
Otherwise I agree, people have a weird hang up about short variable names. Somehow not a problem in mathematics...
I have slightly unorthodox opinions about short variables. I used to hate them. Then I posted a question on one of the PL design forums - it might have been Reddit r/programminglanguages - why is there a history of single-letter names for type variables, i.e. T, U, etc. for generics? The answer I got back was: sometimes you want code to focus on structure rather than identities. That stuck with me, because it helped me understand why so much C code (including Linux) uses similar naming practices. Names can lie, and sometimes expressing the structure is the absolutely critical thing.
The specificity or abstractness of a (variable) name relates to the values that it can hold. So when you have a very abstract function whose inputs can be of almost any type, naming those inputs in an overly specific manner is the exact inverse of the failure of giving an overly generic name to a highly constrained parameter.
Examples of correct naming:
func firstWord(s string) string { ... }
func bidShortcodePrefix(businessId string) string { ... }
Examples of incorrect naming:
func firstWord(strWithOptionalSpaces string) string { ... }
func bidShortcodePrefix(s string) string { ... }
All this said, I do agree with your original take on the comments. I much prefer having human-readable explanations inline with any non-trivial code. If nothing else, they really make it easier to correctly fix the code if a bug is found much later.
Comments should answer the question of why you are not using some kind of hash set and doing a single pass over the data, and why it's OK to reorder the strings. One could reasonably expect that Dedupe keeps the first occurrences in their original order.
Those comments also don't really help me quickly understand the code. I'd do a small doc comment along the lines of "Removes repeated elements from the supplied list, returning the remaining items in original order"
If that needs a why it’s a why-comment.
If it needs a what it's a what-comment. Especially if clever programming tricks are used. 6 months later you've already forgotten what the trick is and how it works.
The "why" is the part of the explanation that can't be deduced from the code.
https://en.wikipedia.org/wiki/Fast_inverse_square_root
That definitely needs a what comment.
But there's also the case to be made that the comments that particular code needs are "why" comments. I can see what happens, but why does it work?
The article mentions the reason for the code, which I would expect in the commit message.
Replace symbol placeholders in the input string with translated values. Scan the string for symbol placeholders that use the format "$foo". The format uses a dollar sign, then optional ASCII letter, then optional word characters. Each recognized symbol is replaced with its corresponding value.
Symbols are only replaced if the symbol exists i.e. getSymbol(String) returns non-null, and the symbol has not already been replaced in this invocation.
Example:
- input = "Hello $name, welcome to $city!"
- output -> "Hello Alice, welcome to Boston!"
Return the string with symbol placeholders replaced.
A single-line comment is easy to parse, read, and spot as needing to change when you patch something.
I build accounting software and half my "what" comments are actually explaining business rules that would be impenetrable otherwise. Something like:
// Bank transfers appear as two transactions - a debit here and credit elsewhere
// We match them by looking for equal-opposite amounts within a 3-day window
That's explaining "what" but also implicitly "why" - because that's how double-entry works and that's the tolerance banks allow for settlement delays. You can't really extract that into a method name without it becoming absurd.The Uncle Bob approach of extractNameMatchingTransfersWithinSettlementWindow() doesn't actually help - now I need to know what settlement windows are anyway, and I've lost the context of why 3 days.
Or to put it another way: to provide the necessary context for figuring out why a particular piece of code is written the way it's done. (Or what it's supposed to do.)
This is missing from your comment. While I'd probably be able to pick that up from context, I'd have to hold "this is an assumption I'm making" in my head, and test that assumption against all parts of the codebase I encounter until I become certain: this'd slow me down. I'd recommend including the "why" explicitly.
It seems like the caller only needs to know that it’s checking within the window, which is then likely a constant (like SETTLEMENT_WINDOW_DAYS) that gives the context.
If you need to, you can add a comment to the constant linking to whatever documentation defines why three days.
That's why I've also started to explicitly decompose constants if possible. Something like `ageAlertThresholdHours = backupIntervalHours + approxBackupDurationHours + wiggleRoomHours`. Sure, this takes 4 constants and is longer to type than "28 hours".
However, it communicates how I think about this, how these 28 hours come about, and how to tweak them if the alerting is being annoying. Without the breakdown, the easy guess is to bump 28 to 29 hours when it throws false positives. But like this, you could see that the backup is supposed to take, say, 3 hours - except now it takes 4 due to growth of the system - and apply an "obviously correct" fix: bump up the approximate backup duration based on monitoring data.
// a total of
// - backup interval = 24
// - approx backup duration = 2
// - “wiggle room” = 2
ageAlertThresholdHours = 28
yes lazy devs are lazy and won’t want to or just won’t update the comments (be pedantic in review :shrug:). it’s all trading one thing off with another at the end of the day.
edit — also your version forces me to read horizontally rather than vertically, which takes longer ime.
sorry, i’ve basically done an unprompted code review. i feel like a bit of a dick now.
const int backupIntervalHours = 24;
const int approxBackupDurationHours = 2;
const int wiggleRoomHours = 2;
const int ageAlertThresholdHours = backupIntervalHours + approxBackupDurationHours + wiggleRoomHours;
static_assert(28 == ageAlertThresholdHours);
It's a shame more languages don't have static asserts... faking it with mismatched dimensions of array literal/duplicate keys in map literals is way too ugly and distracting from the intent.
ageAlertThresholdHours = 24 + // backup interval
2 + // approx backup duration
2; // "wiggle room"
No static assert needed, no need to pre-compute the total the first time, and no need for identifiers like `approxBackupDurationHours`, the cognitive overhead about the possibility of colliding with other stuff that's in scope, or the superfluous/verbose variable declaration preamble.
For example:
local
val backupIntervalHours = 24
val approxBackupDurationHours = 2
val wiggleRoomHours = 2
in
val ageAlertThresholdHours = backupIntervalHours + approxBackupDurationHours + wiggleRoomHours
end
Then it's easier to document what components a constant is composed of using code, without introducing unnecessary bindings in the scope of the relevant variable. Sure, constants are just data, but the first question that pops into my head when seeing something in unfamiliar code is "What is the purpose of this?", and the smaller the scope, the faster it can be discarded.
I often write things the way you have done it, for the simple reason that, when writing the code, maybe I feel that I might have more than one use for the constant, and I'm used to thinking algebraically.
Except, that I might make them global, at the top of a module. Why? Because they encode assumptions that might be useful to know at a glance.
And I probably wouldn't go back and remove the constants once they were named.
But I also have no problem with unnamed but commented constants like the ones in the comment you responded to.
That's all fine.
Just note that this was one of the easiest examples I could find. For example, for reasons out of my control, the individual network configuration on a linux host is positively nuts. The decision whether to deploy routes, static DNS servers and such depends on 3-5 facts about the datacenter and the provider it's on.
In such a case, it is more maintainable to separate the facts about the provider, or the thing we are looking at (e.g. "Does this provider allow us to configure routes in their DHCP server?"), from the computation/decision making ("Can the system rely on the routes from the DHCP servers?"), and all of that from the actual action based off of the decision ("Deploy routes statically if DHCP-provided routes are not correct").
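A rough sketch of that separation (the fact names and functions here are made up for illustration, not taken from the actual system):
// Facts about the provider/datacenter, gathered or configured elsewhere.
type providerFacts struct {
    dhcpServesRoutes   bool // does the provider's DHCP hand out routes at all?
    dhcpRoutesAreWrong bool // are those routes known to be broken in this datacenter?
}

// Decision: a pure function of the facts, easy to test on its own.
func canRelyOnDHCPRoutes(f providerFacts) bool {
    return f.dhcpServesRoutes && !f.dhcpRoutesAreWrong
}

// Action: looks only at the decision, not at the raw facts.
func configureRoutes(f providerFacts) {
    if canRelyOnDHCPRoutes(f) {
        return // DHCP routes are fine, nothing to deploy
    }
    deployStaticRoutes()
}

func deployStaticRoutes() {
    // write the static route configuration here
}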
The comment also leaves me with more questions: how do you handle multiple identical amounts in that window? I would still have to read the implementation to be sure.
I would prefer encoding this in an Uncle Bob style test. It acts as living documentation that cannot get out of sync with the code and explains the why through execution. For example:
test("should_match_debit_with_credit_only_if_within_three_day_settlement_window", () => {
const debit = A_Transaction().withAmount(500.00).asDebit().on(JANUARY_1);
const creditWithinWindow = A_Transaction().withAmount(500.00).asCredit().on(JANUARY_4);
const creditOutsideWindow = A_Transaction().withAmount(500.00).asCredit().on(JANUARY_5);
expect(Reconciliation.tryMatch(debit, creditWithinWindow)).isSuccessful();
expect(Reconciliation.tryMatch(debit, creditOutsideWindow)).hasFailed();
});
This way, the 3 day rule is a hard requirement that fails the build if broken, rather than a suggestion in a comment block.
Ultimately, disciplined practice requirements rely on disciplined practices to succeed. You can move the place where the diligence needs to be applied, but in the end the idea that comments can lose their meaning isn't that different from other non-functional, communicative elements also being subject to neglect. I don't mean to suggest that a longish test title wouldn't be more likely to be maintained, but even with that long name you are losing some context that is better expressed, person-to-person, using sentences or even paragraphs.
I had first-hand experience with this, oddly enough, also working on an accounting system (OK, an ERP system... working the accounting bits). I was hired to bring certain deficient/neglected accounting methodologies up to a reasonable standard and implement a specialized inventory tracking capability. But the system was 20 years old and the original people behind the design and development had left the building. I have a pretty strong understanding of accounting and inventory management practices, and ERP norms in general, but there were instances where what the system was doing didn't make sense, and there were no explanations (i.e. comments) as to why those choices had been taken. The accounting rules written in code were easier to deal with, but when we got to certain record life-cycle decisions, like the life-cycle evolution of credit memo transactions, the domain begins to shift from "what does this mean from an accounting perspective", where generally accepted rules are likely to apply, to "what did the designers of the application have in mind related to the controls of the application". Sure, I could see what the code did and could duplicate it, but I couldn't understand it... not without doing a significant amount of detective work and speculation. The software developers that worked for the company that made this software were in the same boat: they had no idea why certain decisions were taken, whether those decisions were right or wrong, or the consequences of changes... nor did they really care (part of the reason I was brought in). Even an out-of-date comment, one that didn't reflect the code as it evolved, would still have provided insight into the original intents. I know as well as you that code comments are often neglected as things change, and I don't take it for granted that understanding the comments is sufficient for knowing what a piece of code does... but understanding the mind of the author or last authors does have value and would have helped multiple times during that project.
When I see these kinds of discussions I'm always reminded of one of my favorite papers. By Peter Naur, "Programming As Theory Building" (https://pages.cs.wisc.edu/~remzi/Naur.pdf). In my mind, comments that were at least good and right at one time can give me a sense of the theory behind the application, even if they cannot tell me exactly how things work today.
For example, at my last job, we shoved all constants managed by the balancing teams into a static class called BalancingTeam, to make it obvious that these values are not in our control. Tests, if written (big if), should revolve around those constants so they're not brittle.
However, I’d argue that being 'out of our control' is actually the main reason to test them. We treat these as acceptance tests. The goal isn't flexibility, it is safety. If a PR changes a regulated value, the test failure acts as a tripwire. It forces us to confirm with the PM (and the Jira ticket) that the change is intentional before merging. It catches what code structure alone might miss.
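A minimal sketch of that kind of tripwire as a Go test (the balancingteam package and the constant are invented names):
import "testing"

// The expected value is deliberately hard-coded. A failure here doesn't mean the
// change is wrong; it means "confirm with the PM/ticket before updating both sides".
func TestSettlementWindowIsStillTheAgreedValue(t *testing.T) {
    if got, want := balancingteam.SettlementWindowDays, 3; got != want {
        t.Fatalf("SettlementWindowDays = %d, want %d (regulated value; confirm the change is intentional)", got, want)
    }
}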
Yes, comments can drift out of sync with the code. That doesn't mean the comments were a bad idea; it just means the programmer didn't update the comments when they should have.
Similarly, even with zero comments, variable names can drift out of sync with their intended purpose, e.g. when they are repurposed to hold values that are different from what their name implies. Again, the solution is to rename the variable so it reflects its purpose (or possibly create another variable).
> This way, the 3 day rule is a hard requirement that fails the build if broken rather than a suggestion in a comment block.
What happens when a dev changes the test code, but fails to update the string "should_match_debit_with_credit_only_if_within_three_day_settlement_window"? That's effectively no different than comments being out of sync with the code.
they can also be a good defence against newer seniors on the team refactoring willy nilly — can’t argue with business rules, but can argue over uncle bob. ;)
Your comment is a poor example because you unwittingly wrote "why" comments, not "what".
// We match them by looking for equal-opposite amounts within a 3-day window because bank transfers appear as two transactions - a debit here and credit elsewhere
If anything, you proved the importance of clarifying why things must be the way they are.
> The Uncle Bob approach of extractNameMatchingTransfersWithinSettlementWindow() doesn't actually help - now I need to know what settlement windows are anyway, and I've lost the context of why 3 days.
You are also completely off here, and I think that your comment shows a deeper misunderstanding of the topic. What you tried to frame as the "Uncle Bob approach" actually means implementing your domain in a clear and unambiguous way so that both the "what" and the "why" are clear by looking at the code. Naming conventions are one aspect, but the biggest trait is specifying an interface that leaves the caller no option other than doing the "what"s because of the "why"s.
Without looking too hard into it, whipping out something like this in a few seconds would match the intent
GetMatchingTransfers(TransferAmount, SettlementWindow)
The line between the two is not that blurry: assume your reader has total knowledge of programming, and no knowledge whatsoever of the outside world. Comments about what the code does to bits are the "what"; comments about how the code relates to the outside world are the "why". The rest is a matter of taste and judgment.
"assume your reader has total knowledge of programming"
Because if I know my fellow programmers, like me, don't have total knowledge of programming, then "what" comments before footguns seem useful to me.
Or when there was a hack that is not obvious.
To me it mostly is not a question of taste, but context. Who will read the code?
In my experience, when I review junior contributors’ code and see “what” comments, it’s usually caused by 1) bad naming, or 2) abstractions that don’t make sense, or 3) someone trying to reinvent maths but incorrectly, or 4) missing tests, or 5) issues with requirement gathering / problem specification, or 6) outright laziness where the contributor doesn’t want to take the time to really think things through, or 7) unclear and overcomplicated code, or… any number of similar things.
At the very least, any time you see a “what” comment, it’s valuable to take notice and try really hard to think about whether the problem the comment tries to solve has a better solution. Take it as a personal challenge.
Like something really bad
x=y //triggers method xyz
So I would agree that under controlled conditions, they should not be necessary.
Gives a good idea of what but not why and if anyone considers that method name too long they can get back in their cave, tbh.
Yes, there are multiples levels of knowledge, and the required level of commenting depends on the minimum knowledge expected of a reader in each of several dimensions.
In point of fact, the only thing that can almost always do without documentation is a different question: "How."
If the comment was
// We match them by looking for equal-opposite amounts within an X-day window defined by BANK_TRANSFER_MATCH_WINDOW_DAYS
the comment is more evergreen, until the actual logic changes. If/when the logic changes... update the comment?
Sure it does. You've isolated the "what" to the name, and the comments can focus on the "why".
(Although I think some of the XP advocates would go off the rails and try to model a settlement window as an object....)
When the name is long, it should be more than one function. Finding the transactions within a settlement window seems distinct from matching them up.
The comment about 3 days isn't necessary when "numDays" is a variable in the SettlementWindow class (or other relevant scope). I'd be a lot more comfortable reading code that makes this concept a proper abstraction rather than just seeing a comment about a hard-coded 3 somewhere. Magic values are bad and comments don't demystify anything if they get out of sync with the code.
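Something like this, as a sketch (Go-flavored, names illustrative):
// SettlementWindow makes the tolerance a named concept instead of a bare 3.
type SettlementWindow struct {
    NumDays int
}

// BankTransferWindow is where a link to whatever documentation defines the 3 days belongs.
var BankTransferWindow = SettlementWindow{NumDays: 3}

// Contains reports whether two postings that are daysApart fall inside the window.
func (w SettlementWindow) Contains(daysApart int) bool {
    if daysApart < 0 {
        daysApart = -daysApart
    }
    return daysApart <= w.NumDays
}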
Comments should explain. If it’s why, it’s why. If it’s what, it’s what. The point of comments is to explain, to the reader/editor, what the intent of the code is and how you chose to approach it. That’s it. No need to fight over what, why, how, who, etc.
If you have domain knowledge that helps support the code, maybe a link to further reading, by all means add it to comments please!
Change my mind!
Names capture ideas. Only if we name something can we (or at least I) reason about it. The more clear and descriptive a name for something is, the less cognitive load is required to include the thing in that reasoning.
TFA's example that "weight" is a better variable name than "w" is because "weight" immediately has a meaning while use of "w" requires me to carry around the cumbersome "w is weight" whenever I see or think about "w".
Function names serve the same purpose as variable names but for operations instead of data.
Of course, with naming, context matters and defining functions adds lines of code which adds complexity. As does defining overly verbose variable names: "the_weight_of_the_red_ball" instead of "weight". So, some balance that takes into account the context is needed and perhaps there is some art in finding that balance.
Comments, then, provide a useful intermediate on a spectrum between function-heavy "Uncle Bob" style and function-less "stream of consciousness" style.
Even in stupidest cases, like:
// add two and two
let four = two + two
The rationale for this is that if you start muting developers with "this is not comment-worthy", you start actually losing context for the codebase. Sure, maybe "add two and two" is not contextful enough, but "add same account twice" might be a signal for appropriate code.
Maybe that's me, but I rarely saw teams which over-document; under-documenting is usually the case. So if we ever meet in a professional environment - comment away. Worst case scenario - I'll skim over it.
This is a good point, although this recently changed with LLMs, which often spit out a ton of redundant comments by default.
If you have the spare time, you can try and submit your patches upstream; in the meantime, you just maintain your own version.
While it leads to more things to read, which thus may take time, I feel that the benefits of using comments far outweigh the negative sides. This usually holds even when comments are outdated. To me, adjusting and updating comments was often much easier and faster than describing something de novo.
In Ruby land this is quite problematic because many do not use any comments. The result is a horrible code base. Getting people who wrote that code to use comments is often too late, as they have already abandoned Ruby in favour of another language, so I never buy the cop-out explanation of "the code is self-explanatory" - it isn't, not even in Ruby. Everyone can write horrible code.
For example, padding bytes are allowed in variable-length integers (LEB128) for reasons I still do not understand:
// Allow padding bytes that do not affect the value
let expectedBits: UInt8 = (result < 0) ? 0x7F : 0x00
He starts implementing any module or function by first writing the documentation for it and lets it guide both the functionality and structure.
Makes Redis also extremely easy and enjoyable to read.
Random example: https://github.com/redis/redis/blob/unstable/src/aof.c
Now I sometimes use this practice when working with agents, if I need something done just a certain way. It's time consuming, but it produces good results.
There might be a better one that also takes into account whether the code does something weird or unexpected for the reader (like the duplicate clear call from the article).
Same goes for comments vs commit messages. It's a fact comments get outdated and then you have an even bigger problem whereas a commit message is always correct at the time it was made. But obviously again, no hard rules. Sometimes I feel it's better to write a comment, e.g. if it's something that is really important and won't change a lot.
I used to comment in a similar way to claude/chatgpt, as in both the what and why (although the why in LLMs is often lost). I used to comment as if it was for someone who didn't know the libraries very well (i.e. me in two years' time - documentation at the previous place was utterly shite, so it was effectively a journal of discovery).
However, my commenting style is both what and why. My variable names can be longer than my FANG peers, mainly because I know that English not being a first language means that half arsed abbreviations are difficult to parse.
But effort is placed on not using fancy features, python mostly, so avoiding lambdas and other "syntactic sugar", unless they absolutely make sense. If I do use them, i have a comment saying _why_ I'm doing it.
Some of my colleagues were dead against any kind of commenting. "my code should be readable enough without it". They had the luxury of working on one bit for a few months, then throwing it away.
It makes it easier for a dev's brain to parse the code, e.g. to understand what the code really does, while the fattier but commented version makes that harder and tries to replace it with information about the original coder's intentions. Which is maybe important too, but not as important as the code itself.
Not to forget that it's too easy to change code and leave comments obsolete.
var hasSymbol = getSymbol(symbolName) != null;
var replacementPending = !alreadyReplaced.contains(symbolName);
if(hasSymbol && replacementPending){
alreadyReplaced.add(symbolName);
stringToReplace = stringToReplace.replace("$" + symbolName, translate(symbolName));
}
Technically this performs worse because you lose short-circuiting, but in performance-sensitive contexts code styling is less of a concern anyways. And I also wouldn't rely on short-circuiting alone to avoid a really expensive operation or side effect: at some point someone will fail to notice it.
/* Symbol actually exists */
if ((NULL != getSymbol (symbolName))
/* and still to be added */
&& (!alreadyReplaced.contains (symbolName)))
{
...
Although in this specific case the comments seem like noise to me.
> Technically this performs worse because you lose short-circuiting
Not really, because optimizing compilers are a thing; once this is parsed into SSA, there won't be a difference.
I just tested a recent gcc at -O2 with a contrived example using strings in an unordered_set: the look-up always occurs when not relying on short-circuiting.
Had to add the last sentence for the circa-2020s developer experience. LLM comments are almost never useful: comments are supposed to convey meaningful information to another human coder, and anything your human brain can think of will probably be more helpful context.
Bad comments aren’t just less helpful than possible, they’re often harmful.
Once you’ve hit a couple misleading comments in a code base (ie not updated, flatly wrong, or deeply confusing due to term misuse), the calculus swings to actively ignoring those lying lies and reading the code directly. And the kind of mind resorting to prose when struggling to clarify programming frequently struggles with both maintenance and clarity of prose.
If there are no comments, you are reading the code (or some relatively far away document) for all understanding anyway. If there are inaccurate comments, worst case you're in the same boat except maybe proceeding with a bit more caution next comment you come across. I always ask of fellow engineers: why is it unduly difficult to also fix/change the comments as you alter the code they refer to? How and when to use comments is a balancing of trade-offs: potential maintenance burden in the future if next writers are lazy vs. opportunity to share critical context nearest its subject in a place where the reader is most likely to encounter it.
// translate will replace all instances; only need to run it once
This is a "why". // Replace all symbols
This is a "what". It's better conveyed by improving the function name rather than by a comment: String replaceAllSymbols() {
Ultimately this article buttresses conventional wisdom about comments.
- the clean code one, for me it just reads easier: specific bits of the larger operation are not surrounded with noise coming from other bits like they are in the commenting-"what" one. I can focus more on each step, and that can make it easier to spot a bug or to refactor/adjust/extend, as now I'm more like on a given page of Lego set instructions
- the commenting of "what" one - yeah, this obviously makes it quicker (but probably not easier) to see the larger picture. It has benefits, but I believe this helps more hacker/scripter kind of people than programmers
Here is one I wrote just to talk about iterating a loop in reverse:
/*
* We iterate the v6 prefixes in reverse from longest prefix length
* to shortest. This is because the ipv6 address is a sequence of bytes,
* and we want to perturb the address iteratively to get the corresponding
* network address without making a copy for each perturbation as that
* would be expensive.
*
* For example take address: abcd:abcd:abcd:abcd:abcd:abcd:abcd:abcd.
*
* With masks /112, /64, /8 we want to create the following network addresses
* to lookup as follows:
*
* Lookup abcd:abcd:abcd:abcd:abcd:abcd:abcd:0000 in /112 bucket
* Lookup abcd:abcd:abcd:abcd:0000:0000:0000:0000 in /64 bucket
* Lookup abcd:0000:0000:0000:0000:0000:0000:0000 in /8 bucket
*
* In any other order aside from most specific to least, we'd have
* to create copies of the original address and apply the mask each
* time to get each network address; whereas in this case we can take
* the same address and clear lower bits to higher bits as we go from
* most specific to least specific masks without incurring any copies.
*/
for (auto it = m_v6_prefixes.crbegin(); it != m_v6_prefixes.crend(); ++it)
Or here is another for masking a v4 address, but also explaining why a uint64 is used (this is calculated in a hot loop [same as the previous comment example], so I felt it was imperative to explain what is going on as there is very little room otherwise to optimise):
for (const auto & [ mask_len, bucket ] : m_v4_prefixes)
{
/*
* Example:
*
* netmask 255.255.128.0 (/17) or 0xffff8000:
*
* 0xffffffff ffffffff << (32-17)
* --------------------------------
* 0xffffffff ffff8000 (shifted left 15)
*
* Applying against address 192.168.85.146 or 0xc0a85592:
*
* (converted to uint64_t due to implicit integer promotion)
*
* 0x00000000 c0a85592
* &
* 0xffffffff ffff8000
* -------------------
* 0x00000000 c0a80000
* -------------------
* 0xc0a80000 (after conversion to uint32_t causing
* lower 32-bit truncation)
*/
std::uint64_t mask = (~0ULL) << (32 - mask_len);
std::uint32_t network_addr = addr & mask;
const auto itt = bucket.find(network_addr);
Was it done because shifting by an amount equal to or greater than the integer width is undefined behavior? That would still not require storing the result in a 64-bit mask; just shifting (~0ULL) would be enough. That would be a lot more valuable to explain than how bitwise AND works.
The first one also seems slightly sketchy, but without knowing the rest of the details it's hard to be sure. An IPv6 address is 128 bits; that's 2 registers' worth of integers. Calculating the base address would take 2 bitwise instructions. The cost of copying them would in most cases be negligible compared to doing the lookup in whatever container you are searching for the resulting address. If you are storing it as dynamically allocated byte arrays (which would make copying non-trivial) and processing it in such a hot loop where it matters, then it seems like you have much bigger problems.
For my taste it would be sufficient to say "Iterate in reverse order from most specific address to least specific. That way the address can be calculated in place by incrementally clearing the lowest bits." Having 2 paragraphs of text which repeat the same idea in different words is more distracting than it helps.
As for the 128-bit addresses, we used boost ip::address_v6 to_bytes() as it appears there was no masking option.
> For my taste it would be sufficient to say "Iterate in reverse order from most specific address to least specific. That way the address can be calculated in place by incrementally clearing the lowest bits." Having 2 paragraphs of text which repeat the same idea in different words is more distracting than it helps.
Ah apologies, too late now, should've mentioned it in the PR. But I expected it would ruffle some feathers, I don't care for conventions or other people's quibbles. As long as it improves understanding for the reader, regardless of how "redundant", then mission accomplished.
applyDrag(): void {
const { quad: quadConfig } = settings
const quad = this.getRigidBody()
const quadVel = vec3ToTwgl(quad.linvel())
const dragMag = aerodynamicDrag(quadConfig.dragCoefficient, v3.length(quadVel), quadConfig.frontalArea)
const dragDir = v3.negate(v3.normalize(quadVel))
const dragForce = v3.mulScalar(dragDir, dragMag)
const dragImpulse = v3.mulScalar(dragForce, dt)
quad.applyImpulse(vec3TwglToRapier(dragImpulse), true)
}
This way the code gets more natural-language anchors, which helps understanding what it does.
I see this a lot in the wild, though - as an honest question (not trolling!), why do people still shorten their variable names in place of having a terse descriptor?
(fn some-function [args]
  (let [binding (transform args)]
    (an-expression binding)))
I find that this makes skimming lisp code much easier, because I can usually skip reading the bindings and just read the function name and ultimate expression and usually get the gist very quickly.
You might wonder how this is different from the example you provided, and the answer is because you could sneakily intersperse anything you wanted between your imperative bindings (like a conditional return statement), so I actually have to read every line of your code, vs in a lisp `let` I know for a fact there is nothing sneaky going on.
When I read the replace() method, I was immediately confused because it has no arguments. stringToReplace and alreadyReplaced are properties of the class, so you must look elsewhere (meaning, outside of the function) for their definitions. You also don't know what other bits of the class are doing with those properties when you're not looking. Both of these facts inflate the context you have to carry around in your head.
Why is this a class? In the replace() method, there is a call to translate(symbolName). Why isn't there also a SymbolNameTranslator class with a translate() method? Who decided one was simple enough to use a function while the other warrants a class?
SymbolReplacer surely could also be done with a function. I understand that this is illustration code, so the usage and purpose is not clear (and the original Bob Martin article does not help). Is there a reason we want these half-replaced strings made available every time we call SymbolReplacer.replace()? If there is, we can get the same data by using a reduce function and returning all of the iterations in a list.
A plain, immutable function necessarily contains within it the entire scope of its behavior. It accepts and returns plain data that has no baggage attached to it about the data's purpose. It does one thing only.
a=$b
b=oops
if your input string just has one of these, it will just be translated once as the programmer was probably expecting:
input: foo $a bar
output: foo $b bar
but if your input string also references $b later, then it will recursively translate $a.
input: foo $a bar $b
output: foo oops bar oops
Translating recursively like this is bizarre behavior and possibly a security hole.
The sane thing would be to loop through building the output string, adding the replacement for each symbol as you go. Using String.replace and the alreadyReplaced map is just a bad idea. It's also inefficient, as it throws away strings and does a redundant search on each loop iteration.
Feels typical of this whole '90s-era culture of arguing over refactoring with Java design patterns and ornate styles without ever thinking about whether the algorithm is any good.
Edit: also, consider $foo $foobar. It doesn't properly tokenize on replacement, so this will also be wrong.
As follows:
// SYMBOL_REF should be a class-level static final to avoid recompiling on each call.
int pos = 0; // input[..pos] has been processed.
StringBuilder out = new StringBuilder(); // could also guess at length here.
Matcher m = SYMBOL_REF.matcher(input);
while (m.find()) {
String replacement = symbols.get(m.group(1));
if (replacement == null) {
continue; // no such symbol; keep literal `$foo`.
}
out.append(input, pos, m.start());
out.append(replacement);
pos = m.end();
}
out.append(input, pos, input.length());
(Apparently there's also now a Matcher.replaceAll one could use, but it's arguably cheating to outsource the loop to a method that probably didn't exist when the Uncle Bob version was written, and it's slightly less efficient in the "no such symbol" case.)
Coding style must serve the purpose of aiding understanding. If you have strong opinions about the coding style of `replace` but those opinions don't lead to recognition that the implementation was incorrect and inefficient, your opinions are bad and you should feel bad. Stop writing garbage books and blog posts!
</rant>
Only comments should explain what, variable names should only hint
The first example is perfectly fine, nobody has the time to derive or read a verbose formula involving the words 'weight', 'radius' and 'price'.
But I don’t think they’re worth it. My issue with “what” comments is they’re brittle and can easily go out of sync with the code they’re describing. There’s no automation like type checking or unit tests that can enforce that comments stay accurate to the code they describe. Maybe LLMs can check this but in my experience they miss a lot of things.
When “what” comments go out of sync with the code, they spread misinformation and confusion. This is worse than no comments at all so I don’t think they’re worth it.
“Why” comments tend to be more stable and orthogonal to the implementation.
In my early days I read a lot of "you should" "this is wrong". But code is an expressive medium, you can do things one way, or do it the other, you can write "amountOfEmployees" or you can write "ne" with an "#amount of employees" comment, either way is absolutely fine and you can use whichever depending on your priorities, tradeoffs or even your mood.
Also I used to obsess over code (and there's a lot of material that obsesses about code), but after you become proficient at it, there's a cap on the returns of investing time into your codebase, and you start to focus on the product itself. There's not much difference between a good codebase and a marvelously polished codebase, so you might as well spend the extra focus on going from a bad UX to a neutral UX, or whatever improvement you can make to the product.
Of course, be sensible and use good judgement :-)
But, that's what we're paid for, right?
There have been so many times where I have commented _why_ I think some uncle bob-ism made the code unclear and the response is always:
> *Sends link to Clean Code, maybe you don't know about this?
No, I do, and I am allowed to disagree with it. To which they always clutch their pearls: "How do you think you know better than uncle bob!?" - this is a Well Established Pattern TM.
I don't think I know better than Uncle Bob, but I don't think Uncle Bob works on this codebase nearly as much as you or I.
"When" is occasionally a good question to answer in a comment, e.g. for an interrupt handler, and "Where" is also occasionally a good thing to answer, e.g. "this code is only executed on ARM systems."
The other three questions typically form a hierarchy: Why -> What -> How.
A simplistic google shows that "code comments" are next to "what" and "how" at about the same frequency as they are next to "why" and "what."
This makes some amount of sense, when you consider the usual context. "Why" is often an (assumed) obvious unstated business reason, "What" is a thing done in support of that reason, and "How" is the mechanics of doing the thing.
But with multiple levels of abstraction, _maybe_ the "What" inside the hierarchy remains a "What" to the level above it, but becomes a "Why" to the next level of "What" in the hierarchy. Or maybe the "How" at the end of the hierarchy remains a "How" to the level above it but becomes a "What" to a new "How" level below it.
Is it:
Why -> What/Why -> What/Why -> What/Why -> What -> How
or
Why -> What -> How/What -> How/What -> How/What -> How
In many cases the intermediate nodes in this graph could be legitimately viewed as either What/Why or as How/What, depending on your viewpoint, which could partly depend on which code you read first.
In any case, there are a few hierarchies with final "Hows" that absolutely beg for comments (the famous fast inverse square root from Quake III comes to mind) but in most problem domains that don't involve knowledge across system boundaries (e.g. cache optimization, atomic operations, etc.), the final "How" is almost always adequately explained by the code, _if_ the reader understands the immediately preceding "What."
If I see a function "BackupDatabase()" then I'm pretty sure I already know both "Why" and "What" at the highest levels. "How" I backup the database might be obvious once I am reading inside the function, or the code might be opaque enough that a particular set of lines requires explanation. You could view that explanation as part of "How" the database is backed up, or you could view that explanation as "What" the next few lines of code are doing.
Again, this viewpoint might even partly depend on where you started. If you are dumped inside the function by a debugger, your question might be "What the heck is this code doing here?" but if you are reading the code semi-linearly in an editor, you might wonder "How is BackupDatabase() implemented?"
If I look through code and see a variable I don't know, I want to see its definition anyway, so I know the type, scope, initial value, etc. And it's trivially possible to do that with everything that has a "jump to definition" command.
I hope future folks don't hate me too much for leaving them with ample context and clear/dead-simple code.
In a very real way, a codebase is a conversation among developers. There's every chance the next guy will be less familiar with the code than you, who just did a deep dive and arrived at some critical, poorly documented line of code. There's an even better chance that person will be you, 6 months from now.
Use opportunities to communicate.
I have no idea why people want to "save time" by writing short comments and short variable names. I just CTRL+C, CTRL+V my variable names anyway. Short comments and names compound on each other, and that ends up adding an unnecessary level of complexity and the possibility for errors.
When I come back to a piece of complex code after a year or two, I'm very, very happy that I've used a variable name like "three_tuple_of_weight_radius_price" instead of "tuple" or "t".
I do think very long names can hurt readability though, so it's still a balancing act, but variables like "t" (often used in Haskell, ugh) are awful in my opinion.
I'm absolutely going to add the type to the variable name if it's a complex function. It's just clearer.
If your weights need to be, say, a floating-point number of kilograms, then you establish that convention, and only mark intentional deviations from the convention (e.g. because you're doing an explicit conversion to format output). (Or you use a library like Pint to give more formality to it.)
If I’m adding a feature to the site in 5 years, it’s going to be important to know if I’ve done that or not.
Really long variable names put extra space between the operators and calls, which makes it harder to grok the logical structure of the expression. When I write `a + b` in code, it's in that order because the language demands it, and I use languages with that syntax for reasons of familiarity; but really, the `+` is the most important thing there, so it shouldn't be buried way off to the right.
I don't like for variable names to mark type. `weight_radius_price` is fine (and usually much better than `wrp`, and definitely much better than `tuple` or `t`) but a) we know it's a 3-tuple because we can see the right-hand-side of the equals sign; b) the structure of the name already conventionally tells us to expect more or less that (although maybe a class or structure abstraction is missing?); c) the choice of a tuple is probably not really relevant in context as compared to other ways of sticking three atomic values together.
In the context I’m referring to, the variable may have been set 80 lines earlier, 7 years ago, when I was worse at coding.
The point of the redundancy of very clear, descriptive variable names, is that the code is going to suck tomorrow, even if it is good today. Future me is always going to look down on today me. The difference is that I planing for that, so maybe I can help out future me by being verbose, even if it isn’t necessary for much of the code that ends up being fine.
When I have a nasty bug, I don’t want to waste a day on something that could have been clear if I’d just explained it properly.
That comment is not helpful, because it's wrong. The translate() function just is some sort of lookup. It doesn't replace anything. It's the stringToReplace.replace() call that replaces all instances.
I find that ends up being succinct and useful in the future.
Thought-provoking article, nonetheless!
awesan•22h ago
I don't know how that book ever got as big as it did, all you have to do is try it to know that it's very annoying and does not help readability at all.
ngruhn•22h ago
embedding-shape•22h ago
Never heard of "that style of programming" before, and I certainly know that Uncle Bob never adviced people to break down their programs so each line has it's own method/function. Are you perhaps mixing this with someone else?
troupo•22h ago
There's a literal link to a literal Uncle Bob post by the literal Uncle Bob from which the code has been taken verbatim.
eterm•22h ago
In his own words, ( page 34 ):
> [functions] should be small. They should be smaller than that. That is not an assertion I can justify.
He then advocates for functions to be 2-3 lines each.
embedding-shape•22h ago
That's both true, but a long way away from "every line should have its own method" - but I guess the parent exaggerated for effect and I misunderstood them; I took it literally when I shouldn't have.
eterm•21h ago
zephen•15h ago
xlii•22h ago
Haskell (and OCaml too, I suppose) are outliers though, as one is supposed to have small functions for a single case. It's also super easy to find them, and haskell-language-server can even suggest which functions you want based on the signatures you have.
But in other languages I agree - it's an abomination and actually hurts developers with lower working memory (e.g. neuroatypical ones).
danielscrubs•20h ago
Scarblac•22h ago
I have had so many discussions about that style where I tried to argue it wasn't actually simpler and the other side just pointed at the book.
__s•21h ago
hamdingers•18h ago
One of the most infuriating categories of engineers to work with is the one who's always citing books in code review. It's effectively effort amplification as a defense mechanism, now instead of having a discussion with you I have to go read a book first. No thanks.
I do not give a shit that this practice is in a book written by some well respected whoever, if you can't explain why you think it applies here then I'm not going to approve your PR.
rasmus-kirk•12h ago
bluecalm•17h ago
Scarblac•17h ago
Then replaced a pretty simple one with an anonymous immediately invoked function that contained a switch statement with a return for each case.
Um, can I have a linter rule against that?
zahlman•9h ago
Scarblac•2h ago
Basically it's a shame that Typescript doesn't have a switch-style construct that is an expression.
And that nowadays you can't make nested ternaries look obvious with formatting because automated formatters (that are great) undo it.
zahlman•9h ago
IMX, people mainly defend goto in C because of memory management and other forms of resource-acquisition/cleanup problems. But really it comes across to me that they just don't want to pay more function-call overhead (risk the compiler not inlining things). Otherwise you can easily have patterns like:
(I prefer for handling NULL to be the cleanup function's responsibility, as with `free()`.)
Maybe sometimes you'd inline the two acquisitions; since all the business logic is elsewhere (in `do_thing_with`), the cleanup stuff is simple enough that you don't really benefit from using `goto` to express it.
In the really interesting cases, `do_thing_with` could be a passed-in function pointer:
And then you only write that pattern once for all the functions that need the resources.
Of course, this is a contrived example, but the common uses I've seen do seem to be fairly similar. Yeah, people sometimes don't like this kind of pattern because `cleanup_a` appears twice — so don't go crazy with it. But I really think that `result = 2; goto a_cleanup;` (and introducing that label) is not better than `cleanup_a(a); return 2;`. Only at three or four steps of resource acquisition does that really save any effort, and that's a code smell anyway.
(And, of course, in C++ you get all the nice RAII idioms instead.)
bluecalm•5h ago
Jumping out of the loop with a goto is also more readable than what Python has to offer. Refactoring things into functions just because you need to control the flow of the program is an anti-pattern. Those functions add indirection and might never be reused. Why would you do that even if it was free performance-wise?
This is why new low level languages offer alternatives to goto (defer, labelled break/continue, labelled switch/case) that cover most of the use cases.
Imo it's debatable if those are better and more readable than goto. Defer might be. Labelled break probably isn't although it doesn't matter that much.
Python meanwhile offers you adding more indirection, exceptions (wtf?) or flags (inefficient unrolling and additional noise instead of just goto ITEM_FOUND or something).
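For comparison, the defer flavour of the cleanup pattern sketched in Go (the acquire/release/doThingWith helpers are placeholders, not real APIs):
func doThing() error {
    a, err := acquireA()
    if err != nil {
        return err
    }
    defer releaseA(a) // runs on every return path below, like a single cleanup label

    b, err := acquireB()
    if err != nil {
        return err
    }
    defer releaseB(b)

    return doThingWith(a, b)
}
The deferred calls run in reverse order on every exit, which gives the same shape as stacked goto labels without writing the labels.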
zimpenfish•21h ago
Alas, there's a lot of Go people who enjoy that kind of thing (flashback to when I was looking at an interface calling an interface calling an interface calling an interface through 8 files ... which ended up in basically "set this cipher key" and y'know, it could just have been at the top.)
mannykannot•19h ago
zimpenfish•13h ago
falcor84•20h ago
WillAdams•20h ago
https://github.com/johnousterhout/aposd-vs-clean-code
_A Philosophy of Software Design_ is an amazing and under-rated book:
https://www.goodreads.com/en/book/show/39996759-a-philosophy...
and one which I highly recommend and which markedly improved my code --- the other book made me question my boss's competence when it showed up on his desk, but then it was placed under a monitor as a riser which reflected his opinion of it....
onionisafruit•20h ago
hyperman1•19h ago
At the very least, you convinced me to add John's book to my ever-growing reading list.
jcranmer•16h ago
lucketone•19h ago
It was useful for me and many others, though I never took such (any?) advice literally (even if the author meant it)
Based on other books, discussions, advice and experience, I choose to remember (tell colleagues) it as "long (e.g. multi-page) functions are bad".
I assume CS graduates know better now, because it became common knowledge in the field.
ekjhgkejhgk•19h ago
I said this at a company I worked at and got made fun of because "it's so much more organized". My takeaway is that the average person has zero ability to think critically.
frozenlettuce•16h ago
TZubiri•18h ago
But of course understandable code with comments simply has much more bandwidth of expression, so it will get the best of both worlds.
I see writing commentless code like practicing playing piano only with your left hand, it's a showoff and you can get fascinatingly close to the original piece (See Godowsky's Chopin adaptations for the left hand), but of course when you are done showing off, you will play with both hands.
jffhn•17h ago
As John Carmack said: "if a lot of operations are supposed to happen in a sequential fashion, their code should follow sequentially" (https://cbarrete.com/carmack.html).
A single method with a few lines is easy to read, like the processor reading a single cache line, while having to jump around between methods is distracting and slow, like the processor having to read various RAM locations.
Depending on the language you can also have very good reasons to have many lines; for example, in Java a method can't return multiple primitive values, so if you want to stick to primitives for performance you inline it and use curly braces to limit the scope of its internals.
zahlman•10h ago
But I definitely don't go about it the same way. Mr. Martin honestly just doesn't seem very good at implementing his ideas and getting the benefits that he anticipates from them. I think the core of this is that he doesn't appreciate how complex it is to create a class, at all, in the first place. (Especially when it doesn't model anything coherent or intuitive. But as Jeffries' Sudoku experience shows, also when it mistakenly models an object from the problem domain that is not especially relevant to the solution domain.)
The bit about parameters is also nonsense; pulling state from an implicit this-object is clearly worse than having it explicitly passed in, and is only pretending to have reduced dependencies. Similarly, in terms of cleanliness, mutating the this-object's state is worse than mutating a parameter, which of course is worse than returning a value. It's the sort of thing that you do as a concession to optimization, in languages (like, not Haskell family) where you pay a steep cost for repeatedly creating similar objects that have a lot of state information in common but can't actually share it.
As for single-line functions, I've found that usually it's better to inline them on a separate line, and name the result. The name for that value is about as... valuable as a function name would be. But there are always exceptions, I feel.
Rapzid•9h ago
Things were pretty dire until Sandi Metz introduced Ruby developers to the rest of the programming world with "Practical Object-Oriented Design". I think that helped start a movement away from "clever", "artisanal", and "elegant" and towards more practicality that favors the future programmer.
Does anyone remember debugging Ruby code where lines in stack traces don't exist because the code was dynamically generated at run time to reduce boilerplate? Pepperidge Farm remembers.