By pushing ifs up, you often end up centralizing control flow in a single function with complex branching logic, while all the actual work is delegated to straight-line subroutines.
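For instance, a minimal sketch of that shape (all names hypothetical):

struct Request { cached: bool, authed: bool }
struct Response(String);

// All of the branching lives in one place...
fn handle_request(req: &Request) -> Response {
    if req.cached {
        serve_from_cache(req)
    } else if !req.authed {
        reject_unauthorized()
    } else {
        render_fresh(req)
    }
}

// ...while the workers are straight-line code with no conditions.
fn serve_from_cache(_req: &Request) -> Response { Response("cached".into()) }
fn reject_unauthorized() -> Response { Response("401".into()) }
fn render_fresh(_req: &Request) -> Response { Response("fresh".into()) }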
It can be case-dependent. Are you reasonably sure that the condition will only ever affect stuff inside the loop? Then sure, go ahead and put it there. If it's not hard to imagine requirements that would also branch outside of the loop, then it may be better to preemptively design it that way. The code may be more verbose, but frequently easier to follow, and hopefully less likely to turn into spaghetti later on.
(This is why I quit writing Haskell; it tends to make you feel like you want to write the most concise, "locally optimum" logic. But that doesn't express the intent behind the logic so much as the logic itself, and can lead to horrible unrolling of stuff when minor requirements change. At least, that was my experience.)
if (weShouldDoThis()) {
  doThis();
}
It complements or is part of functional core, imperative shell. All those checks being separate makes them easy to test, and if you care about complexity you can break out a function per clause in the check.

Yeah, only probably: there can be large, distinct sub-tasks that aren't used by any other function but whose encapsulation behind a single function call would still improve understanding. You decide which by asking which way makes the overall ultimate intent clearer.
Which way is a closer match to the email where the boss or customer described the business logic they wanted? Did they say "make it look at the 3rd word to see if it has trailing spaces..."?
Or to find out which side of the fuzzy line a given situation falls on, just make them answer a question like: what is the purpose of this function? Is it to do the thing it's named after? Or is it to do some meaningless microscopic string manipulation or single math op? Why in the world would you want to give a name and identity to a single if() or memcpy(), etc.?
Well, now I have an answer...
So yeah, I agree, pulling conditions up can often be better for long-term maintenance, even if initially it seems like it creates redundancy.
The more things you can prove are invariant, the easier it is to reason about a piece of code, and doing the hoisting in the code itself rather than expecting the compiler to do it will make future human analysis easier when it needs to be updated or debugged.
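For example, a minimal sketch (hypothetical names) of doing that hoisting by hand, so the invariant is visible to the reader and not just to the optimizer:

struct Config { base_factor: f64, adjustment: f64 }

fn scale_all(values: &mut [f64], config: &Config) {
    // Hoisted: provably invariant across the loop, computed exactly once.
    let factor = config.base_factor * config.adjustment;
    for v in values.iter_mut() {
        *v *= factor;
    }
}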
You've really got to have certain contexts before thinking you ought to be pushing ifs up.
I mean generally, you should consider pushing an if up. But you should also consider pushing it down, and leaving it where it is. That is, you're thinking about whether you have a good structure for your code as you write it... aka programming.
I suppose you might say, push common/general/high-level things up, and push implementation details and low-level details down. It seems almost too obvious to say, but I guess it doesn't hurt to back up a little once in a while and think more broadly about your general approach. I guess the author is feeling that ifs are usually about a higher-level concern and loops about a lower-level concern? Maybe that's true? I just don't think it matters, though, because why wouldn't you think about any given if in terms of whether it specifically ought to move up or down?
I use `if`s as markers for special/edge cases and typically return as the last statement in the `if` block.
If I have an `else` block and it's large, then it's a clear indicator that it's actually two methods dressed as one.
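A small sketch of both signals (hypothetical names): the edge case is an `if` that ends in a return, and the would-be large `else` becomes its own method:

struct Account { frozen: bool, balance: i64 }

fn withdraw(account: &mut Account, amount: i64) -> bool {
    // Edge case marked by an `if`, returning as its last statement.
    if account.frozen || amount > account.balance {
        return false;
    }
    apply_withdrawal(account, amount);
    true
}

// What would have been a large `else` block, as its own method.
fn apply_withdrawal(account: &mut Account, amount: i64) {
    account.balance -= amount;
}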
This is not to say we shouldn't be having conversations about good practices, but it's really important to also understand and talk about the context that makes them good. Those who have read The Innovator's Solution would be familiar with a parallel concept. The author introduces the topic by suggesting that humanity achieved powered flight not by blindly replicating the wing of the bird (and we know how many such attempts failed because it tried to apply a good idea to the wrong context) but by understanding the underlying principle and how it manifests within a given context.
The recommendations in the article smell a bit of premature optimisation if applied universally, though I can think of contexts in which they can be excellent advice. In other contexts they can add a lot of redundancy and be error-prone when refactoring, all for little gain.
Fundamentally, clear programming is often about abstracting code into "human brain sized" pieces. What I mean by that is that it's worth understanding how the brain is optimised, how it sees the world. For example, human short-term memory can hold about 7±2 objects at once, so write code that takes advantage of that, maintaining a balance without going to extremes. Holy wars, for example, about whether OO or functional style is always better often miss the point that everything can have its place depending on the constraints.
The performance cost of an if-statement or a for-loop is negligible. That's not the bottleneck of your app. If you're building something that needs to be highly performant, sure. But that's not the majority.
And the last example looks like poor advice and contradicts the previous advice: there's rarely a global condition that it's enough to check once at the top; the condition usually lives inside the walrus. And why do `for walrus in pack { walrus.throbnicate() }` instead of making throbnicate a function accepting the whole pack?
printInvoice(invoice, options) // is much better than

if (printerReady) {
  if (printerHasInk) {
    if (printerHasPaper) {
      if (invoiceFormatIsPortrait) {
        :

The same can be said of loops:

printInvoices(invoices) // much better than

for (invoice of invoices) {
  printInvoice(invoice)
}
At the end, while code readability is extremely important, encapsulation is much more important, so mix both accordingly.

The function printInvoice should print an invoice. What happens if an invoice cannot be printed because one of the named conditions is false? You might throw an exception, or return a sentinel or error type. What to do in that case is not immediately clear.
Especially in languages where exceptions are somewhat frowned upon for general purpose code flow, and monadic errors are not common (say Java or C++), it might be a better option to structure the code similar to the second style. (Except for the portrait format of course, which should be handled by the invoice printer unless it represents some error.)
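As a rough sketch of the error-type option (hypothetical names, not prescribed by the comments above):

struct Invoice;
struct Printer { ready: bool, has_ink: bool, has_paper: bool }
enum PrintError { NotReady, OutOfInk, OutOfPaper }

// The preconditions stay inside, but the failure is explicit in the
// signature, so each caller decides what a failed print means for it.
fn print_invoice(invoice: &Invoice, printer: &Printer) -> Result<(), PrintError> {
    if !printer.ready { return Err(PrintError::NotReady); }
    if !printer.has_ink { return Err(PrintError::OutOfInk); }
    if !printer.has_paper { return Err(PrintError::OutOfPaper); }
    let _ = invoice; // ... actually print here ...
    Ok(())
}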
> while code readability is extremely important, encapsulation is much more important
Encapsulation seems to primarily be a tool for long-term code readability, the ability to refactor and change code locally, and to reason about global behavior by only concerning oneself with local objects. To compare the two metrics and consider one more important appears to me as a form of category error.
It's really about finding the entry points into your program from the outside (including data you fetch from another service), and then massaging in such a way that you make as many guarantees as possible (preferably encoded into your types) by the time it reaches any core logic, especially the resource heavy parts.
[0] Source: Abraham Lincoln. (At least I have heard he said that.)
EDIT: I am, of course, speaking specifically to the bravery and audacity of programming language designers, with admirable tunnel vision, bullheadedness and mission for maximally general algebraic and arbitrary constraint type systems.
I am on such a quest, and any capitulation to the inevitability of imperfect solutions, however probable of an outcome that might seem, is still a long way off.
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
Instead of validating in one spot that someone has sent you a phone number and then passing along a string, you can just as easily have a function construct an UncheckedPhoneNumber. You can choose to only construct VerifiedPhoneNumbers if the user has gone through a code check. Both would allow you to pluck a PhoneNumber out of them wherever you need generic calling code.
You can use this sort of pattern to encode anything into the type system. Takes a little more upfront typing than all of those being strings, but your program will be sure of what it actually has at every point. It's pretty nice.
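A rough Rust sketch of that pattern (the type names follow the comment; everything else is hypothetical):

struct UncheckedPhoneNumber(String);
struct VerifiedPhoneNumber(String);

impl UncheckedPhoneNumber {
    // Construct only if the raw input has the right shape.
    fn parse(raw: &str) -> Option<Self> {
        let digits: String = raw.chars().filter(|c| c.is_ascii_digit()).collect();
        (digits.len() >= 10).then(|| UncheckedPhoneNumber(digits))
    }

    // Only a passed code check can mint a VerifiedPhoneNumber.
    fn verify(self, code_check_passed: bool) -> Option<VerifiedPhoneNumber> {
        code_check_passed.then(|| VerifiedPhoneNumber(self.0))
    }
}

// Generic calling code plucks the number out of either type.
trait PhoneNumber { fn digits(&self) -> &str; }
impl PhoneNumber for UncheckedPhoneNumber { fn digits(&self) -> &str { &self.0 } }
impl PhoneNumber for VerifiedPhoneNumber { fn digits(&self) -> &str { &self.0 } }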
You don’t need to know all of the call chains because you’ve established a “narrow waist” where ideally all things have been made clear, and errors have been handled or scoped. So you only need to know the call chain from entry point to narrow waist, and separately narrow waist till end.
If you find a bug, you find it because you discover that a given input does not lead to the expected output.
You have to find all those ifs in your code because one of them is wrong (probably in combination with a couple of others).
If you push all your conditionals up as close to the input as possible, your hunt will be shorter, and fixing will be easier.
fn frobnicate(walrus: Option<Walrus>)

but the rest makes no sense to me!

// GOOD
frobnicate_batch(walruses)

// BAD
for walrus in walruses {
  frobnicate(walrus)
}
It doesn't follow through with the "GOOD" example though...

fn frobnicate_batch(walruses) {
  for walrus in walruses { frobnicate(walrus) }
}

What did that achieve?

And the next example...
// GOOD
if condition {
  for walrus in walruses { walrus.frobnicate() }
} else {
  for walrus in walruses { walrus.transmogrify() }
}

// BAD
for walrus in walruses {
  if condition { walrus.frobnicate() }
  else { walrus.transmogrify() }
}
What good is that when...

walruses = get_5_closest_walruses()

// "GOOD"
if walruses.has_hungry() { feed_them_all() }
else { dont_feed_any() }

// "BAD"
for walrus in walruses {
  if walrus.is_hungry() { feed() }
  else { dont_feed() }
}
Vectorization is a bit obscure and a lot of coders aren't worried about whether their code vectorizes, but there's a much more common example that I have seen shred the performance of a lot of real-world code bases and HTTP APIs, which is functions (including APIs) that take only a single thing when they should take the full list.
Suppose we have posts in a database, like for a forum or something. Consider the difference between:
posts = {}
for id in postIDs:
    posts[id] = fetchPost(id)

versus

posts = fetchPosts(postIDs)
fetchPost and fetchPosts both involve hitting the database. The singular version means that the resulting SQL will, by necessity, only have the one ID in it, and as a result, a full query will be made per post. This is a problem because it's pretty likely here that fetching a post is a very fast (indexed) operation, so the per-query overhead is going to hit you hard.

The plural "fetchPosts", on the other hand, has all the information necessary to query the DB in one shot for all the posts, which is going to be much faster. An architecture based on fetching one post at a time is intrinsically less performant in this case.
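A stubbed-out sketch of the batched version (Db, query, and Post are hypothetical stand-ins for a real client):

struct Db;
struct Post { id: u64, body: String }

impl Db {
    // Pretend this sends one SQL statement and returns all matching rows.
    fn query(&self, _sql: &str, ids: &[u64]) -> Vec<Post> {
        ids.iter().map(|&id| Post { id, body: String::new() }).collect()
    }
}

// One round-trip for N posts, instead of N round-trips of one post each.
fn fetch_posts(db: &Db, post_ids: &[u64]) -> Vec<Post> {
    db.query("SELECT id, body FROM posts WHERE id = ANY($1)", post_ids)
}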
This opens up even more in the HTTP API world, where a single query is generally of even higher overhead than a DB query. I think the most frequent mistake I see in HTTP API design (at least, ignoring quibbling about which method and error code scheme to use) is providing APIs that operate on one thing at a time when the problem domain naturally lends itself to operating on arrays (or map/objects/dicts) at a time. It's probably a non-trivial part of the reason why so many web sites and apps are so much slower than they need to be.
I find it is often easy to surprise other devs with how fast your system works. This is one of my "secrets" (please steal it!): you avoid as many "per-thing" penalties as possible by keeping sets of things together as long as possible. The "per-thing" penalties can really sneak up on you. Like nested for loops, they can easily start stacking up if you're not careful, as the inability to fetch all the posts at once cascades into, say, fetching user avatars one-by-one in some other loop, and then a series of other individual queries. Best part is, profiling may make it look like the problem is the DB ("the DB is taking a long time to serve this"), because profiles are not always good at revealing that your problem is per-item overhead rather than the amount of real work being done.
Our cloud provider had an aircon/overheating incident in the region we were using, and after it was resolved network latency between the database and application increased by a few milliseconds. Turns out if you multiply that by a few million/fast arrival rate you get a significant amount of time, and the pending tasks queue backs up causing the high priority tasks to be delayed.
Based on the traces we had it looked like a classic case of "ORM made it easy to do it this way, and it works fine until it doesn't" but was unfortunately out of our control being a third party product.
If they'd fetched/processed batches of tasks from the database instead I'm confident it wouldn't have been an issue.
An interface where the implementation can later be changed to do something more clever.
At work we have a lot of legacy code written the BAD way, i.e. the caller loops, which means we have to change dozens of call sites if we want to improve performance, rather than just one implementation.
This makes it significantly more difficult than it could have been.
Firstly, in many cases the function needs to serve both purposes — called on a single item or called on a sequence of such. A function that always loops would have to be called on some unitary sequence or iterator which is both unergonomic and might have performance implications.
Second, the caller might have more information than the callee on how to optimize the loop. Consider a function that might be computationally expensive for some inputs while negligible for others — the caller, knowing this information, could choose to parallelize the former inputs while vectorizing etc. the latter (via use of inlining, etc.). This would be very hard or at least complicate things when the callee's responsibility.
This is a trade-off: It can be beneficial to see the individual cases to be considered at the points where the actions are triggered, at the cost of having an additional code-level dependency on the list of individual cases.
If a certain function has many preconditions it needs to check before running, but potentially needs to run from various places in the code, then moving the precondition checks outside the method results in faster code but destroys readability and breaks the DRY principle.

In cases where this kind of tension (DRY vs. non-DRY) exists, I've sometimes named methods like 'maybeDoThing' (emphasis on the 'maybe' prefix), indicating that I'm calling the method, but that all the precondition checks are inside the function itself rather than duplicated all across the code, everywhere the method 'maybe' needs to run.
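For instance, a sketch of that convention (hypothetical example):

struct User { name: String, welcomed: bool }

// The `maybe` prefix signals that every precondition check lives inside,
// so callers can invoke it unconditionally from anywhere.
fn maybe_send_welcome_email(user: &mut User, outbox: &mut Vec<String>) {
    if user.welcomed {
        return; // already done; calling again is a no-op
    }
    outbox.push(format!("Welcome, {}!", user.name));
    user.welcomed = true;
}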
Like for example, if you want to make an idempotent operation, you might first check if the thing has been done already and if not, then do it.
If you push that conditional out to the caller, now every caller of your function has to individually make sure they call it in the right way to get a guarantee of idempotency and you can't abstract that guarantee for them. How do you deal with that kind of thing when applying this philosophy?
Another example might be if you want to execute a sequence of checks before doing an operation within a database transaction. How do you apply this philosophy while keeping the checks within the transaction boundary?
> If you push that conditional out to the caller, now every caller of your function has to individually make sure they call it in the right way to get a guarantee of idempotency
In this situation your function is no longer idempotent, so you obviously can’t provide the guarantee. But quite frankly, if you’re having to resort to making individual functions implement state management to provide idempotency, then I suspect you’re doing something very odd, and have way too much logic happening inside a single function.
Idempotent code tends to fall into two distinct camps:
1. Code that’s inherently idempotent because the data model and operations being performed are inherently idempotent. I.e. you’re either performing stateless operations, or you’re performing “PUT” style operations wherein the input data contains all the state that needs to be written.
2. Code that’s performing more complex business operations where you’re creating an idempotent abstraction by either performing rollbacks, or providing some kind of atomic apply abstraction that ensures partial failures don’t result in corrupted state.
For point 1, you shouldn’t be checking for order of operations, because it doesn’t matter. Everything is inherently idempotent, just perform the operations again.
For point 2, there is no simple abstraction you can apply. You need to have something record the desired operation, then ensure it either completes or fails. And once that happens, ensures that completion or failure is persistent permanently. But that kind of logic is not the kind of thing you put into a function and compose with other operations.
Depends just how many things are checked by the check, I guess. A single aspect, like checking whether the resource is already claimed or is available, could be combined, since it could be part of the very access mechanism itself, where anything else is a race condition.
anywhere else, push ifs up.
Don’t meticulously evaluate and potentially prune every single branch, only to find you have to prune the whole limb anyways.
Or even weirder: conditionals are about figuring out what work doesn’t need to be done. Loops are the “work.”
Ultimately I want my functions to be about one thing: walking the program tree or doing work.
The premise that you can define best patterns like this, removed from context, with toy words like frobnicate, is flawed. You should abstract your code in such a way that the operations contained are clearly intuited by the names and parameters of the abstraction boundaries. Managing cognitive load >>> nickel-and-diming performance in most cases.
When I work with batches of data, I often end up with functions like this:
function process_batch(batch) {
  const stuff = setUpNeededHelpers(batch);
  const results = [];
  for (const item of batch) {
    const result = process_item(item, stuff);
    results.push(result);
  }
  return results;
}
Where "stuff" might be various objects, such as counters, lists or dictionaries to track aggregated state, opened IO connections, etc etc.So the setUpNeededHelpers() section, while not extremely expensive, can have nontrivial cost.
I usually add a clause like
if (batch.length == 0) {
  return [];
}
at the beginning of the function to avoid this initialization cost if the batch is empty anyway. Also, sometimes the initialization requires access to one element of the batch, e.g. to read metadata, so the check also ensures there is at least one element available.
Wouldn't this violate the rule?
The article is offering a heuristic, a rule of thumb rather than dogma. It can't be applied universally without considering your circumstances.
Following his advice to the letter (and ignoring his hedging where he says "consider if"), you'd move the `if (batch.length == 0)` into the callers of `setUpNeededHelpers`. But now you have to make every caller aware that calling the function could be expensive even if there are no contents in `batch` so they have to include the guard, which means you have this scattered throughout your code:
if (batch.length == 0) { return default }
setup(batch)
Now it's a pair of things that always go together, which makes more sense to put into one function, so you'd probably push it back down.

The advice really is contingent on the surrounding context (non-exhaustive):
1. Is the function with the condition called in only one place? Consider moving it up.
2. Is the function with the condition called in many places and the condition can't be removed (it's not known to be called safely)? Leave the condition in the function.
3. Is the function with the condition called only in places where the guard is redundant? In your example, `batch.length == 0` can be checked in `process_batch`. If all calls to `setup` are in similar functions, you can remove the condition from `setup` and move it up.
4. If it's causing performance concerns (measured), and in many but not all cases the check is unneeded then remove the guard from `setup` and add it back to only those call-sites where it cannot be safely removed. If this doesn't get you any performance improvements, you probably want to move it back down for legibility.
Basically, apply your judgment. But if you can, it's probably (but not always) a good idea to move the ifs up.
Add asserts to the end of the function too.
Loops can live in the middle; take as much I/O and compute out of the loop as you can :)
With conditionals, it's also useful to express them as ternary assignment when possible. This makes it more likely the optimizer will generate a conditional move instead of a branch. When the condition is not sufficiently predictable, a conditional move is far faster due to branch misprediction. Sometimes it's not always faster in the moment, but it can still alleviate pressure on the branch prediction cache.
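A tiny sketch in Rust, where the "ternary" is an if-expression; whether the compiler actually emits a conditional move depends on the target and optimization level:

// Written as a value assignment rather than a branching statement,
// this is a good candidate for a branchless conditional move.
fn clamp_floor(x: i32, floor: i32) -> i32 {
    let y = if x < floor { floor } else { x };
    y
}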
This part in particular seems like an aesthetic judgment, and I disagree. I find it more natural to follow a flowchart than to stare at one.
> A related pattern here is what I call “dissolving enum” refactor.... There are two branching instructions here and, by pulling them up, it becomes apparent that it is the exact same condition, triplicated (the third time reified as a data structure):
The problem here isn't the code organization, but the premature abstraction. When you write the enum it should be because "reifying the condition as a data structure" is an intentional, purposeful act. Something that empowers you to, for example, evaluate the condition now and defer the response to the next event tick in a GUI.
> The primary benefit here is performance. Plenty of performance, in extreme cases.
Only if so many other things go right. Last I checked, simply wanting walruses to behave polymorphically already ruins your day, even if you've chosen a sufficiently low-level programming language.
A lot of the time, the "bad" code is the implementation of the function called in the "good" code. That makes said function easier to understand, by properly separating responsibilities (defining frobnication and iterating over walruses). Abstracting the inner loop to a function also makes it sane to express the iteration as a list comprehension without people complaining about how you have these nested list comprehensions spread over multiple lines, and why can't you just code imperatively like the normal programmers, etc.
> The two pieces of advice about fors and ifs even compose!
1. The abstraction needed to make the example comprehensible already ruins the illusion of `frobnicate_batch`.
2. If you're working in an environment where this can get you a meaningful performance benefit and `condition` is indeed a loop invariant (such that the transformation is correct), you are surely working in an environment where the compiler can just hoist that loop invariant.
3. The "good" version becomes longer and noisier because we must repeat the loop syntax.
> jQuery was quite successful back in the day, and it operates on collections of elements.
That's because of how it allowed you to create those collections (and provided iterators for them). It abstracted away the complex logic of iterating over the entire DOM tree to select nodes, so that you could focus on iterating linearly over the selected nodes. And that design implicitly, conceptually separated those steps. Even if it didn't actually build a separate container of the selected nodes, you could reason about what you were doing as if it did.
And this was obfuscated by author's use of global variables everywhere.
The key change was reducing functions' dependencies on outer parameters. Which is great.
Feels a lot like "i before e except after c" where there's so many exceptions to the rule that it may as well not exist.
// Good?
for walrus in walruses { walrus.frobnicate() }

Is essentially equivalent to

// BAD
for walrus in walruses { frobnicate(walrus) }

And this is good:

// GOOD
frobnicate_batch(walruses)

So should the first one really be something more like

// impl FrobnicateAll for &[Walrus]
walruses.frobnicate_all()
That's not quite right: it's a substitution AND an ablation of two functions and an enum from the code base.
There's quite a reduction in complexity he's advocating for.
Further, the enum and the additional boilerplate are not adding type safety in this example. Presumably the parameters to foo and bar are enforced in all cases, so the only difference between the two examples is the additional boilerplate of a 2-armed enum.
I strongly suspect in this case (but I haven't godbolted it to be sure) that both examples compile to the same machine code. If my hunch is correct, then the remaining question is: does introducing double-entry bookkeeping on the if condition add safety for future changes?
Maybe. But at what cost? This is one of those scenarios where you bank the easy win of reduced complexity.
Whether or not this matters depends on what, exactly, is in those match arms. Sometimes there's some symmetry to the arms of an if statement. And in that case, being exhaustive is important. But there's plenty of times where I really just have a bit of bookkeeping to do, or an early return or something. And I only want to do it in certain cases. Eg if condition { break; } else { stuff(); }
Also, if-else is exhaustive already. It's still exhaustive even if you add more "else if" clauses, like if {} else if {} else {}.
Match makes sense when the arms of the conditional are more symmetrical. Or when you're dealing with an enum. Or when you want to avoid repeating conditions. (Eg match a.cmp(b) { Greater / Equal / Less } ).
The best way to structure your code in general really comes down to what you're trying to do. Sometimes if statements are cleaner. Sometimes match expressions. It just depends on the situation.