For algorithms, a little memory outweighs a lot of time

https://www.quantamagazine.org/for-algorithms-a-little-memory-outweighs-a-lot-of-time-20250521/

343•makira•8mo ago

Comments

cperciva•8mo ago

Minus the fuzz: A multitape Turing machine running in time t can be simulated using O(sqrt(t log t)) space (and typically more than t time).

https://arxiv.org/abs/2502.17779

diamondage•8mo ago

Should have come to the comments first!

LPisGood•8mo ago

I think it is very intuitive that more space beats the pants off of more time.

In time O(n) you can use O(n) cells on a tape, but there are O(2^n) possible configurations of symbols on a tape of length n (for an alphabet with 2 symbols), so you can do so much more with n space than with n time.

thatguysaguy•8mo ago

Intuitive yes, but since P != PSPACE is still unproven it's clearly hard to demonstrate.

dragontamer•8mo ago

The article is about a new proof wherein P == PSPACE.

Something we all intuitively expected but someone finally figured out an obscure way to prove it.

--------

This is a really roundabout article that takes a meandering path to a total bombshell in the field of complexity theory. Sorry for spoiling but uhhh, you'd expect an article about P == PSPACE would get to the point faster....

LPisGood•8mo ago

This article is not about a proof that P = PSPACE. That would be way bigger news since it also directly implies P = NP.

LPisGood•8mo ago

I think that since many people find it intuitive that P != NP, and PSPACE sits way on top of polynomial hierarchy that it is intuitive even if it’s unproven.

porphyra•8mo ago

There's not even a proof that P != EXPTIME haha

EDIT: I am a dumbass and misremembered.

doc_manhat•8mo ago

I think there is right? It's been a long time but I seem to remember it following from the time hierarchy theorem

LPisGood•8mo ago

I thought there was some simple proof of this, but all I can think of is time hierarchy theorem.

qbane•8mo ago

But you also spend time on updating cells, so it is not that intuitive.

LPisGood•8mo ago

I’m not sure what you mean here. If you’re in the realm of “more space” than you’re not thinking of the time it takes.

More precisely, I think it is intuitive that the class of problems that can be solved in any time given O(n) space is far larger than the class of problems that can be solved in any space given O(n) time.

Almondsetat•8mo ago

If your program runs in O(n) time, it cannot use more than O(n) memory (upper bound on memory usage.

If your program uses O(n) memory, it must run at least in O(n) time (lower bound on time).

bmacho•8mo ago

In other words M <= T.

withinboredom•8mo ago

This is pretty easy to refute:

> If your program runs in O(n) time, it cannot use more than O(n) memory (upper bound on memory usage.[sic]

This is clearly refuted by all software running today. Programs (especially games) clearly use more memory than there are instructions in the program.

> If your program uses O(n) memory, it must run at least in O(n) time (lower bound on time).

Memory bombs use an incredible amount of memory and do it incredibly quickly.

Almondsetat•8mo ago

>Programs (especially games) clearly use more memory than there are instructions in the program.

How can you access a piece of memory without issuing an instruction to the CPU? Also, "clearly" is not an argument.

>Memory bombs use an incredible amount of memory and do it incredibly quickly.

How can you access a piece of memory without issuing an instruction to the CPU? Also "incredibly quickly" is not an argument. Also also, O(n) is incredibly quick.

withinboredom•8mo ago

> Also, "clearly" is not an argument.

As in your assertion is literally self-evidently false. It is on you to provide a burden of proof here; especially since there are instructions that can load more than a single bit of memory.

> How can you access a piece of memory without issuing an instruction to the CPU?

Let me rather ask you this: where do the instructions exist that are running? That is right: in memory. However, just because instructions exist in memory doesn’t mean they’re accessed. There is not a relationship between the number of instructions and the amount of memory accessed/used.

Almondsetat•8mo ago

This is about time and memory complexity, which is a formal field of computer science. Your replies are about your own vague understanding of computing, which is not the topic here.

withinboredom•8mo ago

Yes, but you are asserting the relationship is directly connected -- which is clearly not true. You said that it is O(n) memory and O(n) time, both using n. That means a program containing x bytes can only run for x seconds. This is clearly not true.

Almondsetat•8mo ago

>That means a program containing x bytes can only run for x seconds.

That is not what it means. Again, if you are not familiar with the notation then all you are doing is slapping your personal ideas about computing to some symbols

withinboredom•8mo ago

Hmm. Telling me I'm 'stupid' doesn't make you right.

delusional•8mo ago

This is obviously demonstrably true. A Turing running in O(n) time must halt. The one in O(n) space is free not to.

thaumasiotes•8mo ago

Almondsetat's proof seems more obvious. Given O(n) time, you can only use O(n) space, so you're comparing "O(n) space, any amount of time" with "O(n) space, O(n) time", and it turns out you get more resources the first way.

qbane•8mo ago

A time-bounded TM is also space bounded, because you need time to write to that many cells. But the other way is not.

hn_acc1•8mo ago

My intuition: the value of a cell can represent the result of multiple (many) time units used to compute something. If you cannot store enough intermediate results, you may end up needing to recalculate the same / similar results over and over - at least in some algorithms. So one cell can represent the results of hundreds of time units, and being able to store / load that one value to re-use it later can then replace those same hundreds of time units. In effect, space can be used for "time compression" (like a compressed file) when the time is used to compute similar values multiple times.

If intermediate results are entirely uncorrelated, with no overlap in the work at all, that would not hold - space will not help you. Edit: This kind of problem is very rare. Think of a cache with 0 percent hit rate - almost never happens.

And you can't really do it the other way around (at least not in current computing terms / concepts): you cannot use a single unit of time as a standin / proxy for hundreds of cells, since we don't quite have infinitely-broad SIMD architectures.

slt2021•8mo ago

I think this theorem applies well for modern LLMs: large language model with pre-computed weights can be used to compute very complex algorithms that approximate human knowledge, that otherwise were impossible or would have required many orders more compute to calculate

benchloftbrunch•8mo ago

As I understand it, this is the essential insight behind dynamic programming algorithms; the whole idea is to exploit the redundancies in a recursive task by memoizing the partial results of lower order stages.

RetroTechie•8mo ago

There's many algorithms with a space vs time tradeoff. But what works best, depends a lot on the time/energy cost of re-computing something, vs the storage/bandwidth cost of caching results.

Expensive calculation, cheap storage → caching results helps.

Limited bandwidth / 'expensive' storage, simple calculation (see: today's hyper-fast CPU+L1 cache combo's) → better to re-compute some things on the fly as needed.

I suspect there's a lot of existing software (components) out there designed along the "save CPU cycles, burn storage" path, where in modern reality a "save storage, CPU cycles are cheap" would be more effective. CPU speeds have grown way way faster than main memory bandwidth (or even size?) over the last decades.

For a datacenter, supercomputer, embedded system, PC or some end-user's phone, the metrics will be different. But same principle applies.

frollogaston•8mo ago

Also, the O(1) random memory access assumption makes it easy to take memory for granted. Really it's something like O(n^(1/3)) when you're scaling the computer to the size of the problem, and you can see this in practice in datacenters.

I forget the name of the O(1) access model. Not UMA, something else.

cperciva•8mo ago

O(n^(1/2)) really, since data centers are 2 dimensional, not 3 dimensional.

(Quite aside from the practical "we build on the surface of the earth" consideration, heat dissipation considerations limit you to a 2 dimensional circuit in 3-space.)

frollogaston•8mo ago

If you have rows of racks of machines, isn't that 3 dimensions? A machine can be on top of, behind, or next to another that it's directly connected to. And the components inside have their own non-uniform memory access.

Or if you're saying heat dissipation scales with surface area and is 2D, I don't know. Would think that water cooling makes it more about volume, but I'm not an expert on that.

jvanderbot•8mo ago

Spatial position has nothing (ok only a little) to do with topology of connections.

manwe150•8mo ago

That example would be two dimensions still in the limit computation, since you can keep building outwards (add buildings) but not scale upwards (add floors)

frollogaston•8mo ago

You can add floors though. Some datacenters are 8 stories with cross-floor network fabrics.

immibis•8mo ago

When you get to, say, 100000 stories, you can't build more stories. At this point your computer costs more than the Earth's GDP for a century, so talking about theoretical scaling laws is irrelevant. Eventually you run out of the sun's power output so you build a Dyson sphere and eventually use all of that power, anyway.

frollogaston•8mo ago

Oh right, so the height is practically a constant. Square root for sure then.

LPisGood•8mo ago

All algorithms are O(1) in this case

frollogaston•8mo ago

You pick what things are constant and what's variable. If you're scaling a supercomputer to fit a problem, the height is going to max out quickly and can be treated as constant, while the other dimensions are variable.

mpoteat•8mo ago

More fundamentally O(n^(1/2)) due to the holographic principle which states that the maximal amount of information encodable in a given region of space scales wrt its surface area, rather than its volume.

(Even more aside to your practical heat dissipation constraint)

frollogaston•8mo ago

Hmm, I'll go with that

vlowther•8mo ago

Just need to make sure all your computation is done in a volume with infinite surface area and zero volume. Encoding problem solved. Now then, how hyperbolic can we make the geometry of spacetime before things get too weird?

LegionMammal978•8mo ago

On the other hand, actual computers can work in parallel when you scale the hardware, something that the TM formulation doesn't cover. It can be interesting which algorithms work well with lots of computing power subject to data locality. (Brains being the classic example of this.)

LPisGood•8mo ago

Multitape TMs are pretty well studied

undefuser•8mo ago

I think it really depends on the task at hand, and not that intuitive. At some point accessing the memory might be slower than repeating the computation, especially when the storage is slow.

IsTom•8mo ago

One one hand yes, on the other there might be some problems that are inherently difficult to parallelize (alternating machine PTIME is the same as deterministic PSPACE) where space doesn't buy you much. The jump from paper from t/log t to sqrt(t log t) is huge, but it still might be that unbounded parallelism doesn't buy you much more.

whatever1•8mo ago

Lookup tables with precalculated things for the win!

In fact I don’t think we would need processors anymore if we were centrally storing all of the operations ever done in our processors.

Now fast retrieval is another problem for another thread.

chowells•8mo ago

Oh, that's not a problem. Just cache the retrieval lookups too.

michaelhoney•8mo ago

it's pointers all the way down

drob518•8mo ago

Just add one more level of indirection, I always say.

EGreg•8mo ago

But seriously… the solution is often to cache / shard to a halfway point — the LLM model weights for instance — and then store that to give you a nice approximation of the real problem space! That’s basically what many AI algorithms do, including MCTS and LLMs etc.

crmd•8mo ago

Reminds me of when I started working on storage systems as a young man and once suggested pre-computing every 4KB block once and just using pointers to the correct block as data is written, until someone pointed out that the number of unique 4KB blocks (2^32768) far exceeds the number of atoms in the universe.

ww520•8mo ago

The idea is not too far off. You could compute a hash on an existing data block. Store the hash and data block mapping. Now you can use the hash in anywhere that data block resides, i.e. any duplicate data blocks can use the same hash. That's how storage deduplication works in the nutshell.

valenterry•8mo ago

Except that there are collisions...

datameta•8mo ago

This might be completely naive but can a reversible time component be incorporated into distinguishing two hash calculations? Meaning when unpacked/extrapolated it is a unique signifier but when decomposed it folds back into the standard calculation - is this feasible?

ruined•8mo ago

hashes by definition are not reversible. you could store a timestamp together with a hash, and/or you could include a timestamp in the digested content, but the timestamp can’t be part of the hash.

RetroTechie•8mo ago

> hashes by definition are not reversible.

Sure they are. You could generate every possible input, compute hash & compare with a given one.

Ok it might take infinite amount of compute (time/energy). But that's just a technicality, right?

dagw•8mo ago

Sure they are. You could generate every possible input

Depends entirely on what you mean by reversible. For every hash value, there are an infinite number of inputs that give that value. So while it is certainly possible to find some input that hashes to a given value, you cannot know which input I originally hashed to get that that value.

datameta•8mo ago

Oh, of course, the timestamp could instead be metadata!

shakna•8mo ago

Some hashes do have verification bits, that are used not just to verify intact hash, but one "identical" hash from another. However, they do tend to be slower hashes.

grumbelbart2•8mo ago

Do you have an example? That just sounds like a hash that is a few bits longer.

shakna•8mo ago

Mostly use of GCM (Galois/Counter Mode). Usually you tag the key, but you can also tag the value to check verification of collisions instead.

But as I said, slow.

ww520•8mo ago

Can use cryptographic hashing.

anonymars•8mo ago

How does that get around the pigeonhole principle?

I think you'd have to compare the data value before purging, and you can only do the deduplication (purge) if the block is actually the same, otherwise you have to keep the block (you can't replace it with the hash because the hash link in the pool points to different data)

ww520•8mo ago

The hash collision chance is extremely low.

valenterry•8mo ago

For small amounts of data yeah. With growing data, the chance of a collision grows more than proportional. So in the context of working on storage systems (like s3 or so) that won't work unless customers actually accept the risk of a collission as okay. So for example, when storing media data (movies, photos), I could imagine that, but not for data in general.

ww520•8mo ago

Cryptographic hashing collisions are very very small, like end of universe in numerous times small. They're smaller than AWS being burnt down and all backups were lost leading to data loss.

valenterry•8mo ago

You have a point.

When using MD5 (128bit) then when AWS S3 would apply this technique, it would only get a handful of collisions. Using 256bit would drive that down to a level where any collision is very unlikely.

This would be worth it if a 4kb block is, on average, duplicated with a chance of at least 6.25%. (not considering overhead of data-structures etc.)

makmanalp•8mo ago

In some contexts, dictionary encoding (which is what you're suggesting, approximately) can actually work great. For example common values or null values (which is a common type of common value). It's just less efficient to try to do it with /every/ block. You have to make it "worth it", which is a factor of the frequency of occurrence of the value. Shorter values give you a worse compression ratio on one hand, but on the other hand it's often likelier that you'll find it in the data so it makes up for it, to a point.

There are other similar lightweight encoding schemes like RLE and delta and frame of reference encoding which all are good for different data distributions.

manwe150•8mo ago

It seems like you weren’t really that far off from implementing it, you just need a 4 KB pointer to point to the right block. And in fact, that is what all storage systems do!

jodrellblank•8mo ago

Reminds me of when I imagined brute-forcing every possible small picture as simply 256 shades of gray for each pixel x (640 x 480 = 307200 pixels) = 78 million possible pictures.

Actually I don't have any intuition for why that's wrong, except that if we catenate the rows into one long row then the picture can be considered as a number 307200 digits long in base 256, and then I see that it could represent 256^307200 possible different values. Which is a lot: https://www.wolframalpha.com/input?i=256%5E307200

p1necone•8mo ago

78 million is how many pixels would be in 256 different pictures with 307200 pixels each. You're only counting each pixel once for each possible value, but you actually need to count each possible value on each pixel once per possible combinations of all of the other pixels.

The number of possible pictures is indeed 256^307200, which is an unfathomably larger number than 78 million. (256 possible values for the first pixel * 256 possible values for the second pixel * 256 possi...).

deadfoxygrandpa•8mo ago

i think at some point you should have realized that there are obviously more than 78 million possible greyscale 640x480 pictures. theres a lot of intuitive examples but just think of this:

https://images.lsnglobal.com/ZFSJiK61WTql9okXV1N5XyGtCEc=/fi...

if there were only 78 million possible pictures, how could that portrait be so recongizably one specific person? wouldnt that mean that your entire picture space wouldnt even be able to fit a single portrait of everyone in Germany?

jodrellblank•8mo ago

"At some point" I do realise it. What I don't have is an intuitive feel for why a number can be three digits 000 to 999 and each place has ten choices, but it's not 10 x 3 possibles. I tried to ask ChatGPT to give me an intuition for it, but all it does is go into an explanation of combinations. I know it's 10 x 10 x 10 meaning 10^3 I don't need that explanation again, what I'm looking for is an intuition for why it isn't 10x3.

> "if there were only 78 million possible pictures, how could that portrait be so recongizably one specific person? wouldnt that mean that your entire picture space wouldnt even be able to fit a single portrait of everyone in Germany?"

It's not intuitive that "a 640x480 computer picture must be able to fit a single portrait of everyone in Germany"; A human couldn't check it, a human couldn't remember 78 million distinct pictures, look through them, and see that they all look sufficiently distinct and at no point is it representing 50k people with one picture; human attention and memory isn't enough for that.

ThomasBHickey•8mo ago

Try starting with a 2x2, then 3x3, etc. image and manually list all the possibilities.

jodrellblank•8mo ago

That's focusing on the wrong thing; as I said, "I know it's 10 x 10 x 10 meaning 10^3 I don't need that explanation [for the correct combinations], what I'm looking for is an intuition for why it isn't 10x3".

cwmoore•8mo ago

ChatGPT might be able to explain combinatorics if you use the keyterm.

I’m fond of derangements and their relationship with permutations, which contain a factor of e.

plastic3169•8mo ago

I had friend who had the same idea to do it for pixel fonts with only two colors and 16x16 canvas. It was still 2^256. Watching that thing run and trying to estimate when it would finish made me understand encryption.

danwills•8mo ago

Yeah I had a similar thought back in the 90s and made a program to iterate through all possible images at a fairly low res, I left it running while I was at school and got home after many hours to find it had hardly got past the first row of pixels! This was a huge eye-opener about how big a possibility-space digital images really exist in!

mystified5016•8mo ago

I has the same idea when I first learned about programming as a teenager. I wonder how many young programmers have had this exact same train of thought?

nine_k•8mo ago

If some blocks are highly repetitive, this may make sense.

It's basically how deduplication works in ZFS. And that's why it only makes sense when you store a lot of repetitive data, e.g. VM images.

whatever1•8mo ago

We know for a fact that when we disable the cache of the processors their performance plummets, so the question is how much of computation is brand new computation (never seen before)?

vlovich123•8mo ago

While true, a small technical nitpick is that the cache also contains data that’s previously been loaded and reused, not just as a result of a previous computation (eg your executable program itself or a file being processed are examples)

Traubenfuchs•8mo ago

you might be interested in pifs

https://github.com/philipl/pifs

benchloftbrunch•8mo ago

The other problem is that (if we take literally the absurd proposal of computing "every possible block" up front) you're not actually saving any space by doing this, since your "pointers" would be the same size as the blocks they point to.

lesuorac•8mo ago

If you don't do _actually_ every single block then you have Huffman Coding [1].

I imagine if you have a good idea of the data incoming you could probably do a similar encoding scheme where you use 7 bits to point to a ~512 bit blob and the 8th bit means the next 512 couldn't be compressed.

[1]: https://en.wikipedia.org/wiki/Huffman_coding

Nevermark•8mo ago

The other problem is to address all possible 4098 byte blocks, you need a 4098 byte address. I suppose we would expect the actual number of blocks computed and reused to be a sparse subset.

Alternately, have you considered 8 byte blocks?

If your block pointers are 8-byte addresses, you don't need to count on block sparsity, in fact, you don't even need to have the actual blocks.

A pointer type, that implements self-read and writes, with null allocations and deletes, is easy to implement incredibly efficiently in any decent type system. A true zero-cost abstraction, if I have ever seen one!

(On a more serious note, a memory heap and CPU that cooperated to interpret pointers with the top bit set, as a 63-bit linear-access/write self-storage "pointer", is an interesting thought.

mncharity•8mo ago

> if we were centrally storing all of the operations

Community-scale caching? That's basically what pre-compiled software distributions are. And one idea for addressing the programming language design balk "that would be a nice feature, but it's not known how to compile it efficiently, so you can't have it", is highly-parallel cloud compilation, paired with a community-scale compiler cache. You might not mind if something takes say a day to resolve, if the community only needs it run once per release.

20after4•8mo ago

Community scale cache, sounds like a library (the bricks and mortar kind)

EGreg•8mo ago

You’re not wrong

Using an LLM and caching eg FAQs can save a lot of token credits

AI is basically solving a search problem and the models are just approximations of the data - like linear regression or fourier transforms.

The training is basically your precalculation. The key is that it precalculates a model with billions of parameters, not overfitting with an exact random set of answers hehe

walterbell•8mo ago

> Using an LLM and caching eg FAQs can save a lot of token credits

Do LLM providers use caches for FAQs, without changing the number of tokens billed to customer?

EGreg•8mo ago

No, why would they. You are supposed to maintain that cache.

What I really want to know is about caching the large prefixes for prompts. Do they let you manage this somehow? What about llama and deepseek?

jsnider3•8mo ago

> In fact I don’t think we would need processors anymore if we were centrally storing all of the operations ever done in our processors.

On my way to memoize your search history.

handsclean•8mo ago

https://conwaylife.com/wiki/HashLife is an algorithm for doing basically this in Conway’s Game of Life, which is Turing complete. I remember my first impression being complete confusion: here’s a tick-by-tick simulation too varied and complex to encapsulate in a formula, and you’re telling me I can just skip way into its future?

RetroTechie•8mo ago

If I read that page correctly, it does this for areas with empty space between them?

Makes sense. Say you have a pattern (surrounded by empty space) that 'flickers': A-B-A-B-A... etc. Then as long as nothing intrudes, nth generation is the same pattern as in n+1000,000th generation. Similar for patterns that do a 3-cycle, 4-cycle etc.

All you'd need is a) a way to detect repeating patterns, and b) do some kind of collision detection between areas/patterns (there's a thing called 'lightspeed' in Life, that helps).

handsclean•8mo ago

I don’t fully understand the algorithm, but no, to my understanding it’s much more general than that. In each tick a cell’s state is solely determined by its immediate neighbors, which means the simulation has a “speed of light” of 1 cell/second: to look N ticks into the future, you need only consider cells within N cells of the area you’re computing, no matter what’s outside that. So, for example, if you want to skip a 10x10 area 100 ticks into the future, you consider a 210x210 area centered on your 10x10, compute it once, then in the future use that 210x210 area as a lookup key for the 10x10 100 ticks into the future. I think HashLife is also somehow doing this on multiple scales at once, and some other tricks.

IvanK_net•8mo ago

I am confused. If a single-tape turing machine receives a digit N in binary, and is supposed to write N ones on the tape, on the right side of the digit N, it performs N steps.

If you expect N ones at the output, how can this machine be simulated in the space smaller than N?

This machine must decrement the digit N at the beginning of the tape, and move to the end of the tape to write "1", so it runs in time O(N^2), not O(N)? (as it takes N "trips" to the end of the tape, and each "trip" takes 1, 2, 3 .. N steps)

Since turing machines can not jump to any place on a tape in constant time (like computers can), does it have any impact on real computers?

cperciva•8mo ago

Multitape Turing machines are far more powerful (in terms of how fast they can run, not computability) than single-tape machines.

But to answer your question: "space" here refers to working space, excluding the input and output.

IvanK_net•8mo ago

A single tape machine is still a multi tape machine, only with one tape.

iNic•8mo ago

This paper looks exclusively at decision problems, i.e. problems where the output is a single bit.

EDIT: This makes sense because if you look at all problems with N outputs then that is just the same as "gluing together" N different decision problems (+ some epsilon of overhead)

IvanK_net•8mo ago

Oh okay, that was my second guess.

willmarquis•8mo ago

This is just a reminder that memory isn’t just a constraint, it’s a resource.

michaelsbradley•8mo ago

What resources are available (or not) and in what quantities are the most basic constraints for solving a problem/s with a computer.

felineflock•8mo ago

Ryan Williams lecture (how he started): https://www.youtube.com/live/0DrFB2Cp7tg

And paper: https://people.csail.mit.edu/rrw/time-vs-space.pdf

vishnugupta•8mo ago

Rainbow Tables FTW!

golly_ned•8mo ago

“ If the problem is solved next week, Williams will be kicking himself. Before he wrote the paper, he spent months trying and failing to extend his result”

What a strange, sad way to think about this. Academia is perverse.

catoc•8mo ago

Nothing necessarily perverse here. I don’t know Williams but don’t image him disliking the other guy or being unhappy with progress in general, but just being someone who truly challenged himself only to find him being trumped a week later; and kicking himself for that.

Either way, the week is not yet over, at least since the quanta article, so maybe no kicking will ensue.

golly_ned•8mo ago

I wonder how the researchers who supplied the tree evaluation algorithm felt when Williams supplied his proof. Dismay at having not kept their result under wraps for longer so they could claim credit for such an advance themselves? I hope not.

ziofill•8mo ago

At the cost of sounding ridiculous: can there be a notion of "speed of light" in the theory of computation, determining the ultimate limit of memory (space) vs runtime?

awanderingmind•8mo ago

You mean something like this https://en.wikipedia.org/wiki/Bremermann%27s_limit or this https://en.wikipedia.org/wiki/Quantum_speed_limit?

ziofill•8mo ago

Oh wait, I just realized what I said was probably very stupid: I was thinking of some computational complexity theorem that links memory and runtime complexity classes in the same way that the "speed of light" sets an ultimate bound on the relation between actual space and actual time.

But the speed of light is the maximum space in the smallest time, which computationally would correspond to filling the largest amount of memory in the shortest time :facepalm: (and thanks for the links!)

xlii•8mo ago

From the „Camel Book”, one of my favorite programming books (not because it was enlightening, but because it was entertaining); on the Perl philosophy:

“If you’re running out of memory, you can buy more. But if you’re running out of time, you’re screwed.”

immibis•8mo ago

This can work both ways. If the program needs more memory than the computer has, it can't run until you buy more. But if it takes twice as long, at least it runs at all.

Dban1•8mo ago

brain brain brain

hinkley•8mo ago

The Camel book was written when Moore’s Law was trucking along. These days you can’t buy much more time but you used to be able to just fine. Now it’s horizontal scaling. Which is still more time.

const_cast•8mo ago

Modern computers have so much memory it feels like it doesn't matter. Spending that memory on arrays for algorithms or things like a Garbage Collector just make sense. And, extra memory is worthless. You WANT the summation of all your programs to use all your memory. The processor, on the other hand, can context switch and do everything in it's power to make sure it stays busy.

The CPU is like an engine and memory is your gas tank. Idling the engine is bad, but leaving gas in the tank doesn't hurt, but it doesn't help either. I'm not gonna get to my destination faster because I have a full tank.

Mawr•8mo ago

Only if running one such memory-hungry program at a time, which usually cannot be afforded. Multi-program workloads are much more common and the strategy of using as much ram as possible can't work in that environment.

woolion•8mo ago

It's unfortunate that Quanta links are so popular, when they include so much pseudo-poetic fluff around the mathematics. Below there's an entire thread to dismiss a misconception introduced by the quanta article.

"I think it is very intuitive that more space beats the pants off of more time." (poster is absolutely right) The The article say "Until now, the only known algorithms for accomplishing certain tasks required an amount of space roughly proportional to their runtime, and researchers had long assumed there’s no way to do better.", which is interpreted as that there's a proportional relation between time and space. However, a quick look at the complexity hierarchy would never suggest such a thing. Reading more carefully, it says "known algorithms" for "certain tasks", but then where do you get a general intuition from such a particular statement?

godelski•8mo ago

I think they used to be better but really have made a blatant turn. I really thought that wormhole fiasco would have killed them. To go 4 whole months before putting the editor's note is beyond egregious[0]. Mistakes happen, but 4 months kills all credibility. You have to act fast on those things! There were big names raising serious concerns on day 1 and it really shows they didn't do due diligence to get outside verification before running a piece that they knew would be really popular.

All this accomplishes is discrediting science. Trading personal gains for eroding the very thing that they make their money off of. This is a major part of why Americans (and people) have such high distrust for science. News outlets, and in particular science focused news outlets, constantly spew inaccurate information. It really should be no wonder that so many people are confused about so many scientific topics, as unless they actually take the years it takes to become an expert in a field, they are going to have a difficult time distinguishing fact from fiction. And why shouldn't the average person expect to trust a source like Quanta? They're "full of experts", right? smh

[0] This is the earliest archive I see with the note. Press back one day and it should not be there. Article was published on Nov 30 2022, along with a youtube video https://web.archive.org/web/20230329191417/https://www.quant...

bbrubaker•8mo ago

I'm the author of this article. If you ask a complexity theorist, they will tell you that they did in fact have a general intuition that certain problems require space close to to linear in time to solve (see e.g., Ryan's comment #22 on Scott Aaronson's blog post about the result: https://scottaaronson.blog/?p=8680, and the comments after that). The most intuitive way to see this is in a circuit/DAG picture, where the goal is to get from the input nodes of the graph to the output nodes. Some graphs are very "wide": cut the graph at some intermediate point, and there will be a lot of edges crossing the cut, each of which represents some information from an earlier stage in the computation that you'll need to remember to get to the output. Ryan's first result is a general-purpose method for doing any computation, even ones whose graph structure looks like this, in asymptotically far less space. That is precisely what makes the result so surprising!

My article was quite explicit in multiple places that the universal/comprehensive character of the result was that counterintuitive part:

- In the first paragraph: "memory was more powerful than computer scientists believed: A small amount would be as helpful as a lot of time in all conceivable computations."

- Further down in the introduction, in the passage you quoted: "Until now, the only known algorithms for accomplishing certain tasks required an amount of space roughly proportional to their runtime, and researchers had long assumed there’s no way to do better. Williams’ proof established a mathematical procedure for transforming any algorithm — no matter what it does — into a form that uses much less space.

- In the third section, I explicitly state that researchers do believe space is more powerful than time in the specific sense that you're criticizing my article for misrepresenting: "But complexity theorists suspect that PSPACE is a much larger class, containing many problems that aren’t in P. In other words, they believe that space is a far more powerful computational resource than time. This belief stems from the fact that algorithms can use the same small chunk of memory over and over, while time isn’t as forgiving — once it passes, you can’t get it back."

- In the fourth section, I explain why researchers didn't think the HPV75 result could be improved further, despite their intuition that space is more powerful than time in the above sense: "While many problems can be solved with much less space than time, some intuitively seemed like they’d need nearly as much space as time."

TCS (and complexity theory specifically) are complicated subjects. I spend a lot of time interviewing researchers and thinking about how to distill the results of my reporting into a form that is accessible to readers with widely varying levels of familiarity with the subject matter. You are of course well within your rights to critique my stylistic choices, the narrative aspects of the story, and the order in which I presented information, but I will push back against the claim that my article is spreading misinformation about complexity theory. You're referring to a misconception that arises, by your own admission, when you don't read carefully. If it's the headline you object to, you could lodge a similar complaint against the complexity theorist Lance Fortnow: https://blog.computationalcomplexity.org/2025/02/you-need-mu....

marojejian•8mo ago

Thanks for the detailed response to that comment. I just read and enjoyed your article, and your response seems accurate.

I can imagine it's tough to put that much effort into communicating something complex to a wide audience, only to have a bunch of very smart people here attempt to tear it apart.

bbrubaker•8mo ago

Thanks for the kind words! I do get lots of gratifying positive feedback as well. I don't make a habit of arguing with strangers online, but I felt obliged to correct the record here for anyone who might encounter this later and come away thinking "wow, I can't believe Quanta got it completely backward."

Here on HN people often complain about the level of detail, which is fair! I think they are often falling prey to a common fallacy about conditional probabilities. P("X reads Quanta"|"X has some technical background (college STEM major or more)") is likely larger than it is for most popular science magazine. But P("Y has some technical background"|"Y reads Quanta") is much lower than many people realize. There is a limit on how much technical stuff I can put in an article and still have it be accessible to many of our readers, and I care a lot about making things accessible.

ChrisMarshallNY•8mo ago

Interesting. It's one of those things that I’ve always “just assumed,” without thinking about it.

I did a lot of raster graphics programming, in my career, and graphics work makes heavy use of lookup tables.

Yesterday, I posted a rather simple tool I wrote[0]: a server that “frontloads” a set of polygons into a database, and then uses them, at query time. It’s fairly fast (but I’m sure it could be a lot faster). I wrote it in a few hours, and got pretty good performance, right out of the starting gate.

Pretty basic stuff. I doubt the pattern is unique, but it’s one that I’ve used for ages. It’s where I try to do as much “upfront” work as possible, and store “half-baked” results into memory.

Like I said, I always “just worked that way,” and never really thought about it. There’s been a lot of “rule of thumb” stuff in my work. Didn’t have an MIT professor to teach it, but it’s sort of gratifying to see that it wasn’t just “old wives” stuff.

There’s probably a ton of stuff that we do, every day, that we don’t think about. Some of it almost certainly originated from really smart folks like him, finding the best way (like the “winding number” algorithm, in that server[1]), and some of it also probably comes from “grug-brained programmers,” simply doing what makes sense.

[0] https://news.ycombinator.com/item?id=44046227

[1] https://github.com/LittleGreenViper/LGV_TZ_Lookup/blob/e247f...

auto•8mo ago

I'm in the depths of optimization on a game right now, and it's interesting how the gains I'm making currently all seem to be a matter of scaling the concept of lookup tables, and using the right tool for the job.

What I mean is that traditionally I think peoples' ideas of lookup tables are things like statically baked arrays setup at compile time, or even first thing at runtime, and they never change. But if you loosen your adherence to that last idea a bit, where a lookup table can change slightly over time, you can get a ton of mileage out of a comparatively small amount of memory compared to wasting cycles every frame.

As for the right tool for the job, I've read tons of dev logs and research papers over the years about moving work to the GPU, but this last few months stint of ripping my game inside out has really made me see the light. It's not just lookup tables built at compile or early runtime, but lookup tables modified slightly over time, and sent to the GPU as textures and used there.

Follow this train of thought long enough, and now we're just calling memory writes and reads "lookup tables" when they aren't really that anymore, but whos to say where the barrier really lies?

ChrisMarshallNY•8mo ago

Without getting too much into detail (because the people I worked for were really paranoid, and I don't want to give them agita), we used to build lookup tables "on the fly," sometimes, deep inside iterators.

For example, each block of pixels might have some calculated characteristics that were accessed by a hash into a LUT, but the characteristics would change, as we went through the image.

We'd do a "triage" run, where we'd build the LUT, then a "detailed" run, where we'd apply the LUT to the pixels.

It could get pretty hairy.

recursivecaveat•8mo ago

To some extent it is required that all serious work by a computer be the kind of repititious thing that can be at least partially addressed by a lookup table.

If you take the example of a game, drawing sprites say. Drawing a single preloaded sprite of reasonable size is always cheap, so a slow frame must have an excessive number. It's very hard to construct a practical scenario of a large number of truly distinct sprites though. A level has a finite tile palette, a finite cast of characters, abilities, etc. It's hard to logistically get them all into a scene together, and even then it won't be that many. So the only scenario left where sprite drawing will be slow is drawing the same handful of sprites over and over again. By contrast that's super common: just spam a persistent projectile, tap the analog stick to generate dust particles, etc.

alkyon•8mo ago

It's kind of insulting to the reader that they explain P complexity class without using the word polynomial ("all problems that can be solved in a reasonable amount of time")

simpaticoder•8mo ago

Be generous - it saves a lot of time. Once you say "polynomial" readers will think, "like, ANY polynomial, even like n^100?!" and you'll have to explain, yes, but that's STILL better than exponential, etc. They avoided all of that

godelski•8mo ago

Quanta targets people who are above average. So I don't think it is too much for them to give a sentence or two stating that. Or even a little graphic could do wonders. I don't think it would take much time or effort to make a graphic like the one on wikipedia[0] and just throw in some equations within the ring. You can easily simplify too, by removing NL and merging EXP. Hell, look at the graphics here[1]. That's much more work.

I don't think Quanta should be afraid of showing math to people. That's really their whole purpose. Even if I think they've made some egregious mistakes that make them untrustable...[2]

[0] https://en.wikipedia.org/wiki/PSPACE#/media/File:Complexity_...

[1] https://www.quantamagazine.org/june-huh-high-school-dropout-...

[2] https://news.ycombinator.com/item?id=44067043

simpaticoder•8mo ago

I suppose my point is that the readers who will wonder about this are a) very likely to know about complexity classes already, or b)capable of learning about it themselves. Perhaps a simple link to something like https://complexityzoo.net/Petting_Zoo would have been a nice middle-ground.

Edit: Aaronson even mentions the n^100 problem in the section about P!

godelski•8mo ago

I disagree and even think that this is besides the point. It is hard to wonder about what you don't know to wonder about. It is the job of the communicator to prime that and provide any critical information that the reader is not expected to know about. Without some basic explanation here then these terms might as well be black boxes to readers.

The point is that a single line[0] and a minimal graphic could substantially improve the reader's comprehension while simultaneously providing them the necessary nomenclature to find relevant material to further increase their understanding.

Look at this line:

  | One of the most important classes goes by the humble name “P.”

It tells us almost nothing, except of its importance. Only to be followed by

  | Roughly speaking, it encompasses all problems that can be solved in a reasonable amount of time. An analogous complexity class for space is dubbed “PSPACE.”

This tells us nothing... My first thought would by "why not PTIME and PSPACE" if I didn't already know what was going on.

The whole work is about bridging these two concepts! How can we understand that if we don't know what we're building a bridge between? It's like reporting on a bridge being built connecting England and France but just calling it a bridge. Is it important? Sounds like it by the way they talk, but how can you even know the impact of such a thing when not given such critical context? You get tremendous amounts of additional context with the addition of so few words.

nathan_compton•8mo ago

I think this is actually a pretty reasonable description but I also have read Quantum Computing Since Democritus.

nottorp•8mo ago

You can determine both the time and space complexity for any algorithm, should you need to. And in some cases you can adjust to have more of one or the other, depending on your particular constraints.

More at eleven.

amelius•8mo ago

> Williams’ proof established a mathematical procedure for transforming any algorithm — no matter what it does — into a form that uses much less space.

Ok, but space is cheap, and we usually want to trade processing time for space.

I.e., the opposite.

JohnKemeny•8mo ago

Ryan made a dent (a tiny dent) in one of the most important open problems in mathematics.

He's not trying to please programmers.

amelius•8mo ago

But what makes an open problem important? ;)

senfiaj•8mo ago

I always thought that the "reverse" relationship between the time and the space requirements is explained by the simple fact that when you have a set of algorithms with certain time and space requirements and you constrain one of these parameters, the other parameter will not be necessarily (in practice oftentimes not) the most optimal. But it doesn't necessarily mean that the faster algorithm will require less memory or vice versa.

This "reverse" relationship works with other parameters. Such as in sorting algorithms, where besides time and space requirements you have stability requirement. If you constrain the sorting algorithms by stability, you won't have any advantage (will likely to have some disadvantage) in performance over non constrained algorithms. And wee see that the known stable sorting algorithms are either slower or require more memory. Nobody has yet invented a stable sorting algorithm that has comparable performance (both time and space) to non stable sorting algorithms. That's it.

schmorptron•8mo ago

As an aside, I really enjoy a lot of the quanta magazine articles! They manage to write articles that both appeal to compsci people as well as interested "outsiders". The birds-eye view and informal how and why they pack into their explanation style often gives me new perspectives and things to think about!

kazinator•8mo ago

Give algorithm designers a little more memory, and they will find a way to shave off the time.

Give operating system development organizations a lot more memory and they will find a way to worsen the response time.

bgnn•8mo ago

The poetic style of Quanta makes it impossible to understand what does this mean. Can someone familiar with the topic explain is this applicable to real world computers or just a theoretical scenario? Does this mean we need more memory in computers even if they need to run at a lower clock speed?

Just Started Using AmpCode

LLM as an Engineer vs. a Founder?

Show HN: Engineering Perception with Combinatorial Memetics

Crosstalk inside cells helps pathogens evade drugs, study finds

Show HN: Design system generator (mood to CSS in <1 second)

Show HN: 26/02/26 – 5 songs in a day

Toroidal Logit Bias – Reduce LLM hallucinations 40% with no fine-tuning

Top AI models fail at >96% of tasks

The Science of the Perfect Second (2023)

Bob Beck (OpenBSD) on why vi should stay vi (2006)

Show HN: a glimpse into the future of eye tracking for multi-agent use

The Optima-l Situation: A deep dive into the classic humanist sans-serif

Barn Owls Know When to Wait

Implementing TCP Echo Server in Rust [video]

LicGen – Offline License Generator (CLI and Web UI)

Service Degradation in West US Region

The Janitor on Mars

Bringing Polars to .NET

Adventures in Guix Packaging

Show HN: We had 20 Claude terminals open, so we built Orcha

Your Best Thinking Is Wasted on the Wrong Decisions

Warcraftcn/UI – UI component library inspired by classic Warcraft III aesthetics

Trump Vodka Becomes Available for Pre-Orders

Velocity of Money

Stop building automations. Start running your business

You can't QA your way to the frontier

Show HN: PalettePoint – AI color palette generator from text or images

Robust and Interactable World Models in Computer Vision [video]

Nestlé couldn't crack Japan's coffee market.Then they hired a child psychologist

Notes for February 2-7

Just Started Using AmpCode

LLM as an Engineer vs. a Founder?

Show HN: Engineering Perception with Combinatorial Memetics

Crosstalk inside cells helps pathogens evade drugs, study finds

Show HN: Design system generator (mood to CSS in <1 second)

Show HN: 26/02/26 – 5 songs in a day

Toroidal Logit Bias – Reduce LLM hallucinations 40% with no fine-tuning

Top AI models fail at >96% of tasks

The Science of the Perfect Second (2023)

Bob Beck (OpenBSD) on why vi should stay vi (2006)

Show HN: a glimpse into the future of eye tracking for multi-agent use

The Optima-l Situation: A deep dive into the classic humanist sans-serif

Barn Owls Know When to Wait

Implementing TCP Echo Server in Rust [video]

LicGen – Offline License Generator (CLI and Web UI)

Service Degradation in West US Region

The Janitor on Mars

Bringing Polars to .NET

Adventures in Guix Packaging

Show HN: We had 20 Claude terminals open, so we built Orcha

Your Best Thinking Is Wasted on the Wrong Decisions

Warcraftcn/UI – UI component library inspired by classic Warcraft III aesthetics

Trump Vodka Becomes Available for Pre-Orders

Velocity of Money

Stop building automations. Start running your business

You can't QA your way to the frontier

Show HN: PalettePoint – AI color palette generator from text or images

Robust and Interactable World Models in Computer Vision [video]

Nestlé couldn't crack Japan's coffee market.Then they hired a child psychologist

Notes for February 2-7

For algorithms, a little memory outweighs a lot of time

Comments