Inventing on Principle: https://www.youtube.com/watch?v=PUv66718DII
I mean, even for something that is in theory fully understandable, like the Linux kernel, it is not feasible to actually read the source before using it.
To me this really makes no sense. Even for traditional programming we only have such powerful systems because we use a layered approach. You can look into these layers and understand them, but understanding all of them is totally out of scope for a single human being.
I believe this is the crux of what the author is getting at: LLMs are, by their very nature, a black box that cannot ever be understood. You will never understand how an LLM reached its output, because their innate design prohibits that possibility from ever manifesting. These are token prediction machines whose underlying logic would take mathematicians decades to reverse engineer for even a single query, by design.
I believe that's what the author was getting at. Since we can never understand how LLMs reached their output, we cannot rely on them as trustworthy agents of compute or knowledge. Just as we would not trust a human who gives a correct answer much of the time but can never explain how they knew that answer or how they reached that conclusion, so we should not trust LLMs in that same capacity.
Unless they have a lot of knowledge in electrical engineering/optics, the average user of this isn't going to understand how the camera or projector work except at a very high level.
I feel like the problem with LLMs here is more that they are not very predictable in their output and can fail in unexpected ways that are hard to resolve. You can rely on the camera to output some bits corresponding to whatever you're pointing it at even if you don't know anything about its internals.
Fwiw I personally describe them as white boxes, not black boxes. After all, we know, and can trace, every single bit of the output back to the input. That doesn't help us as much as we'd like, though. When drilling down into "why did the model wrongly answer 1, and not rightly 2", it comes down to "well, it added one trillion small numbers, and the sum came close to 1 but didn't reach 2". Which is unsatisfactory, and your "understanding" vs. "comprehension" delineation captures that nicely.
Maybe it's more productive to think of them as "artefacts" rather than "mechanical contraptions". We shape them in many ways, but we are not in complete control of their making. We don't make them explicitly with our hands: we make a maker algorithm, and that algorithm then makes them. Or even "biological", grown artefacts, given that we don't fully control the end result. Yes, we know and apply the algorithm that builds them, but we don't know the end result beforehand, the final set of weights. Unlike, say, when we are making a coffee machine: we know all the parts to a millimetre in advance and have it all worked out and pre-planned before embarking on the making of the machine.
One of the challenges I found when I played with RealTalk is interoperability. The aim is to use the "spatial layer" to bootstrap people's intuitions on how programs should work and interact with the world. It's really cool when this works. But key intuitions about how things interact when combined with each other only work if the objects have been programmed to be compatible. A balloon wants to "pop if it comes into contact with anything sharp". A cactus wants to say "I am sharp". But if someone else has programmed a needle card to say "I am pointy", then it won't interact with the balloon in a satisfying way. Or, to use one of Dynamicland's favorite examples: say I have an interactive chart which shows populations of different countries when I place the "Mexico card" into the filter spot. What do you think should happen if I put a card showing the Mexican flag in that same spot, or some other card which just says the string "Mexico" on it? Wouldn't it be better if their interaction "just works"?
Visual LLMs can aid with this. Even a thin layer which can assign tags or answer binary questions about objects could be used to make programs massively more interoperable.
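As a rough illustration (none of this is Dynamicland's actual API; the model call and tag names are made up), such a thin layer might just normalize whatever an object says about itself onto a small shared vocabulary:

```python
# Minimal sketch, not Dynamicland's real API: a thin tagging layer that maps
# whatever an object "says about itself" onto a small shared vocabulary.
# ask_vision_model() stands in for a vision-language model answering a binary
# question; here it is faked with string matching so the example runs.

CANONICAL_TAGS = {"sharp", "fragile", "heavy"}

def ask_vision_model(object_description: str, tag: str) -> bool:
    synonyms = {"sharp": {"sharp", "pointy", "spiky"}}
    words = synonyms.get(tag, {tag})
    return any(word in object_description.lower() for word in words)

def tags_for(object_description: str) -> set[str]:
    """Assign canonical tags by asking one binary question per tag."""
    return {tag for tag in CANONICAL_TAGS
            if ask_vision_model(object_description, tag)}

# The balloon's rule only needs the canonical tag, not whichever word
# the needle card's author happened to write:
if "sharp" in tags_for("a needle card that says 'I am pointy'"):
    print("pop!")
```

The tag vocabulary itself could still live on a visible card, even if the model answering the questions is opaque.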
For Dynamicland I get the issue, though: putting the whole thing through an LLM so that "pointy" and "sharp" both trigger the same effects on another card would just hide the interaction entirely. It could work, or not, for reasons completely opaque to both designer and user.
It's still at the cool demo level, though. How do you scale this thing?
The typical “scale” mindset is almost the opposite of that — the people doing the scaling are the ones with agency, and the rest get served slop they didn’t choose!
If the system is an unreliable demo, then that can promote agency. In the same way that you could fix your car 40 years ago, but you can’t now, because of scaled corporate processes.
You can fix your car just fine - just not the electronics. And those were to a large degree added for safety reasons. It is due to the complexity that they are difficult or impossible to fix.
I love the project, but it's nearly a decade old and still lives in one location, or in places Bret has directly collaborated with, like the biolab. [0]
[0] https://dynamicland.org/2023/Improvising_cellular_playground...
If you really wanted to play around with similar ideas, it doesn't require doing a full reimplementation of the reactive engine.
Occlusion is definitely a problem.
You do still need to keep your hands out of the light to see everything, but that can also be part of the interaction. If we ever get ubiquitous AR glasses or holograms, I'm sure Bret will integrate them into DL.
[0] Which leads to a bit of a catch-22: you want a surface that looks dark but perfectly reflects all the colors of your projector, so you need a white screen, which means you ideally want zero light other than the projector in order to make the projector act most like a screen.
I've seen systems like this that use multiple projectors from different angles, calibrated for the space and the angle. They're very effective at preventing occlusion, and it takes fewer than you'd think (also see Valve's Lighthouse tech for motion tracking).
Unfortunately, doing that is expensive, big, and requires recalibrating whenever it's moved.
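For a sense of what calibrating even one projector-camera pair involves, here's a rough sketch assuming OpenCV. The correspondence points are made-up placeholders; a real rig would detect projected fiducials and repeat this per projector, blending the overlaps:

```python
import numpy as np
import cv2

# Sketch of calibrating one projector-camera pair: display markers at known
# projector coordinates, find where the camera sees them, then fit a
# homography mapping camera pixels -> projector pixels. The coordinates
# below are made-up placeholders; a real setup would detect them (e.g. via
# ArUco fiducials) rather than hard-code them.
projector_pts = np.array([[100, 100], [1820, 100], [1820, 980], [100, 980]],
                         dtype=np.float32)
camera_pts = np.array([[212, 143], [1705, 168], [1688, 921], [198, 904]],
                      dtype=np.float32)

H, _ = cv2.findHomography(camera_pts, projector_pts)

# Anything tracked in the camera image (say, the center of a paper card)
# can now be mapped into projector space so graphics land on top of it.
card_in_camera = np.array([[[950.0, 540.0]]], dtype=np.float32)
card_in_projector = cv2.perspectiveTransform(card_in_camera, H)
print(card_in_projector)  # projector pixel to draw at
```

Recalibration after moving the rig is essentially redoing this fit, which is why fixed installations are so much easier.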
I've made a lot of progress recently working on my own homebrew version, running it in the browser in order to share it with people. Planning to take some time soon to take another stab at the real (physical) thing.
Progress so far: https://deosjr.github.io/dynamicland/
Under visibility they say:
>To empower people to understand and have full agency over the systems they are involved in, we aim for a computing system that is fully visible and understandable top-to-bottom — as simple, transparent, trustable, and non-magical as possible
But the programming behind the projector-camera system feels like it would be pretty impenetrable to the average person, right? What is so different about AI?
I think the vision is neat but hampered by the projector tech and the cost of setting up a version of your own. Since it's so physically tied, and Bret is (imo stubbornly) dedicated to the concept, there's not a community building on this outside the local area that can make it to DL in person. It'd be neat to have a version for VR, for example, and maybe some day AR becomes ubiquitous enough to make it work anywhere.
[0] Annoyingly it's not open sourced so you can't really build your own version easily or examine it. There have been a few attempts at making similar systems but they haven't lasted as long or been as successful as Bret's Dynamicland.
I'm reading more about the "OS" Realtalk
>Some operating system engineers might not call Realtalk an operating system, because it’s currently bootstrapped on a kernel which is not (yet) in Realtalk.
You definitely couldn't fit the code for an LLM on the wall, so that makes sense. But I still have so many questions.
Are they really intending to have a whole kernel written down? How does this work in practice? If you make a change to Realtalk which breaks it, how do you fix it? Do you need a backup version of it running somewhere? You can't boot a computer from paper (unless you're using punch cards or something) so at some level it must exist in a solely digital format, right?
I think even if you could squeeze down an LLM and get it to run in Realtalk, it wouldn't fit with the radical simplicity model they're going for. LLMs are fundamentally opaque; we have no idea why they output what they do, and as users we can only twiddle the prompt knobs. That's the complete opposite direction for a project that refuses to provide the tools to build a packaged version precisely because that would put the program back into the box instead of filleting it out into a physical instantiation.
I wish he'd relent and package it up in a way that could be replicated more simply than reimplementing entirely from scratch.
I'm not sure where to draw the line between Realtalk and the underlying operating system. I'm willing to give it some credit; it's interesting without being written entirely from scratch. IIRC most of the logic that defines how things interact IS written in Realtalk and physically accessible within the conceptual system instead of only through traditional computing.
Like, you can write a script that talks to functionality that may or may not exist yet.
Programming by moving pieces of paper around deservedly gets attention, but there's a lot more to it.
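A toy sketch of that idea (not Realtalk's actual syntax or API): rules react to claims by name, so a page can reference behavior that nothing on the table supplies yet:

```python
# Toy sketch, not Realtalk's real syntax: a shared blackboard of claims plus
# "when" rules that fire on matching claims. A rule can name a claim that no
# page on the table makes yet; it simply stays dormant until one does.

claims: set[tuple] = set()
rules: list[tuple[tuple, object]] = []

def claim(*fact):
    """A page states a fact about itself or the world."""
    claims.add(fact)
    run_rules()

def when(*pattern, do):
    """A page registers behavior keyed on a claim it hopes someone makes."""
    rules.append((pattern, do))
    run_rules()

def run_rules():
    for pattern, action in rules:
        if pattern in claims:
            action()

# "Balloon" page: written against a claim nothing makes yet.
when("nearby object", "is sharp", do=lambda: print("balloon pops!"))

# Later, a "cactus" page is placed on the table and makes the claim:
claim("nearby object", "is sharp")   # -> balloon pops!
```

The dormant rule is the "functionality that may or may not exist yet": the moment some other page makes the matching claim, the interaction appears.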