I'm glad they got it out quickly.
Interesting that Waymo could do uninterrupted trips back in 2013; wonder what took them so long to expand? Regulation? Tail end of driving-optimization issues?
noticed one of the slides had a cross over 'AGI 2027'... ai-2027.com :)
but in hindsight looks like this slowed them down quite a bit despite being early to the space...
You need image processing just as much as you need scenario management, and they're orthogonal to each other, as one example.
If you want a general transport system... We do have that. It's called rail. (And can and has been automated.)
Eh, he ran Tesla's self-driving division and set it in a direction that is never going to fully work.
What they should have done is a) trained a neural net to map a sequence of frames into a representation of the physical environment, and b) leveraged MuZero, so that the self-driving system builds out parallel simulations into the future and searches for the best course of action to take.
Because that's pretty much what makes humans great drivers. We don't need to know what a cone is - we internally compute that an object on the road that we are driving towards is going to result in a negative outcome when we collide with it.
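For illustration, here's a minimal sketch of that idea: random-shooting rollouts standing in for MuZero's actual tree search, over a hypothetical learned world model (the encode/dynamics/value names are assumptions, not any real API):

```python
# A toy sketch of planning via simulated futures, assuming a learned world
# model with three parts: encode (camera frames -> latent state), dynamics
# (state, action -> next state), and value (state -> estimated outcome,
# e.g. strongly negative on collision). All names are illustrative.
import random

def plan(frames, actions, model, horizon=10, n_rollouts=64):
    """Simulate parallel futures and return the best first action."""
    state0 = model.encode(frames)             # latent physical-environment state
    best_action, best_return = None, float("-inf")
    for _ in range(n_rollouts):
        first = random.choice(actions)        # candidate first action
        state, action, total = state0, first, 0.0
        for _ in range(horizon):
            state = model.dynamics(state, action)  # roll the simulation forward
            total += model.value(state)            # cones, cars, etc. score badly
            action = random.choice(actions)        # random continuation policy
        if total > best_return:
            best_return, best_action = total, first
    return best_action
```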
The counter argument is that you can't zoom in and fix a specific bug in this mode of operation. Everything is mashed together in the same neural net process. They needed to ensure safety, so testing was crucial. It is harder to test an end-to-end system than its individual parts.
It's also worth mentioning that humans intentionally (and safely) drive into "solid" objects all the time. Bags, steam, shadows, small animals, etc. We also break rules (e.g. drive on the wrong side of the road), and anticipate things we can't even see based on a theory of mind of other agents. Human driving is extremely sophisticated, not reducible to rules that are easily expressed in "simple" language.
It was a nice feeling while it lasted.
| old-school Facebook where people only interacted with their own private friend circles is better.
100% agree but crazy that option doesn't exist anymore.
A recurring theme presented, however, is that LLMs are somehow not controlled by the corporations which expose them as a service. The presenter identified three interested actors (governments, corporations, "regular people") and asserted that LLM offerings are not controlled by governments. This is a bit disingenuous.
Also, the OS analogy doesn't make sense to me. Perhaps this is because I do not subscribe to LLMs having reasoning capabilities, nor to their being able to reliably provide the services an OS-like system can be shown to provide.
A minor critique regarding the analogy equating LLMs to mainframes:
Mainframes in the 1960s never "ran in the cloud," as the cloud did not exist. They still do not "run in the cloud" unless one includes simulators.

Terminals in the 1960s-1980s did not use networks. They used dedicated serial cables or dial-up modems to connect either directly or through stat-mux concentrators.

"Compute" was not "batched over users." Mainframes either had jobs submitted and run via operators (indirect execution) or supported multi-user time slicing (such as found in Unix).
I don't think that's what he said; he was identifying the first customers and uses.
> I don't think that's what he said; he was identifying the first customers and uses.
The portion of the presentation I am referencing starts at or near 12:50[0]. Here is what was said:
I wrote about this one particular property that strikes me as very different this time around. It's that LLMs flip the direction of technology diffusion that is usually present in technology.

So for example with electricity, cryptography, computing, flight, internet, GPS, lots of new transformative technologies that have not been around: typically it is the government and corporations that are the first users, because it's new, expensive, etc., and it only later diffuses to consumers. But I feel like LLMs are kind of flipped around.

So maybe with early computers it was all about ballistics and military use, but with LLMs it's all about how do you boil an egg or something like that. This is certainly a lot of my use. And so it's really fascinating to me that we have a new magical computer and it's helping me boil an egg.

It's not helping the government do something really crazy like some military ballistics or some special technology.
Note the identification of historic government interest in computing, along with a flippant "regular person" scenario, in the context of "technology diffusion." You are right that the presenter identified "first customers," but viewed in context this is mentioned in passing. Perhaps I should not have characterized it as "a recurring theme." Instead, a better categorization might be:
The presenter minimized the control corporations have by keeping focus on governmental topics and trivial customer use-cases.
0 - https://youtu.be/LCEmiRjPEtQ?t=770

Plus, your historical corrections were spot on. Sometimes good criticisms just get lost in the noise online. Don't let it get to you!
Seems like you could set an LLM loose and, like the Googlebot, have it start converting all HTML pages into llms.txt. Man, the future is crazy.
Website too confusing for humans? Add more design, modals, newsletter pop ups, cookie banners, ads, …
Website too confusing for LLMs? Add an accessible, clean, ad-free, concise, high entropy, plain text summary of your website. Make sure to hide it from the humans!
PS: it should be /.well-known/llms.txt, but that feels futile at this point...
PPS: I enjoyed the talk, thanks.
Not a browser plugin, but you can prefix URLs with `pure.md/` to get the pure markdown of that page. It's not quite a 1:1 to llms.txt as it doesn't explain the entire domain, but works well for one-off pages. [disclaimer: I'm the maintainer]
(I'm the creator of the llms.txt proposal.)
llms.txt is a description for an LLM of how to find the information on your site needed for an LLM to use your product or service effectively.
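For a concrete picture, an llms.txt is just a small markdown file served at /llms.txt: an H1 title, a blockquote summary, and sections of annotated links. Something like this (names and links are illustrative; see llmstxt.org for the actual spec):

```markdown
# ExampleProduct

> One-paragraph summary of what the product does and who it is for.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): setup in five minutes
- [API reference](https://example.com/docs/api.md): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog.md): release history
```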
If it were up to me, which it very much is not, I would try to optimize the next AISUS for more of this. I felt like I was getting smarter as the talk went on.
/.well-known/ exists for this purpose.
example.com/.well-known/llms.txt
Having said that, this won't work for llms.txt, since in the next version of the proposal they'll be allowed at any level of the path, not only the root.
ThePrimeagen reviewed this article[1] a few days ago, and (I think) that's where I heard about it. (Can't re-watch it now, it's members-only.) 8(
[1] https://medium.com/@drewwww/the-gambler-and-the-genie-08491d...
Modern military drones are very much AI agents
Yes, I am talking about formal verification, of course!
That also goes nicely together with "keeping the AI on a tight leash". It seems to clash though with "English is the new programming language". So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English? I think that is possible, if you have a formal language and logic that is flexible enough, and close enough to informal English.
Yes, I am talking about abstraction logic [1], of course :-)
So the goal would be to have English (German, ...) as the ONLY programming language, invisibly backed underneath by abstraction logic.
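In the spirit of the calculator analogy, here is a toy sketch of that surface/under-the-hood split: informal English on the surface, with an embedded fragment interpreted formally, evaluated, and reflected back in English. (Pure illustration with arithmetic, nothing to do with abstraction logic itself.)

```python
import ast
import operator
import re

# Map AST operator nodes to their arithmetic functions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(node):
    """Formally evaluate a parsed arithmetic expression tree."""
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("not a plain arithmetic expression")

def answer(question: str) -> str:
    # Spot a formal fragment inside the informal English.
    match = re.search(r"[\d.(][\d\s.+\-*/()]*", question)
    expr = match.group().strip() if match else ""
    try:
        tree = ast.parse(expr, mode="eval").body
        return f"That comes to {evaluate(tree)}."   # reflect back in English
    except (SyntaxError, ValueError):
        return "I couldn't find a formal expression to evaluate."

print(answer("What is 12 * (3 + 4)?"))  # -> That comes to 84.
```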
The problem with trying to make "English -> formal language -> (anything else)" work is that informality is, by definition, not a formal specification and therefore subject to ambiguity. The inverse is not nearly as difficult to support.
It's much like how a property in an API initially defined as optional cannot be made mandatory without potentially breaking clients, whereas making a mandatory property optional can be backward compatible. IOW, the cardinality of "0 .. 1" is a strict superset of "1".
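A small illustration of that asymmetry, using hypothetical Pydantic models (the field names are made up):

```python
from typing import Optional
from pydantic import BaseModel

# v1: "nickname" is optional, cardinality 0..1.
class UserV1(BaseModel):
    name: str
    nickname: Optional[str] = None

# Tightening 0..1 to 1 breaks old clients: payloads that omitted
# "nickname" (perfectly valid under v1) now fail validation.
class UserV2(BaseModel):
    name: str
    nickname: str  # now mandatory

UserV1.model_validate({"name": "Ada"})    # OK
# UserV2.model_validate({"name": "Ada"})  # raises ValidationError
```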
It immediately makes me think of an LLM that can generate a customized GUI for the topic at hand, which you can interact with in a non-linear way.
1. Similar cost structure to electricity, but non-essential utility (currently)?
2. Like an operating system, but with non-determinism?
3. Like programming, but ...?
Where does the programming analogy break down?
The programming analogy is convenient but off. The joke has always been “the computer only does exactly what you tell it to do!” regarding logic bugs. Prompts and LLMs most certainly do not work like that.
I loved the parallels with modern LLMs and time sharing he presented though.
When I have tried to "pair program" with an LLM, I have found it incredibly tedious, and not that useful. The insights it gives me are not that great if I'm optimising for response speed, and it just frustrates me rather than letting me go faster. Worse, often my brain just turns off while waiting for the LLM to respond.
OTOH, when I work in a more async fashion, it feels freeing to just pass a problem to the AI. Then, I can stop thinking about it and work on something else. Later, I can come back to find the AI results, and I can proceed to adjust the prompt and re-generate, to slightly modify what the LLM produced, or sometimes to just accept its changes verbatim. I really like this process.
I want AI as a "co-worker" providing an alternative perspective or implementing my specific instructions, and potentially filling in gaps I didn't think about in my prompt.
"Don't be snarky."
Maybe I missed a source, but I assumed it was common knowledge.
https://en.m.wikipedia.org/wiki/List_of_Tesla_Autopilot_cras...
Abstractions don't eliminate the need to understand the underlying layers - they just hide them until something goes wrong.
Software 3.0 is a step forward in convenience. But it is not a replacement for developers with a solid foundation; it is a tool for acceleration, amplification, and scaling.
If you know what is under the hood, you are irreplaceable. If you do not, you become dependent on a tool that you do not always understand.
The hardest part of a JIT is the safety aspect. And AI already violates most of that.
The question is: how much do we really care about the "how", even when we think we care about it? Modern programming languages don't do guesswork, but they already abstract away quite a lot of the "how".
I believe that's the original argument in favor of coding in assembler and that it will stay relevant.
Following this argument, what AI is really missing is determinism, to a large extent. I can't just save the input I gave an AI and be sure that it will produce the exact same output a year from now.
This reminds me of the Three Amigos and Grady Booch evangelizing the future of software while ignoring the terrible output from Rational Software and the Unified Process.
At least we got acknowledgment that self-driving remains unsolved: https://youtu.be/LCEmiRjPEtQ?t=1622
And Waymo still requires extensive human intervention. Given Tesla's robotaxi timeline, this should crash their stock valuation...but likely won't.
You can't discuss "vibe coding" without addressing security implications of the produced artifacts, or the fact that you're building on potentially stolen code, books, and copyrighted training data.
And what exactly is Software 3.0? It was mentioned early then lost in discussions about making content "easier for agents."
gchamonlive•5h ago
However, I think it's important to make clear that, given the hardware constraints of many environments, the applicability of what's being called Software 2.0 and 3.0 will be severely limited.
So instead of being replacements, these paradigms are more like extra tools in the tool belt. Code and prompts will live side by side, used when convenient, but neither is a panacea.
miki123211•2h ago
To me, structured output is a criminally underused tool. While "raw" LLMs are cool, they're annoying to use as anything but chatbots, as their output is unpredictable and basically impossible to parse programmatically.
Structured outputs solve that problem neatly. In a way, they're "neural networks without the training". They can be used to solve similar problems as traditional neural networks, things like image classification or extracting information from messy text, but all they require is a Zod or Pydantic type definition and a prompt. No renting GPUs, labeling data and tuning hyperparameters necessary.
They often also improve LLM performance significantly. Imagine you're trying to extract calories per 100g of product, but some products give you calories per serving and a serving size, calories per pound, etc. The naive way to do this is a prompt like "give me calories per 100g", but that forces the LLM to do arithmetic, and LLMs are bad at arithmetic. With structured outputs, you just give it the fifteen different formats that you expect to see as alternatives, and use some simple Python to turn them all into calories per 100g on the backend side.
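A rough sketch of that pattern (field names hypothetical, and the actual LLM call abstracted away, since the exact API varies by provider):

```python
from typing import Literal
from pydantic import BaseModel

# Each labeling format the LLM might encounter, as a tagged union of
# alternatives. The LLM only picks a variant and copies numbers; it
# never does arithmetic.
class PerServing(BaseModel):
    kind: Literal["per_serving"]
    calories_per_serving: float
    serving_size_g: float

class Per100g(BaseModel):
    kind: Literal["per_100g"]
    calories_per_100g: float

class PerPound(BaseModel):
    kind: Literal["per_pound"]
    calories_per_pound: float

class Extraction(BaseModel):
    value: PerServing | Per100g | PerPound

def calories_per_100g(e: Extraction) -> float:
    """Do the unit conversion in Python, where arithmetic is exact."""
    v = e.value
    if isinstance(v, Per100g):
        return v.calories_per_100g
    if isinstance(v, PerServing):
        return v.calories_per_serving * 100.0 / v.serving_size_g
    return v.calories_per_pound * 100.0 / 453.592  # grams per pound

# extraction = llm_parse(label_text, schema=Extraction)  # hypothetical call
# print(calories_per_100g(extraction))
```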