I tried vibe coding in BASIC and it didn't go well

https://www.goto10retro.com/p/vibe-coding-in-basic

160•ibobev•6mo ago

Comments

firesteelrain•6mo ago

Not surprised; there were so many variations of BASIC and unless you train ChatGPT on a bunch of code examples and contexts then it can only get so close.

Try a local LLM then train it

ofrzeta•6mo ago

> ... unless you train ChatGPT on a bunch of code examples and contexts then it can only get so close.

How do you do this?

sixothree•6mo ago

RAG maybe?

firesteelrain•6mo ago

RAG is good suggestion to pull in runtime without weights

oharapj•6mo ago

If you're OpenAI you scrape StackOverflow and GitHub and spend billions of dollars on training. If you're a user, you don't

firesteelrain•6mo ago

Gist is

1. Gather training data

2. Format it into JSONL or Hugging Face Dataset format

3. Use Axolotl or Hugging Face peft to fine-tune

4. Export model to GGUF or HF format

5. Serve via Ollama

https://adithyask.medium.com/axolotl-is-all-you-need-331d5de...

https://www.philschmid.de/fine-tune-llms-in-2025

https://blog.devgenius.io/complete-guide-to-model-fine-tunin...

ofrzeta•6mo ago

So, finetuning? Not so easy with ChatGPT I guess, but thanks for the info anyway.

firesteelrain•6mo ago

Yes, it takes an existing model and fine tunes it. ChatGPT would be basically extensively prompt engineering in a session. Maybe using their API? I have never tried it personally.

When I fine tuned a Mistral 7B model it took hundreds of examples in Alpaca style

It’s a lot of work. Maybe OpenAI has a more efficient way of doing it because in my case I had to manually adjust each prompt

Paradigma11•6mo ago

Gemini Pro 2.5 has a context window of 1 million tokens and wants to rise that to 2 million tokens soon. 1 token is approx 0.75 words, so 1 million tokens would be in the ballpark of 3k pages of code.

You can add some tutorials/language docs as context without any problem. The bigger your project gets the more context it gets from there. You can also convert apis/documentation to a RAG and expose it as a MCP tool to the LLM.

ofrzeta•6mo ago

> Gemini Pro 2.5 has a context window of 1 million tokens and wants to rise that to 2 million tokens soon. 1 token is approx 0.75 words, so 1 million tokens would be in the ballpark of 3k pages of code.

You mean around 3000 files with 3000 characters? That is a lot. I've played with some other LLMs in Agentic AIs but at work we are using Copilot, and when I add context through drag and drop it seems to be limited to some dozen files.

Paradigma11•6mo ago

I think Copilot has some hardcoded limitations of around a dozen files, like you said. But this stuff changes constantly.

ofrzeta•6mo ago

Still I don't totally understand how that huge of a context works for Gemini. I guess you don't provide the whole context for every request? So it keeps (but also updates) context for a specific session?

Paradigma11•6mo ago

I dont know how the massive context works but Caching is certainly a thing and cheaper: https://ai.google.dev/gemini-api/docs/caching?lang=python

Gemini is better than Sonnet if you have broad questions that concern a large codebase, the context size seems to help there. People also use subagents for specific purposes to keep each context size manageable, if possible.

On a related note I think the agent metaphor is a bit harmful because it suggests state while the LLM is stateless.

empressplay•6mo ago

We have Claude Code writing Applesoft BASIC fine. It wrote a text adventure (complete with puzzles) and a PONG clone, among other things. Obviously it didn't do it 100% right straight out of the gate, but the hand-holding wasn't extensive.

I've been using Grok 4 to write 6502 assembly language and it's been a bit of a slog but honestly the issues I've encountered are due mostly my to naivety. If I'm disciplined and make sure it has all of the relevant information and I'm (very) incremental, I've had some success writing game logic. You can't just tell it to build an entire game in a prompt, but if you're gradual about it you can go places with it.

Like any tool, if you understand its idiosyncrasies you can cater for them, and be productive with it. If you're not then yeah, it's not going to go well.

hammyhavoc•6mo ago

Ah yes, truly impressive, Pong. A game that countless textbooks et al have recreated numerous times. There's a mountain of training data for something so unoriginal.

recipe19•6mo ago

I work on niche platforms where the amount of example code on Github is minimal, and this definitely aligns with my observations. The error rate is way too high to make "vibe coding" possible.

I think it's a good reality check for the claims of impending AGI. The models still depend heavily on being able to transform other people's work.

empressplay•6mo ago

I don't know if you're working with modern models. Grok 4 doesn't really know much about assembly language on the Apple II but I gave it all of the architectural information it needed in the first prompt of a conversation and it built compilable and executable code. Most of the issues I encountered were due to me asking for too much in a prompt. But it built a complete, albeit simple, assembly language game in a few hours of back and forth with it. Obviously I know enough about the Apple II to steer it when it goes awry, but it's definitely able to write 'original' code in a language / platform it doesn't inherently comprehend.

timschmidt•6mo ago

This matches my experience as well. Poor performance usually means I haven't provided enough context or have asked for too much in a single prompt. Modifying the prompt accordingly and iterating usually results in satisfactory output within the next few tries.

gompertz•6mo ago

Yep I program in some niche languages like Pike, Snobol4, Unicon. Vibe coding is out of the question for these languages. Forced to use my brain!

johnisgood•6mo ago

You could always feed it some documentation and example programs. I did it with a niche language and it worked out really well, with Claude. Around 8 months ago.

jjmarr•6mo ago

I've noticed the error rate doesn't matter if you have good tooling feeding into the context. The AI hallucinates, sees the bug, and fixes it for you.

winrid•6mo ago

Even with typescript Claude will happily break basic business logic to make tests pass.

CalRobert•6mo ago

That seems like the tests don’t work?

paffdragon•6mo ago

Or at least they don't cover business logic if they pass while breaking it.

winrid•6mo ago

It made the tests fail with the new feature and then removed the feature it just added to make them pass.

motorest•6mo ago

> Even with typescript Claude will happily break basic business logic to make tests pass.

It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum necessary and sufficient changes to any code that allows their goal to be met. If you don't explicitly instruct them, they can't and won't tell apart project code from test code. So they will change your project code to make tests work.

This is not a bug. Changing project code to make tests pass is a fundamental approach to refactoring projects, and the whole basis of TDD. If that's not what you want, you need to prompt them accordingly.

chuckadams•6mo ago

Fixing bugs is also changing project code to make tests pass. The assistant is pretty good at knowing which side to change when it’s working from documentation that describes the correct behavior.

desdenova•6mo ago

That's the main problem with vibe coding.

The whole point is having the LLM figure out what you want from vague hand-wavy descriptions instead of precise specification.

You don't need an LLM to parse a precise specification, you have a compiler for that.

motorest•6mo ago

> That's the main problem with vibe coding.

It's not a problem. It's in fact the core trait of vibe-codig. The primary work a developer does in vibe coding tasks is providing the necessary and sufficient context. Hence the inception of the term "context engineering". A vibe coder basically lays out requirements and constraints that drives LLMs to write code. That's the bulk of their task: they shift away from writing the low-level "how" to instead write down the high-level "what".

> The whole point is having the LLM figure out what you want from vague hand-wavy descriptions instead of precise specification.

No. The prompts are as elaborate as you want it to be. I, for example, use prompt files with the project's ubiquitous language and requirements, not to mention test suites used for acceptance tests. You can half-ass your code as much as you can half-ass your prompts.

immibis•6mo ago

Sounds like a compiler with extra steps.

chuckadams•6mo ago

It's entirely possible to have specifications somewhere between "vague hand-wavy descriptions" and source code. But it's really not my job to defend AI against all the people who want it to be completely useless, seem to need it to be so, really. I just use it, it works a lot of the time, doesn't work other times, and that's that. Results carry more weight than opinions.

Terr_•6mo ago

> It's my understanding that LLMs change the code to meet a goal

I assume in this case you mean a broader conventional application, of which an LLM algorithm is a smaller-but-notable piece?

LLMs themselves have no goals beyond predicting new words for a document that "fit" the older words. It may turn 2+2 into 2+2=4, but it's not actually doing math with the goal of making both sides equal.

motorest•6mo ago

> I assume in this case you mean a broader conventional application, of which an LLM algorithm is a smaller-but-notable piece?

Not necessarily. If you prompt a LLM to limit changes to some projects or components, it complies with the request.

brailsafe•6mo ago

Do you mean not just LLMs, but agents? Is this jot avoided by narrowing your scope and just using the chat interface that also may not produce what you're hoping for, but at least can't muck about in your existing code?

DecoySalamander•6mo ago

> This is not a bug

It's not a bug if we're talking about a mischievous jinn granting wishes instead of a productivity tool.

didgeoridoo•6mo ago

Can’t it be both?

desdenova•6mo ago

A jinn granting wishes is the best analogy for LLMs I've seen so far.

Sharlin•6mo ago

Ecept that LLMs aren't mischievous. They're just stupid.

bitwize•6mo ago

> It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum necessary and sufficient changes to any code that allows their goal to be met.

"I'm Mr. Meeseeks! Look at meeee!"

winrid•6mo ago

I told it to add a feature and to update the tests. It added the feature, and then removed it because it made the tests fail lol. I know I can make it work, I did, that's not the point.

bapak•6mo ago

Speaking of TypeScript, every time I feed a hard type problem to LLMs they just can't do it. Sometimes I find out it's a TS limitation or just not implemented yet, but that won't stop us from wasting 40 minutes together.

rybosome•6mo ago

I’m currently doing research on this exact problem. Would you care to share an example of an advanced typing issue that you’ve seen LLMs struggle with?

neom•6mo ago

We are building a tool specifically for typescript developers, just launched a couple of months ago and I'd really appreciate if you gave it a try and provided me with feedback, people seem to really like using it. http://charlielabs.ai - thank yooou!!! :)

rs186•6mo ago

When I vibe coded with GitHub Copilot in TypeScript, it keeps using "any" even though those variables had clear interfaces already defined somewhere in the code. This drove me crazy, as I had to go in and manually fix all those things. The only thing that helps a bit is me screaming "DO NOT EVER USE 'any' TYPE". I can't understand why it would do this.

andsoitis•6mo ago

> The models

The models don’t have a model of the world. Hence they cannot reason about the world.

hammyhavoc•6mo ago

"reason" is doing some heavy-lifting in the context of LLMs.

pygy_•6mo ago

I tried vibe coding WebGPU/WGSl, which is thoroughly documented, but has little actual code around, and LLMs are pretty bad at it right now.

They don't need a formal model, they need examples from which they can pilfer.

bawana•6mo ago

The theory is that language is an abstraction built on top of the world and therefore encompasses all human experience of the world. The problem will arise however when the world (aka nature) acts in an unexpected way outside human experience

vineyardmike•6mo ago

Completely agree. I’m a professional engineer, but I like to get some ~vibe~ help on person projects after-work when I’m tired and just want my personal project to go faster. I’ve had a ton of success with go, JavaScript, python, etc. I had mixed-success with writing idiomatic Elixir roughly a year ago, but I’ve largely assumed that this would be resolved today, since every model maker has started aggressively filling training data with code, since we found the PMF of LLM code-assistance.

Last night I tried to build a super basic “barely above hello world” project in Zig (a language where IDK the syntax), and it took me trying a few different LLMs to find one that could actually write anything that would compile (Gemini w/ search enabled). I really wasn’t expecting it considering how good my experience has been on mainstream languages.

Also, I think OP did rather well considering BASIC is hardly used anymore.

pygy_•6mo ago

I've had a similar problem with WebGPU and WGSL. LLMs create buffers with the wrong flags (and other API usage errors), doesn't clean up resources, mix up GLSL and WGSL, write semi-less WGSL (in template strings) if you ask them to write semi-less [0] JS...

It's a big mess.

0. https://github.com/isaacs/semicolons/blob/main/semicolons.js

poniko•6mo ago

Yes and if you work with a plarform that has been arround for long time like .net you will most definitely get a mix of really outdated deprecated code mixed with the latest features.

ragequittah•6mo ago

This is easily solved by feeding the LLM the correct documentation. I was having problems with tailwind because of this right up until I had ChatGPT deep research come up with a spec sheet on how to use the latest version of it. Fed it into the various AIs I've been using (worked for ChatGPT, Claude, and Cursor) and no problems since.

remich•6mo ago

I recommend the context7 MCP tool for this exact purpose. I've been trying to really push agents lately at work to see where they fall down and whether better context can fix it.

As a test recently I instructed an agent using Claude to create a new MCP server in Elixir based on some code I provided that was written in Python. I know that, relatively speaking, Python is over-represented in training data and Elixir is under-represented. So, when I asked the agent to begin by creating its plan, I told it to reference current Elixir/Phoenix/etc documentation using context7 and to search the web using Kagi Search MCP for best practices on implementing MCP servers in Elixir.

It was very interesting to watch how the initially generated plan evolved after using these tools and how after using the tools the model identified an SDK I wasn't even aware of that perfectly fit the purpose (Hermes-mcp).

cmrdporcupine•6mo ago

I find for these kinds of systems, if I pre-seed Claude Code with a read of the language manual (even the BNF etc) and a TLDR of what it is, results are far better. Just part of the initial prompt: read this summary page, read this grammar, and look at this example code.

I have had it writing LambdaMOO code, with my own custom extensions (https://github.com/rdaum/moor) and it's ... not bad considering.

docandrew•6mo ago

Maybe other folks’ vibe coding experiences are a lot richer than mine have been, but I read the article and reached the opposite conclusion of the author.

I was actually pretty impressed that it did as well as it did in a largely forgotten language and outdated platform. Looks like a vibe coding win to me.

sixothree•6mo ago

Here's an example of a recent experience.

I have a web site that is sort of a cms. I wanted users to be able to add a list of external links to their items. When a user adds a link to an entry, the web site should go out and fetch a cached copy of the site. If there are errors, it should retry a few times. It should also capture an mhtml single file as well as a full page screenshot. The user should be able to refresh the cache, and the site should keep all past versions. The cached copy should be viewable in a modal. The task also involves creating database entities, DTOs, CQRS handlers, etc.

I asked Claude to implement the feature, went and took a shower, and when I came out it was done.

hammyhavoc•6mo ago

Let us know how the security audit by human beings on the output goes.

catmanjan•6mo ago

The auditors are using llms too!

sixothree•6mo ago

It's really just a personal project for myself. Why else would I add that feature without any guardrails?

nico•6mo ago

Im pretty new to CC, been using it in a very interactive way.

What settings are you using to get it to just do all of that without your feedback or approval?

Are you also running it inside a container, or setting some sort of command restrictions, or just yoloing it on a regular shell?

sixothree•6mo ago

So CC has a planning mode. Shift-Tab twice to enter planning mode. I wrote out about a paragraph of text for this and it gave me back a todo list. I said "make it so" and it went and did it.

grumpyprole•6mo ago

Sure it did ok with examples that are easily found in a text book like drawing a circle.

serf•6mo ago

please just include the prompts rather than saying "So I said X.."

There is a lot of nuance in how X is said.

pavelstoev•6mo ago

I vibe coded a site about vibe 2 code projects. https://builtwithvibe.com/

esafak•6mo ago

The "Yo dawg, I heard..." memes are writing themselves today.

clambaker117•6mo ago

Wouldn’t it have been better to use Claude 4?

sixothree•6mo ago

I'm thinking Gemini CLI because of the context. He could add some information about the programming language itself in the project. I think that would help immensely.

4b11b4•6mo ago

Even though the max token limit is higher, it's more complicated than that.

As the context length increases, undesirable things happen.

calvinmorrison•6mo ago

This is great. I am currently vibecoding a replacement connector for some old EDI software that is written in a business basic kind of language called ProvideX, and a fork of that has undocumented behaviour.

It uses some built inet ftp tooling thats terrible and barely works, even internally anymore.

We are replacing it with a winscp implementation since winscp can talk over a COM object.

unsuprisingly the COM object in basic works great - the problem is that I have no idea what I am doing. I spent hours doing something like

WINSCP_SESSION'OPEN(WINSCP_SESSION_OPTIONS)

when i needed

WINSCP_SESSION'OPEN(*WINSCP_SESSION_OPTIONS)

It was obvious after because it was a pointer type of setup, but i didnt find it until pages and pages deep into old PDF manuals.

However the vibecode of all the agents did not understand the syntax of the system, it did help me analyse the old code, format it, and at least throw some stuff at the wall.

I finished it up friday, hopefully i deploy monday.

CMay•6mo ago

This does kind of make me wonder.

It's believable that we might either see an increase in the number of new programming languages since making new languages is becoming more accessible, or we could see fewer new languages as the problems of the existing ones are worked around more reliably with LLMs.

Yet, what happens to adoption? Perhaps getting people to adopt new languages will be harder as generations come to expect LLM support. Would you almost need to use LLMs to synthesize tons of code examples that convert into the new language to prime the inputs?

Once conversational intelligence machines reach a sort of godlike generality, then maybe they could very quickly adapt languages from much fewer examples. That still might not help much with the gotchas of any tooling or other quirks.

So maybe we'll all snap to a new LLM super-language in 20 years, or we could be concreting ourselves into the most popular languages of today for the next 50 years.

hammyhavoc•6mo ago

Fantasy.

CMay•6mo ago

Can you elaborate?

hammyhavoc•6mo ago

> Once conversational intelligence machines reach a sort of godlike generality,

Sums up fantastical inevitablism.

Why would they? How could they? The data is telling us that they won't. Anybody who believes otherwise is ignoring the science.

The assumed path to "AGI" was more tokens, but more tokens actually means worse output, and if LLMs aren't a total technological dead-end, period, then the data is supporting smaller models meant for more specific things, ergo, LLMs are not giving AGI any time, ever.

Pure fantasy. Can't even call it sci-fi when it ignores the science entirely.

CMay•6mo ago

Ah, so you misread. I said conversational intelligence machines, but you focused on LLMs.

I don't know where LLMs will lead, but I haven't ruled out the possibility of improvements continuing to surprise us. If anything is unscientific, it would be overconfidence in one way or the other.

hammyhavoc•6mo ago

Confidence comes from statistics. The statistics say that smaller models are better than bigger ones, ergo I have I have confidence in saying what I did.

If anyone is overconfident, I'm afraid it is you.

CMay•6mo ago

Even if smaller more specialized models outperform larger more generalized models for specific tasks with these architectures, that does not logically support your point or serve as meaningful evidence against my post.

These systems can have many models and the systems themselves could eventually arrive at knowledge they know they underperform on, automatically training new models for that subject. The final response wouldn't necessarily come exclusively from the specialized model, it might come through a model specialized at integrating knowledge between multiple models, asking the right questions, verifying and so on.

We're barely a few years into this, so it's premature to know what 100 years of developments will bring.

ilaksh•6mo ago

I think it's a fair article.

However I will just mention a few things. When you make an article like this please take note of the particular language model used and acknowledge that they aren't all the same.

Also realize that the context window is pretty large and you can help it by giving it information from manuals etc. so you don't need to rely on the intrinsic knowledge entirely.

If they used o3 or o3 Pro and gave it a few sections of the manual it might have gotten farther. Also if someone finds a way to connect an agent to a retro computer, like an Atari BASIC MCP that can enter text and take screenshots, "vibe coding" can work better as an agent that can see errors and self-correct.

manca•6mo ago

I literally had the same experience when I asked the top code LLMs (Claude Code, GPT-4o) to rewrite the code from Erlang/Elixir codebase to Java. It got some things right, but most things wrong and it required a lot of debugging to figure out what went wrong.

It's the absolute proof that they are still dumb prediction machines, fully relying on the type of content they've been trained on. They can't generalize (yet) and if you want to use them for novel things, they'll fail miserably.

hammyhavoc•6mo ago

They'll never be fit for purpose. They're a technological dead-end for anything like what people are usually throwing them at, IMO.

zer00eyz•6mo ago

I will give you an example of where you are dead wrong, and one where the article is spot on (without diving into historic artifacts).

I run HomeAssistant, I don't get to play/use it every day. Here, LLM's excel at filling in the (legion) of blanks in both the manual and end user devices. There is a large body of work for it to summarize and work against.

I also play with SBC's. Many of these are "fringe" at best. LLM's are as you say "not fit for purpose".

What kind of development you are using LLM's for will determine your experience with them. The tool may or may not live up to the hype depending how "common", well documented and "frequent" your issue is. Once you start hitting these "walls" you realize that no, real reason, leaps of inference and intelligence are still far away.

SecuredMarvin•6mo ago

I also made this experience. As long as the public level of knowledge is high, LLMs are massively helpful. Otherwise not so much and still hallucinating. It does not matter if you think highly of this public knowledge. QFT, QED and Gravity are fine, AD emulation on SAMBA, or Atari Basic not so much.

If I would program Atari Basic, after finishing my Atari Emulator on my C64, I would learn the environment and test my assumptions. Single shot LLMs questions won't do it. A strong agent loop could probably.

I believe that LLMs are yanking the needle to 80%. This level is easy achievable for professionals of the trade and this level is beyond the ability of beginners. LLMs are really powerful tools here. But if you are trying for 90% LLMs are always trying to keep you down.

And if you are trying for 100%, new, fringe or exotic LLMs are a disaster because they do not learn and do not understand, even while being inside the token window.

We learn that knowledge, (power) and language proficiency are an indicator for crystalline but not fluid intelligence

otabdeveloper4•6mo ago

> yanking the needle to 80%

80 percent of what, exactly? A software developer's job isn't to write code, it's understanding poorly-specified requirements. LLMs do nothing for that unless your requirements are already public on Stackoverflow and Github. (And in that case, do you really need an LLM to copy-paste for you?)

zer00eyz•6mo ago

> fluid intelligence

How about basic intelligence. Kids logic puzzles.

https://daydreampuzzles.com/logic-puzzles/

LLM's whiffing hard on these sorts of puzzles is just amusing.

It gets even better if you change the clues from innocent things like "driving tests" or "day care pickup" to things that it doesn't really want to speak about. War crimes, suicide, dictators and so on.

Or just flat out make up words whole cloth to use as "activates" in the puzzles.

motorest•6mo ago

> They'll never be fit for purpose. They're a technological dead-end for anything like what people are usually throwing them at, IMO.

This comment is detached from reality. LLMs in general have been proven to be effective at even creating complete, fully working and fully featured projects from scratch. You need to provide the necessary context and use popular technologies with enough corpus to allow the LLM to know what to do. If one-shot approaches fail, a few iterations are all it takes to bridge the gap. I know that to be a fact because I do it on a daily basis.

otabdeveloper4•6mo ago

> because I do it on a daily basis

Cool. How many "complete, fully working" products have you released?

Must be in the hundreds now, right?

motorest•6mo ago

> Cool. How many "complete, fully working" products have you released?

Fully featured? One, so far.

I also worked on small backing services, and a GUI application to visualize the data provided by a backing service.

I lost count of the number of API testing projects I vibe-coded. I have a few instruction files that help me vibecode API test suites from the OpenAPI specs. Postman collections work even better.

And I'm far from an expert in the field.

What point were you trying to make?

otabdeveloper4•6mo ago

> What point were you trying to make?

The point is that software developers can't evaluate their own work. (Especially the kind of n00b developers that use LLMs.)

You initially made wild claims about insane productivity gains that turned out to be just one small product and a lot of wasted time under scrutiny.

(Asking LLMs to write tests is a waste of time. LLMs can't evaluate risks, which is the only reason to write tests in the first place.)

jeltz•6mo ago

If you are far from an expert in the field maybe you should refrain from commenting so strongly because some people here actually are experts.

So you have built a few small PoCs, does not tell us much.

motorest•6mo ago

> If you are far from an expert in the field maybe you should refrain from commenting so strongly because some people here actually are experts.

Your opinion makes no sense. Your so called experts are claiming LLMs don't do vibecoding well. I, a non-expert, am quite able to vibecode my way into producing production-ready code. What conclusion are you hoping to draw from that? What do you think your experts' opinion will achieve? Will it suddenly delete the commits from LLMs and all the instruction prompts I put together? What point do you plan to make with your silly appeal to authority?

I repeat: non-experts are proving to be possible, practical, and even mundane what your so-called experts claim to not work. What do you plan to draw from that?

hammyhavoc•6mo ago

No, experts are saying that vibe coding sucks. LLMs absolutely do vibe coding, and the output is shit.

hammyhavoc•6mo ago

Let's see it then.

cactusplant7374•6mo ago

A good trial would be pointing an LLM at Jira and telling it to finish the backlog.

motorest•6mo ago

> A good trial would be pointing an LLM at Jira and telling it to finish the backlog.

GitHub is already rolling out this feature.

https://github.blog/news-insights/product-news/github-copilo...

hammyhavoc•6mo ago

So, where is the substance?

Do what I couldn't with these supposedly capable LLMs:

- A Wear OS version of Element X for Matrix protocol that works like Apple Watch's Walkie Talkie and Orion—push-to-talk, easily switching between conversations/channels, sending and playing back voice messages via the existing spec implementation so it works on all clients. Like Orion, need to be able to replay missed messages. Initiating and declining real-time calls. Bonus points for messaging, reactions and switching between conversations via a list.

- Dependencies/task relationships in Nextcloud Deck and Nextcloud Tasks, e.g., `blocking`, `blocked by`, `follows` with support for more than one of each. A filtered view to show what's currently actionable and hide what isn't so people aren't scrolling through enormous lists of tasks.

- WearOS version of Nextcloud Tasks/Deck in a single app.

- Nextcloud Notes on WearOS with feature parity to Google Keep.

- Implement portable identities in Matrix protocol.

- Implement P2P in Matrix protocol.

- Implement push-to-talk in Element for Matrix protocol ala Discord, e.g., hold a key or press a button and start speaking.

- Implement message archiving in Element for Matrix protocol ala WhatsApp where a message that has been archived no longer appears in the user's list of conversations, and is instead in an `Archived` area of the UI, but when a new message is received in it, it comes out of the Archive view. Archive status needs to sync between devices.

Open source the repo(s) and issue pull requests to the main projects, provide the prompts and do a proper writeup. Pull requests for project additions need to be accepted and it all needs to respect existing specs. Otherwise, it's just yet more hot air in the comments section. Tired of all this empty bragging. It's a LARP and waste of time.

As far as I'm concerned, it is all slop and not fit for purpose. Unwarranted breathless hype akin to crypto with zero substance and endless gimmicks and kidology to appeal to hacks.

Guarantee you can't meaningfully do any of the above and get it into public builds with an LLM, but would love to be proven wrong.

If they were so capable, it would be a revolution in FOSS, and yet anyone who heavily uses it produces a mix of inefficient, insecure, idiotic, bizarre code.

abrookewood•6mo ago

Clearly the issue is that you are going from Erlang/Elixir to Java, rather than the other way around :)

Jokes aside, they are pretty different languages. I imagine you'd have much better luck going from .Net to Java.

nine_k•6mo ago

This mostly means that LLMs are good at simpler forms of pattern matching, and have much harder time actually reasoning at a significant depth. (It's not easy even for human intellect, the finest we currently have.)

tsimionescu•6mo ago

Sure, it's easier to solve an easier problem, news at eleven. In particular, translating from C# to Java could probably be automated with some 90% accuracy using a decent sized bash script.

mattmanser•6mo ago

I once redid a project from VB.Net to C# and pretty much did that.

People misjudge many tasks as 'hard' when they are in fact easy but tedious.

The problem is you need a high degree of accuracy, which you don't get with LLMs.

The best you can do is set the LLM on a loop and try and Brute force it, which is the current vibe 'coding' trick.

I sound pessimistic but I'm actually shocked at how effective it is.

h4ck_th3_pl4n3t•6mo ago

I just wished the LLM model providers would realize this and instead would provide specialized LLMs for each programming language. The results likely would be better.

chuckadams•6mo ago

The local models JetBrains IDEs use for completion are specialized per-language. For more general problems, I’m not sure over-fitting to a single language is any better for a LLM than it is for a human.

conception•6mo ago

I’m curious what your process was. If you just said “rewrite this in Java” I’d expect that to fail. If you treated the llm like a junior developer or an official project, worked with them to document the codebase, come up with a plan, tasks for each part of the code base and a solid workflow prompt- I would expect it to succeed.

Marazan•6mo ago

Yes, if you do all the difficult time consuming bits I bet it would work.

FeepingCreature•6mo ago

It's still a lot faster than doing it yourself ime.

(Yes I've seen the study, it doesn't account for motivation.)

conception•6mo ago

You don’t do the work you just work with them to get the work done to plan it out.

4hg4ufxhy•6mo ago

There is a reason to go the extra mile for juniors. They eventually learn and become seniors. With AI I'd rather just do it myself and be done with it.

conception•6mo ago

But you can just do it once with AI. It’s just a script process that you would set up for any project. It’s just an on boarding process.

And I’ll know by when I say do it once I mean, obviously processes have to be it on to get exactly what you want out of them, but that’s just how process works . Once it’s working the way you want to just reuse it.

nerdsniper•6mo ago

Claude Code / 4o struggle with this for me, but I had Claude Opus 4 rewrite a 2,500 line powershell script for embedded automation into Python and it did a pretty solid job. A few bugs, but cheaper models were able to clean those up. I still haven't found a great solution for general refactoring -- like I'd love to split it out into multiple Python modules but I rarely like how it decides to do that without me telling it specifically how to structure the modules.

credit_guy•6mo ago

If you try to ride a bicycle, do you expect to succeed at the first try? Getting AI code assistants to help you write high quality code takes time. Little by little you start having a feel for what prompts work, what don't, what type of tasks the LLMs are likely to perform well, which ones are likely to result in hallucinations. It's a learning curve. A lot of people try once or twice, get bad results, and conclude that LLMs are useless. But few people conclude that bicycles are useless if they can't ride them after trying once or twice.

ofrzeta•6mo ago

It didn't go well? I think it went quite well. It even produced an almost working drawing program.

abrookewood•6mo ago

Yep, thought the same thing. I guess people have very different expectations.

Radle•6mo ago

I had way better results. I'd assume the same would have happened to the author if he provided the LLM with a full documentation on what ATARI BASIC is and some example programs.

Especially when asking the LLM to create a drawing program and a game the author would have probably received working code if he supplied the ai with documentation to the graphics function and sprite rendering using ATARI BASIC.

xiphias2•6mo ago

4o is not even a coding model and very far from the best coding models OpenAI has, I seriously don't understand why these articles are upvoted so much

throw101010•6mo ago

> I seriously don't understand why these articles are upvoted so much

It confirm a bias for some, it triggers others who might have the opposite position (and maybe have a bias too on the other end).

Perfect combo for successful social media posts... literally all about "attention" from start to finish.

timeon•6mo ago

Short while ago it was like "you are still using <X>? Why not the new 4o?!"

layer8•6mo ago

A coding model doesn't seem to produce better results: https://news.ycombinator.com/item?id=44624222

fcatalan•6mo ago

I had more luck with a little experiment a few days ago: I took phone pics of one of the shorter BASIC listings from Tim Hartnell's "Giant Book of Computer Games" (I learned to program out of those back in the early 80s, so I treasure my copy) and asked Gemini to translate it to plain C. It compiled and played just fine on the first go.

edent•6mo ago

Vibe Coding seems to work best when you are already an experienced programmer.

For example "Prompt: Write me an Atari BASIC program that draws a blue circle in graphics mode 7."

You need to know that there are various graphics modes and that mode 7 is the best for your use-case. Without that preexisting knowledge, you get stuck very quickly.

throwawaylaptop•6mo ago

Exactly this. I'm a self taught PHP/jQuery guy that learned it well enough to make an entire saas that enough companies pay for that it's a decent little lifestyle business.

I started another project recently basically vibe coding in PHP. Instead of a single page app like I made before, it's just page by page single loading. Which means the AI also only needs to keep a few functions and the database in its head, not constantly work on some crazy ui management framework (what that's called).

It's made in a few days what would have taken me weeks as an amateur. Yet I know enough to catch a few 'mistakes' and remind it to do it better.

I'm happy enough.

kqr•6mo ago

Not only is it a useful constraint to ask for mode 7, but making sure the context contains domain-expert technology puts the LLM in a better spot in the sampling space.

cfn•6mo ago

Just for fun I asked ChatGPT "How would you ask an LLM to write a drawing program for the ATARI?" and it asked back a bunch of details to which I answered "I have no idea, just go with the simplest option". It chose the correct graphics mode and BASIC and created the program (which I didn't test).

I still agree with you for large applications but for these simple examples anyone with a basic understanding of vibe coding could wing it.

baxtr•6mo ago

This is a description of a “tool”. Anyone can use a hammer and chisel to carve out wood, but only an artist with extensive experience will create something truly remarkable.

I believe many in this debate are confusing tools with magic wands.

tonyhart7•6mo ago

this marketing and social media buzz that AI (artificial intelligence) that would replace human intelligence or people job for everyone didn't help either

sure it maybe someday but not today, but there are jobs that already get replaced tho for example like writing industry

Sharlin•6mo ago

> I believe many in this debate are confusing tools with magic wands.

Unfortunately, it's usually the ones who control the money.

j4coh•6mo ago

In this case I asked ChatGPT without the part specifying mode 7 and it replied with a working program using mode 7, with a comment at the top that mode 7 would be the best choice.

JdeBP•6mo ago

Previous generations would have simply read something like the circle drawing writeup by Jeffrey S. McArthur in chapter 4 of COMPUTE!'s Third Book of Atari which as a matter of fact is available in scrapable text. (-:

* https://archive.org/details/ataribooks-computes-third-book-o...

* https://atariarchives.org/c3ba/page153.php

Fun fact: Orson Scott Card can be found in chapter 1.

forinti•6mo ago

Exactly! If you can't properly assess the output of the AI, you are really only shooting into the dark.

motorest•6mo ago

> Vibe Coding seems to work best when you are already an experienced programmer.

I think that is a very vague and ambiguous way of putting it.

I would frame it a tad more specific: vibecode seems to work best when users know what they want and are able to set requirements and plan ahead.

Vibecoding doesn't work at all or is an unmaintainable god awful mess if users don't do software engineering and instead hack stuff together hoping it works.

Garbage in, garbage out.

johnisgood•6mo ago

Exactly. I would not like to be called a vibe coder for using an LLM for tedious tasks though, is it not a pejorative term? I used LLMs for a few projects and it did well, because I knew what I wanted and how I wanted. So yeah, you do have to be an experienced programmer to excel with LLMs.

That said, you can learn a lot using LLMs, which is nice. I have a friend who wants to learn Python, and I have given him actual resources, but I have also told him to use LLMs.

ack_complete•6mo ago

Even then, I've seen LLMs generate code with subtle bugs that even experienced programmers would trip on. For the Atari specifically, I've seen:

- Attempting to use BBC BASIC features in Atari BASIC, in ways that parsed but didn't work - Corrupting OS memory due to using addresses only valid on an Apple II - Using the ORG address for the C64, such that it corrupts memory if loaded from Atari DOS - Assembly that subtly doesn't work because it uses 65C02 instructions that execute as a NOP on a 6502 - Interrupt handlers that occasionally corrupt registers - Hardcoding internal OS addresses only valid for the OS ROM on one particular computer model

The POKE 77,0 in the article is another good example. ChatGPT labeled that as hiding the cursor, but that's wrong -- location 77 is the attract timer counter on the Atari OS. Clearing it to 0 periodically resets the timer that controls the OS's primitive screensaver. But in order for this to work, it has to be done periodically -- doing it at the start will just reset this timer once, after which attract mode will start in 9 minutes. So effectively, this is an easter egg that got snuck into the program, and even if the unrequested behavior was desirable, doesn't work.

Earw0rm•6mo ago

Has anyone tried it on x87 assembly language?

For those that don't know. x87 was the FPU for 32-bit x86 architectures. It's not terribly complicated, but it uses stack-based register addressing with a fixed size (eight entry) stack.

All operations work on the top-of-stack register and one other register operand, and push the result onto the top of the stack (optionally popping the previous top of stack before the push).

It's hard but not horribly so for humans to write.. more a case of annoyingly slow and having to be methodical, because you have to reason about the state of the stack at every step.

I'd be very curious as to whether a token-prediction machine can get anywhere with this kind of task, as it requires a strong mental model of what's actually happening, or at least the ability to consistently simulate one as intermediate tokens/words.

silisili•6mo ago

I'm going to doubt that. I was pushing GPT a couple weeks ago to test its limits. It's 100% unable to write compilable Go ASM syntax. In fairness it's slightly oddball, but enough exists that it's not esoteric.

In the error feedback cycle, it kept blaming Go, not itself. A bit eye opening.

messe•6mo ago

I'm comfortable writing asm for quite a few architectures, but Go's assembler...

When I struggle to write Go ASM, I also blame Go and not myself.

Earw0rm•6mo ago

The thing with x87 is that it's easy to write compilable, correct-looking code, and much harder to write correct compilable code, even for trivial sequences of a dozen or so operations.

Whereas in most asm dialects, register AX is always register AX (word length aliasing aside), that's not the case for x87: the object/value at ST3 in one operation may be ST1 or ST5 in a couple of instructions' time.

userbinator•6mo ago

If you are familiar with HP's calculators, x87 Asm isn't that difficult. Also noteworthy is that its density makes it a common choice for tiny demoscene productions.

Earw0rm•6mo ago

Not too bad to write, kind of horrible to read.

FeepingCreature•6mo ago

Prediction: it can do it, so long as you tell it to explicitly keep track of the FPU stack in comments on every FPU instr.

nine_k•6mo ago

All these stories about vibe coding going well or wrong remind me of an old joke.

A man visits his friend's house. There is a dog in the house. The friend says that the dog can play poker. The man is incredulous, but they sit at a table and have a game of poker; the dog actually can play!

The man says: "Wow! Your dog is incredibly, fantastically smart!"

The friend answers: "Oh, well, no, he's a naïve fool. Every time he gets a good hand, he starts wagging his tail."

Whether you see LLMs impressively smart or annoyingly foolish depends on your expectations. Currently they are very smart talking dogs.

kqr•6mo ago

Somehow this also reminds me of http://raisingtalentthebook.com/wp-content/uploads/2014/04/t...

"I taught my dog to whistle!"

"Really? I don't hear him whistling."

"I said I taught him, not that he learnt it."

pyman•6mo ago

A person is flying a hot air balloon and realises he’s lost. He lowers the balloon and spots a man down below. He shouts:

“Excuse me! Can you help me? I promised a friend I’d meet him, but I have no idea where I am.”

The man replies, “You’re in a hot air balloon, hovering 30 feet above the ground, somewhere between 40 and 41 degrees north latitude and between 59 and 60 degrees west longitude.”

“You must be a Prompt Engineer,” says the balloonist.

“I am,” replies the man. “How did you know?”

“Well,” says the balloonist, “everything you told me is technically correct, but it’s of no use to me and I still have no idea where I am.”

The man below replies, “You must be a Vibe Coder.”

“I am,” says the balloonist. “How did you know?”

"Because you don’t know where you are or where you’re going. You made a promise you can’t keep, and now you expect me to solve your problem. The fact is, you’re in the same position you were in before we met, but now it’s somehow my fault!"

johnisgood•6mo ago

This is really good, saving it. :D

pimlottc•6mo ago

This is an old joke with the job titles changed.

https://www.reddit.com/r/Jokes/comments/74lb9d/engineer_vs_m...

a4isms•6mo ago

A joke from the same family involves a pilot who gets lost in IMC conditions, and his radio is dead. Suddenly, looming out of the gloom is an office building with an open window. An employee spots the plane going by and waves. The pilot circles around and shouts "Where am I?"

The employee shouts back "In an airplane!" The pilot nods, sets a course, and navigates precisely 11 nautical miles NNE, and descends. Sure enough, a runway appears and he lands perfectly. Of course, he's debriefed about the incident. "How, " the investigator asks, "Did you find the airport?"

The pilot recounts the shouted conversation. "My question was answered in a way that was absolutely correct but useless for any practical purpose. I reasoned that I was talking to Microsoft Tech Support at their HQ, and I could dead reckon to Redmond Municipal Airport from there."

Marazan•6mo ago

The variation I've seen on this applied to AIs is:

Fred insists to his friend that he has a hyper intelligent dog that can talk. Sceptical, the friend enquires of the dog "What's 2+2?"

"Five" says the dog

"Holy shit a talking dog!" says the friend "This is the most incredible thing that I've ever seen in my life".

"What's 3+3?"

"Eight" says the dog.

"What is this bullshit you're trying to sell me Fred?"

codeflo•6mo ago

This joke is a closer analogy to reality with a small addition. After the friend is suitably impressed:

> "Holy shit a talking dog!" says the friend "This is the most incredible thing that I've ever seen in my life".

this happens:

"Yes," says Fred. "As you can see, it's already at PhD level now, constantly improving, and is on track to replace 50% of the economy in twelve months or sooner."

Confused, the friend asks:

> "What's 3+3?"

> "Eight" says the dog.

> "What is this bullshit you're trying to sell me Fred?"

FeepingCreature•6mo ago

Reality:

> The friend asks: "Alright. Doggy, answer as fast as you can without thinking. What is 387 times 521?"

> "201.627", the dog says, wagging his tail happily.

> "I don't buy it, Fred! This must be bullshit!"

(I did test it and it did give the correct answer. I was expecting a small error for the joke, but hey, works like this too.)

totetsu•6mo ago

Try playing an adversarial word game with ChatGPT. like the rules are, one player asks questions and the other is not allowed to say "yes" or "no", not allowed to reuse the same wording, and not allowed to evade the question. You'll see its tail wagging pretty quickly.

ragequittah•6mo ago

You could very likely train an AI (llama?) to easily do this but trying to get a general LLM to play a game such as this doesn't make sense. Best way to get around it? Have it create a python program that will play the game correctly instead.

markus_zhang•6mo ago

From my experience (we have to vibe code as it becomes a norm in the company), vibe coding is most effective when the developer feeds detailed context to the agent beforehand, and gives very specific commands to it for each task. It still speeds up development quite a bit once everything goes the right direction.

6LLvveMx2koXfwn•6mo ago

I vibe code in domains I am unfamiliar with - getting Claude to configure the right choice of AWS service for a specific use-case can take a very long time. But would that still be quicker than me alone with docs is hard to tell.

markus_zhang•6mo ago

I have never done any serious DevOps or Admin tasks, but I can imagine the pain. AWS doc is the best of the three IMO but getting the right paragraph is still a nightmare.

tliltocatl•6mo ago

Nobody says LLMs aren't impressive. But there is a subtle difference an impressive trick and being something worth throwing 1% of GDP at. Proponents say that if we keep throwing more money at it, it will improve, but this is far from certain.

glimshe•6mo ago

We've thrown fantastical amounts on some very uncertain things, like the moon landing. I think that the willingness of betting big at a potentially transformative technology such as AI is a good thing and a bit of a return to the old days when humanity still engaged in big infrastructure bets. Yes, it may fail, but that's intrinsic of any ambitious project.

Mountain_Skies•6mo ago

Doubt many would have supported the Apollo program if it meant Sam Altman got to be King of the Moon.

mhh__•6mo ago

Some people definitely think they aren't impressive.

> 1 % of GDP

LLMs are basically the only thing genuinely new in decades , that have someone excited in basically every dept in the entirely world, why is it so bad that we spend money on them? The alternative is going back to shovelling web3 crap.

There's definitely a new generation of bullshit merchants to go with LLMs but I think they (the models) target a very different part of the brain to normal tech so in some ways they're much more resilient to usual fad archetypes (this is also why some people who are a bit jittery socially hate them)

nextaccountic•6mo ago

> why is it so bad that we spend money on them?

This money (and more importantly, this electricity and this amount of silicon) is diverted from other stuff.

mhh__•6mo ago

Such as? Dumping money into private equity?

overgard•6mo ago

We could be spending that money on real problems instead of trying to automate away half the population

SR2Z•6mo ago

Name one "real problem" that isn't easier to solve if humans are spending less time on bullshit (read: easily automated) jobs.

immibis•6mo ago

This is an underrated fact. We spend so much money on stupid hype because our system is built to keep giving more and more money to promising technologies, but we're running out of actually promising technologies, so the system is (a) focusing the exponentially growing money into just a few techs, and (b) falling for snake oil.

mrits•6mo ago

The alternative is to not throw money at AI. The amount that we spend could be justified just under a national security budget and not even increased GDP

b33j0r•6mo ago

I get the sense that people might mean that the transformer paradigm might not scale. But I do not understand the argument that AI in general is hype, and that investing in it is cult-like.

It’s just a technology; one that will improve, sometimes stagnate, sometimes accelerate. Like anything else, right? I don’t see a time when we’ll just stop using AI because it “feels so trite.”

nominallyfree•6mo ago

When software mangles data this badly, it is considered garbage and scrapped.

But if it is a half decent chatbot and has the label "AI", it gets another iteration with 10x the resources. This has happened a few times already.

It is a neat tool. It is very unreliable. Teams saying "just give us 10x as much resources so we can insist on this approach" is the hateful thing here.

ak_111•6mo ago

I asked chatgpt to write a short fable about the phenomena of vibe coding in the style of Aesop:

The Owl and the Fireflies

One twilight, deep in the woods where logic rarely reached, an Owl began building a nest from strands of moonlight and whispers of wind.

"Why measure twigs," she mused, "when I can feel which ones belong?"

She called it vibe nesting, and declared it the future.

Soon, Fireflies gathered, drawn to her radiant nonsense. They, too, began to build — nests of smoke and echoes, stitched with instinct and pulse. "Structure is a cage," they chirped. "Flow is freedom."

But when the storm came, as storms do, their nests dissolved like riddles in the rain.

Only the Ants, who had stacked leaves with reason and braced walls with pattern, slept dry that night. They watched the Owl flutter in soaked confusion, a nestless prophet in a world that demanded substance.

Moral: A good feeling may guide your flight, but only structure will hold your sky.

suddenlybananas•6mo ago

This is incoherent

Yokolos•6mo ago

When I start getting nonsense back and forth prompting, I've found it best to just start a new chat/context with the latest working version and then try again with a slightly more detailed prompt that tries to avoid the issues encountered in the previous chat. It usually helps. AI generally quickly gets itself lost, which can be annoying.

danjc•6mo ago

What's missing here is tool use.

For example, if the llm had a compile tool it would likely have been able to correct syntax errors.

Similarly, visual errors may also have been caught if it were able to run the program and capture screens.

wbolt•6mo ago

Exactly! The way in which the LLM is used here is very, very basic and outdated. This experiment should be redone in a proper „agentic” setup where there is a feedback loop between the model and the runtime plus access to documentation / internet. The goal now is not to encapsulate all the knowledge inside single LLM - this is too problematic and costly. LLM is a language model not knowledge database. It allows to interpret and interact with knowledge and text data from multiple sources.

Mikhail_Edoshin•6mo ago

I don't know if anyone notices that, but LLMs are very much like what you see in dreams when you reflect on them when awake. When you asleep a dream feels very coherent; but when you're awake, you see wild gaps and jumps and the overall impression of many dreams' plot is that it is pure nonsense.

I once heard advice on trying to re-read the text you see in dreams. And I did that once. It was a phrase where one of words referred to a city. The first time I read that city was "London". I remembered the advice and re-read the phrase and the word changed to the name of an old Russian city "Rostov". Yet the phrase was "same", that is it felt same in the dream, even though the city was different.

LLMs are like that dream mechanics. It is how something else is reflected in what we know (e.g. an image of a city is rendered as a name of a city, just any city). So, on one hand, we do have a similar mechanism in our minds. But on another hand our normal reasoning is very much unlike that. It would be a very wild stretch to believe that reasoning somehow stems from dreaming. I'd say reasoning is opposite to dreaming. If we amplify our dreaming mechanics we won't get a genius; more likely we'll get a schizophrenic.

neom•6mo ago

I think that is a cool way of looking at it - build on what you said, to me it might be a bit like in both instances (dreaming and llms) - it's maybe a bit related to what they are trying to do (presuming dreams have a purpose) + the resource they have to do it in + the context they have to couple in to get the point across + something related to the abilities of the user? Lets for fun say there is a subsystem that understands and runs dreaming, you only have so much time, plus it's a weird modality, and you're trying to do something... maybe it's fine enough to just serve up the story no matter how muddled it is, what you might be talking about in LLMs is a similar thing? Dreams have a ticking clock where the brain chemistry is literally changing and the opportunity for that type of processing is about to disappear, and eventually the human will awaken, LLMs have a context window size. Fun thinking, anyway.

Fade_Dance•6mo ago

Well it's neither schizophrenic nor using human reasoning. It's a machine using transformers no reason to overly anthropomorphize it, but sure the parallels do exist to some degree.

These tools are obviously not reasoning tools in the classic sense, as they're not building off of core axiomic truths like classic logic. These truths may be largely embedded in the probabilistic output, due to the input universe that we feed it being based on human reason, but it's certainly not in a power of these tools. That said, of course we are attacking on more and more of this ability (ability to check databases of facts, iterative "reasoning" models, etc) so they are becoming "adequate" in these respects.

The dreaming comparison seems quite apt though. I entirely get what you mean by rereading the dream word and seeing it suddenly transformed to another city name, yet somehow also "fitting". For some reason I'm keenly aware of these sorts of relationships when I think about my dreams. I will think of a situation and immediately be able to identify the input memories that "built" the dream scenario. Usually they involve overlapping themes and concepts, as well as some human specific common targets like "unresolved, emotionally charged, danger, etc" (presumably running through these types of brain neurons provides some sort of advantage for mammals, which makes sense to me).

What an LLM does is essentially create a huge interconnected conceptual web of the universe which is fed, and then uses probably ballistic models to travel through these chains, much like how dreaming does a trance like dance through these conceptual connections. In the former example though, we have optimized the traversal to be as close to a "mock awake human" as possible. If the dream poem is dreary in nature, and Rostov sounds dreary, and you were hearing about the dreary London rain earlier in the day, and you have a childhood memory of reading through a dreary poem that made you very sad, that's the perfect sort of overlapping set of memory synapses for a dream to light up. And when looking back, all you'll see is a strange fantasmic and (usually, not always) frustratingly inaccessible conglomeration of these inputs.

This sort of traversal isn't just used in dreaming though. To some degree we're doing similar things when we do things like creative thinking. The difference is, and this is especially so in "modern" life, we're strongly filtering our thoughts through language first and foremost (and indeed there's a lot of philosophical and scientific work done about how extremely important language is to humanness), but also through basic logic.

LLM's inherent some of the power/magic of language, in that it is deconstructing the relationships between the concepts behind language that contains so much embedded meaning. But they aren't filtering through logic like we do. Well, reasoning models do to some degree, but it is obviously quite rudimentary.

I think it's a good analogy.

I'll tell you a secret. The same happens also with mainstream languages

ghuntley•6mo ago

It's not really 'vibe coding' if you're copying and pasting from ChatGPT by hand...

alfiedotwtf•6mo ago

I’ve been using my AirPods to talk to ChatGPT while I drive into work.. not coding talk though, using that time to pick up particle physics. So far, nothing looks hallucinated, though I’m only touching the surface and haven’t looked at equations yet.

.. either way, I’m super happy that it has kept my drives to work very interesting!

dogman1050•6mo ago

It never did draw a blue circle, it's orange or similar but that's never mentioned in the article.

ofrzeta•6mo ago

Even though I was impressed with the original article (so kind of contrary to the author) in the meantime I tried the same thing with Claude Sonnet 4 (because some people here criticized the approach due to not using a proper "coding model") and got no better results. Now I tried about a dozen iterations but it did not manage to create a "BASIC programm for Atari 800XL that makes use of display list interrupts to draw rainbow-like colored horizontal stripes", although this is like a "hello world" for that technique and there should be plenty of samples on the Internet. I am curious to see if anyone can make that work with an LLM.

JKCalhoun•6mo ago

Are there really plenty of examples on the internet?

My first thought reading the article was that Atari BASIC is kind of a little specialized. If BASIC is kind of an under-represented language in general on the internet (you know, compared to Javascript, for example) then Atari BASIC has to be a white whale.

ofrzeta•6mo ago

Yeah, when you google for "atari display list interrupts" a lot of similar code comes up. Here's a complete example https://makezine.com/article/technology/generate-the-atari-r...

Also, sure, compared to Javascript it's a small amount.

j45•6mo ago

It can only code to the average level of content that it's trained on.

heisenbit•6mo ago

Maybe the Basic code out there has a too high similarity with spaghetti making it hard to abstract in latent space. OO and functional patterns are more localized.

bethekidyouwant•6mo ago

It literally worked fine but then he called it a bust for no reason. Also free version of chatgpt is a bold choice

benbristow•6mo ago

Annoying website. Cookie policy banner at the start with the usual 'essential' or 'all', then get about a fifth down the article and the page fades out making the article unreadable with a box asking me to subscribe.

rzz3•6mo ago

> With the rise of LLM systems (or “AI” as they are annoyingly called),

I made it this far and realized the rest wasn’t worth reading. Language evolves, words change, and AI means what it means now. It turns out it’s actually really useful to have an abstraction above the concept of LLMs to talk about the broader set of these types of technologies, and generally speaking I find that these very pedantic types of people don’t bring me useful new perspectives.

andriamanitra•6mo ago

> Language evolves, words change, and AI means what it means now. It turns out it’s actually really useful to have an abstraction above the concept of LLMs to talk about the broader set of these types of technologies

I agree that "AI" can be useful as an umbrella term, but using it when referring specifically to the "LLM" subset of AI technologies is not useful. A ton of information about the capabilities and limitations of the system is lost when making that substitution. I understand why marketing departments are pushing everything as "AI" to sell a product but as consumers we should be fighting against that.

rzz3•6mo ago

People who aren’t software engineers need to be able to converse about these technologies, and no one outside of our world cares about the difference between LLMs and other types of AI.

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

Haskell for all: Beyond agentic coding

SectorC: A C Compiler in 512 bytes (2023)

Speed up responses with fast mode

Software factories and the agentic moment

Total surface area required to fuel the world with solar (2009)

Brookhaven Lab's RHIC concludes 25-year run with final collisions

LLMs as the new high level language

Hoot: Scheme on WebAssembly

Stories from 25 Years of Software Development

Vocal Guide – belt sing without killing yourself

First Proof

Vouch

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Wood Gas Vehicles: Firewood in the Fuel Tank (2010)

Why there is no official statement from Substack about the data leak

Al Lowe on model trains, funny deaths and working with Disney

Show HN: A luma dependent chroma compression algorithm (image compression)

FDA intends to take action against non-FDA-approved GLP-1 drugs

Start all of your commands with a comma (2009)

The AI boom is causing shortages everywhere else

I write games in C (yes, C) (2016)

Microsoft account bugs locked me out of Notepad – Are thin clients ruining PCs?

Selection rather than prediction

Homeland Security Spying on Reddit Users

Learning from context is harder than we thought

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Where did all the starships go?

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Reinforcement Learning from Human Feedback

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

Haskell for all: Beyond agentic coding

SectorC: A C Compiler in 512 bytes (2023)

Speed up responses with fast mode

Software factories and the agentic moment

Total surface area required to fuel the world with solar (2009)

Brookhaven Lab's RHIC concludes 25-year run with final collisions

LLMs as the new high level language

Hoot: Scheme on WebAssembly

Stories from 25 Years of Software Development

Vocal Guide – belt sing without killing yourself

First Proof

Vouch

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Wood Gas Vehicles: Firewood in the Fuel Tank (2010)

Why there is no official statement from Substack about the data leak

Al Lowe on model trains, funny deaths and working with Disney

Show HN: A luma dependent chroma compression algorithm (image compression)

FDA intends to take action against non-FDA-approved GLP-1 drugs

Start all of your commands with a comma (2009)

The AI boom is causing shortages everywhere else

I write games in C (yes, C) (2016)

Microsoft account bugs locked me out of Notepad – Are thin clients ruining PCs?

Selection rather than prediction

Homeland Security Spying on Reddit Users

Learning from context is harder than we thought

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Where did all the starships go?

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Reinforcement Learning from Human Feedback

I tried vibe coding in BASIC and it didn't go well

Comments