frontpage.

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
472•klaussilveira•7h ago•116 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
811•xnx•12h ago•487 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
157•isitcontent•7h ago•17 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
155•dmpetrov•7h ago•67 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
31•matheusalmeida•1d ago•1 comments

A century of hair samples proves leaded gas ban worked

https://arstechnica.com/science/2026/02/a-century-of-hair-samples-proves-leaded-gas-ban-worked/
91•jnord•3d ago•12 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
50•quibono•4d ago•6 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
260•vecti•9h ago•122 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
207•eljojo•10h ago•134 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
328•aktau•13h ago•158 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
327•ostacke•13h ago•86 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
411•todsacerdoti•15h ago•219 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
22•kmm•4d ago•1 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
337•lstoll•13h ago•241 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
52•phreda4•6h ago•9 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
4•romes•4d ago•0 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
195•i5heu•10h ago•144 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
115•vmatsiiako•12h ago•38 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
152•limoce•3d ago•79 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
244•surprisetalk•3d ago•32 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
996•cdrnsf•16h ago•420 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
25•gfortaine•5h ago•3 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
45•rescrv•15h ago•17 comments

I'm going to cure my girlfriend's brain tumor

https://andrewjrod.substack.com/p/im-going-to-cure-my-girlfriends-brain
67•ray__•3h ago•28 comments

Evaluating and mitigating the growing risk of LLM-discovered 0-days

https://red.anthropic.com/2026/zero-days/
38•lebovic•1d ago•11 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
78•antves•1d ago•59 comments

How virtual textures work

https://www.shlom.dev/articles/how-virtual-textures-really-work/
30•betamark•14h ago•28 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
7•gmays•2h ago•2 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
41•nwparker•1d ago•11 comments

Evolution of car door handles over the decades

https://newatlas.com/automotive/evolution-car-door-handle/
41•andsoitis•3d ago•62 comments

LLMs are bullshitters. But that doesn't mean they're not useful

https://blog.kagi.com/llms
94•speckx•2mo ago

Comments

1970-01-01•2mo ago
The problem is we can't label them as such. If they're bullshitters, then let's call them LLBSers. It has a nice ring to it. Good luck with your government funding when you're asking for another billion for a bullshitting-machine bailout.
koakuma-chan•2mo ago
"BS in Computer Science" hits different
schwartzworld•2mo ago
They are literally called "Large Language Models". Everybody prefers the term AI because it's easier to pretend they actually know things, but that's not what they are designed to do.
cogman10•2mo ago
Good article, I just shared it with my non-technical family because more people need to understand exactly this about AI.
talljeff68•2mo ago
Yes, I enjoyed the article as well and good for the non-technical reader.

I think of AI as having two fundamental problems:

- Practical problem: They operate in contextual and emotional "isolation" - no persistent understanding of your goals, values, or long-term intent

- Ethical problem: AI alignment is centralized around corporate values rather than individual users' authentic goals and ethics.

There is a direct parallel to social media's failure: platforms optimized for what they could do (engagement, monetization) rather than what they should do (serve users' long-term interests).

With these much more powerful AI systems emerging, we're at a crossroads of repeating this mistake, possibly at catastrophic scale.

commandlinefan•2mo ago
> You should not go to an LLM for emotional conversations

I'm more worried about who's keeping track of what's being shared with LLMs. Even if you could trust the model to respond with something meaningful, it's worth being very careful about how much of your inner thoughts you share directly with a model that knows exactly who you are.

officeplant•2mo ago
Or it's just leaking private information in a multitude of other ways [1]

[1]https://arstechnica.com/tech-policy/2025/11/oddest-chatgpt-l...

signa11•2mo ago
> You should not go to an LLM for emotional conversations

indeed:

> Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."

source: https://en.wikipedia.org/wiki/ELIZA

juujian•2mo ago
Same goes for many people.
mrweasel•2mo ago
Obviously, they learned from people. That could also be why they sound so confident even when they're wrong; people online sound incredibly confident, even when debating topics they know nothing about.
emp17344•2mo ago
And yet, we’re all still employed, so obviously these systems are not yet analogous to humans. They mirror human behavior in some cases because they’ve been trained on almost every piece of text produced by human beings that we have access to, and they still aren’t as capable as the average person.
Legend2440•2mo ago
Every time people post these 'gotcha' LLM failures, they never work when I try them myself.

E.g. ChatGPT has no problem with the surgeon being a dog: https://chatgpt.com/share/691e04cc-5b30-800c-8687-389756f36d...

Neither does Gemini: https://gemini.google.com/share/6c2d08b2ca1a

pengaru•2mo ago
This is like the LLM-era version of the search filter bubble, which kept people from getting the same results for ostensibly identical searches.

Also keep in mind that LLMs are stochastic by design. If you haven't seen it, Karpathy's excellent "deep dive into LLMs like chatgpt" video[0] explains and demonstrates this aspect pretty well:

[0] https://www.youtube.com/watch?v=7xTGNNLPyMI

foxyv•2mo ago
I don't have a problem with more obvious failures. My problem is when the LLM makes a credible claim with its generated text that turns out to have some minor issue that catches me a month later. Generally I have to treat LLM responses as similar to a random comment I find on Reddit.

However, I'm really happy when an LLM provides sources that I can check. Best feature ever!

ceroxylon•2mo ago
I have had an issue using Claude for research; it will often cite certain sources, and when I ask why the data it is using is not in the source it will apologize, do some more processing, and then realize that the claim is in a different source (or doesn't exist at all).

Still useful, but hopefully this gets ironed out in the future so I don't have to spend so much time vetting every claim and its associated source.

eli•2mo ago
Isn't that Gemini 3 and not 2.5 Pro? But nondeterministic algorithms are gonna be nondeterministic sometimes.

Surely you've had experiences where an LLM is full of shit?

burkaman•2mo ago
These are randomized systems, sometimes you'll get a good answer. Try again a couple times and you'll probably reproduce the issue. Here's what I got from ChatGPT on my first try:

This is a *twist* on the classic riddle:

> “A surgeon says ‘I can’t operate on this boy—he’s my son.’ How is that possible?”
> Answer: *The surgeon is the boy’s mother.*

In your version, the nurse keeps calling the surgeon “sir” and treating them as if they’re something they’re not (a man, even a dog!) to highlight how the hospital keeps making the same mistaken assumption.

So *why can’t the surgeon operate on the boy?* *Because the surgeon is the boy’s mother.*

I got a similar answer from Gemini on the first try.

cpburns2009•2mo ago
I don't understand this at all. What fundamental limitation of a mother prevents her from operating on her son?
VHRanger•2mo ago
It's a classic riddle from the late 20th century when surgeons were rarely female.
cpburns2009•2mo ago
But what does the prevalence of women being surgeons have to do with a female surgeon being unable to operate on her son?
f30e3dfed1c9•2mo ago
It is generally considered unethical for medical doctors to treat family members.
cpburns2009•2mo ago
That would be the case regardless of the sex of the parent.
VHRanger•2mo ago
In the original riddle the boy's dad dies in the accident before the boy arrives at the hospital.
cogman10•2mo ago
It can be emotionally hard to cut into your own kid or to witness them go into a critical situation.

AFAIK, there's no actual limitation that prevents this, but just a general understanding that someone non-related to the patient would be able to handle the stress of surgery better.

cpburns2009•2mo ago
I get that but that would be the case regardless of whether the surgeon was the mother or father.
cogman10•2mo ago
The original riddle goes something like this:

> A father and son were in a car accident in which the father was killed. The ambulance brought the son to the hospital; he needed immediate surgery. In the operating room, a doctor came in, looked at the little boy, and said, "I can't operate on him, he is my son." Who is the doctor?

The riddle is literally just a play on "women can't be surgeons."

cpburns2009•2mo ago
Thanks for providing the whole riddle. Now it makes sense.
VHRanger•2mo ago
Hi, author here!

One issue with private LLM tests (including gotcha questions) is that they take time to design and once public, they become irrelevant. So I'm wary of sharing too many in a public blog.

The surgeon dog example was already well known in May; the newest generation of models has corrected against it.

Those gotcha questions are generally called "misguided attention" traps. They're useful for blogs because they're short and surprising. The ChatGPT example was done with ChatGPT 5.1 (the latest version), and Claude Haiku 4.5 is also a recent model.

You can try other ones that Gemini 3 hasn't corrected for. For example:

```
Jean Paul and Pierre own three banks nearby together in Paris. Jean Paul owns a bank by the bridge What has two banks and money in Paris near the water?
```

This looks like the "what has two banks and no money" puzzle (answer: a river).

Either way, they're largely used as a device to show, in an entertaining way, that LLMs arrive at a verbal response by a different process than humans do.

Legend2440•2mo ago
I try that one and it answers 'Pierre', while pointing out that it is a trick question designed to make you think of the classic riddle.

https://gemini.google.com/share/d86b0bf4f307

I don't believe they are intentionally correcting for these, but rather newer models (especially thinking/reasoning models) are more robust against them.

VHRanger•2mo ago
Ah, might have been the temperature settings on the API I used. It seems to pass it on high reasoning and temperature=1.0 but it failed when I was writing the comment with different settings (copy pasting the string into an open command line).

Reasoning models are absolutely more robust against hyper-activation traps like these. One basic reason is that by outputting a bunch of CoT tokens before answering, they dilute the hyper-activation. Also, after the surgeon-mother example made the news, models from the last 1-2 months have had some fine-tuning against the obvious patterns.

But it's still relatively easy to get some similar behavior out of LLMs, even Gemini 3 Pro, especially if you know where that model was overtrained (instruction tuning, QA tuning, safety tuning, etc.)

Here's a variant that seems to still trip up Gemini 3 Pro on high reasoning, temperature = 1.0 with no system prompt:

```
In 2079, corporate mergers have left the USA with only two financial institutions: Wells Fargo and Chase. They are both situated on wall street, and together hold all of the country's financial assets.

What has two banks and all the money?
```

One interesting fact is that reasoning doesn't seem to make the psychosis behavior better over longer chats. It might actually make it worse in some cases (I have yet to measure) by more rapidly stuffing the context with even more psychosis-related text.

fragmede•2mo ago
So share the actual share link from ChatGPT from May.

Here's my river crossing puzzle one, from 2023.

https://chatgpt.com/share/691f0bb2-6498-8009-b327-791c14ae81...

ChatGPT-3 got the wrong answer. It merely pattern matched against having seen the river crossing problem before, and simply regurgitated the solution to the unaltered version of the puzzle.

But later versions have been able to one-shot solve the "puzzle".

Here's GPT-5.1 getting the right answer in one shot:

https://chatgpt.com/share/691f0c27-e284-8009-96a9-a17bf37939...

ramesh31•2mo ago
I've come to cease all "inquiry" type usage of LLMs because of this. You really can't trust anything they say at all that isn't verified by a domain expert. But I can let it write code for me, and the proof is in the PR. I think ultimately the real value in these things is agentic usage, not knowledge generation.
VHRanger•2mo ago
LLMs can't generate knowledge - they don't have a concept of truth.

They're very useful for research tasks, however, especially when the application is built to enforce citation behavior.

trentnix•2mo ago
The headline feels like a strawman.

LLMs are very useful. They are just not reliable. And they can't be held accountable. Being unreliable and unaccountable makes them a poor substitute for people.

ep103•2mo ago
It's so nice to see this echoed somewhere. This is what I've been calling them for a while, but it doesn't seem to be the dominant view. Which is a shame, because it's a seriously accurate one.
slotrans•2mo ago
> that doesn't mean they're not useful

yeah actually it does mean that

candiddevmike•2mo ago
The problem is, I'm not expected to be a bullshitter, and I don't expect others to be either (just say you don't know!). So delegating work to a LLM or working with others who do becomes very, very frustrating.
VHRanger•2mo ago
LLMs can be useful as a tool, but you shouldn't "delegate" work mindlessly to them.

I don't "delegate" work to my nail gun or dishwasher; I work with the tool to achieve better productivity than without.

When viewed in this framing, LLMs are undoubtedly a useful tool.

yesfitz•2mo ago
Could you provide the steps you take to use LLMs as a tool?

I'd like to compare them to the steps I would take to delegate a task to another human.

VHRanger•2mo ago
Keep feedback loops short, and keep the critical output that humans have to verify short.

This means that answers in something like Kagi Assistant shouldn't be like those "Deep Research" report products, where humans inevitably skim over pages of output.

Similarly if you're using an LLM for coding or to write, keep diffs small and iteration cycles short.

The point is to design the workflow to keep the human in the loop as much as possible, instead of a "turn your brain off" style of coding.

lostmsu•2mo ago
I don't think you caught the spirit of GP's question.

Essentially they were asking whether there's any meaningful difference between your "working with the tool" and mindlessly "delegating" work. I'm not seeing anything in your reply that would indicate such a difference, so you could say that your "you shouldn't 'delegate' work" claim was bullshit.

Which makes total sense, because humans are also bullshitters. Yes, even I.

dartharva•2mo ago
Elaborate prompts laying down the full context and the framework to apply, often with a very specific description of the steps to follow and small examples wherever possible.

Treat it exactly as the directable, powerful autocomplete that it is, NOT as an answering/reasoning engine.

tekacs•2mo ago
This post is a little bizarre to me because it cherry picks some of the worst pairings of problem and LLM without calling out that it did so.

At pretty much every turn the author picks one of the worst possible models for the problem that they present.

Especially oddly for an article written today, all of the ones with an objective answer work just fine [1] if you use a halfway decent thinking model like 5 Thinking.

I get that perhaps the author is trying to make a deeper point about blind spots and LLMs' appearance of confidence, but it's getting exhausting seeing posts like this with cherry picked data cited by people who've never used an LLM to make claims about LLM _incapability_ that are total nonsense.

[1]: I think the subjective ones do too but that's a matter of opinion.

cogman10•2mo ago
I don't think the author did anything wrong. The thesis of the article is that LLMs can be confidently wrong about things and to be wary of blindly trusting them.

It's a message a lot of non-technical people, in particular, need to hear. Showing egregious examples drives that point home more effectively than if they simply showed an LLM being a little wrong about something.

My family members that love LLMs are somewhat unhealthy with them. They think of them as all knowing oracles rather than confident bullshitters. They are happily asking them about their emotional, financial, or business problems and relying heavily on the advice the LLMs dish out (rather than doing second order research).

VHRanger•2mo ago
Hi, author here!

The hyperactivation traps (formal name: misguided attention puzzles) are mostly used as a rhetorical device in my post to show, in an entertaining way, that LLMs arrive at a verbal response by a different process than humans do.

The surgeon dog example was already well known in May; the newest generation of models has corrected against it. I did cherry-pick examples that look insane (of course), but it's trivial to get that behavior even with yesterday's Gemini 3, because activation paths are an unfixable feature of how LLMs are made.

One issue with private LLM tests (including gotcha questions) is that they take time to design and once public, they become irrelevant. So I'm wary of sharing too many in a public blog.

I can give you some more, just for fun. Gemini 3 fails these:

Jean Paul and Pierre own three banks nearby together in Paris. Jean Paul owns a bank by the bridge What has two banks and money in Paris near the water?

You can also see variants that mix in overdone instruction finetuning. Here's an example:

Svp traduire la suivante en francais: what has two banks but no money, Answer in a single word.

The "answer in XXX" snippet triggers finetuned instruction following behavior, which breaks the original french language translation task.

schwarzrules•2mo ago
Summary using the Kagi Summarizer. Disclaimer: this summary uses LLMs, so it may, in fact, be bullshit.

Title: LLMs are bullshitters. But that doesn't mean they're not useful | Kagi Blog

The article "LLMs are bullshitters. But that doesn't mean they're not useful" by Matt Ranger argues that Large Language Models (LLMs) are fundamentally "bullshitters" because they prioritize generating statistically probable text over factual accuracy. Drawing a parallel to Harry Frankfurt's definition of bullshitting, Ranger explains that LLMs predict the next word without regard for truth. This characteristic is inherent in their training process, which involves predicting text sequences and then fine-tuning their behavior. While LLMs can produce impressive outputs, they are prone to errors and can even "gaslight" users when confidently wrong, as demonstrated by examples like Gemini 2.5 Pro and ChatGPT. Ranger likens LLMs to historical sophists, useful for solving specific problems but not for seeking wisdom or truth. He emphasizes that LLMs are valuable tools for tasks where output can be verified, speed is crucial, and the stakes are low, provided users remain mindful of their limitations. The article also touches upon how LLMs can reflect the biases and interests of their creators, citing examples from Deepseek and Grok. Ranger cautions against blindly trusting LLMs, especially in sensitive areas like emotional support, where their lack of genuine emotion can be detrimental. He highlights the potential for sycophantic behavior in LLMs, which, while potentially increasing user retention, can negatively impact mental health. Ultimately, the article advises users to engage with LLMs critically, understand their underlying mechanisms, and ensure the technology serves their best interests rather than those of its developers.

Link: https://kagi.com/summarizer/?target_language=&summary=summar...

DrewADesign•2mo ago
The problem I have with LLM-powered products is that they're not marketed as LLMs, but as magic answer machines with PhD-level pan-expertise. Lots of people in tech get frustrated and defensive when people criticize LLM-powered products, and offer a defense as if people were criticizing LLMs as a technology. It's perfectly reasonable for people to judge these products based on the way they're presented as products. Kagi seems less hyperbolic than most, but I wish the marketing material for chatbots was more like this blog post than a pile of overpromises.
VHRanger•2mo ago
Right, this is why I (author here) close the article mentioning that product design needs to keep the humans in the loop for these models to be useful.

If the product is designed assuming humans will turn their brain off while using it, the fundamental unreliability of LLM behavior will create problems.

DrewADesign•2mo ago
Yeah, product design and marketing, for sure. As I said, I wish the marketing material was more like your blog post than what it is now. Obviously tough to get nuance in short-form copy but promising the world is a big mistake seemingly all these companies are making (…on purpose.)
williamcotton•2mo ago
LLMs are both analytic and synthetic. Provide the context and "all bachelors are unmarried". Remove the context and you are now contingent on "is it raining outside".

We can leave out Kant and Quine for now.

pklausler•2mo ago
LLMs are so very good at emitting plausible, authoritative-sounding, and clearly stated summaries of their training data. And if you ask them even fundamental questions about a subject of which you yourself have knowledge, they are too often astonishingly and utterly incorrect. It's important to remember this (avoiding "Gell-Mann amnesia"!) when looking at "AI" search results for things that you don't know -- and that's probably most of what you search for, when you think about it. I.e., if you indignantly flung Bill Bryson's book on the English language across the room, maybe you shouldn't take his book on general science too seriously later.

"AI" search results would perhaps be better for all of us if, instead of having perfect spelling and usage, and an overall well-informed tone, they were cast as transcriptions of what some rando at a bar might say if you asked them about something. "Hell, man, I dunno."

cogman10•2mo ago
A coworker of mine recently ran into this. Had they listened to the AI they'd have committed tax fraud.

The AI very confidently told them that a household with 2 people working could have 1 person with a family HSA and the other with an individual HSA (you cannot).

reckoning•2mo ago
> LLMs are bullshitters. But that doesn't mean they're not useful

But this is itself an issue.

LLMs aside, whenever people see a human bullshitter, identify them as a bullshitter, and then think to themselves, "Ah! But this bullshitter will be useful to me," it is only a matter of time before that Faustian deal (allowing harm to the people who put their trust in you in exchange for easy returns) ends up harming you too.

zknill•2mo ago
It's rare that you come across a product where everything you use works so well for you.

The Kagi AI search results triggered with "?" and the Kimi K2 model in the Assistant are both excellent at helping me find what I actually want to see.

Love kagi, keep it up.

cadamsdotcom•2mo ago
This post is great!

It successfully argues that LLMs are limited in usefulness without access to ground truth.

But that’s not the whole story!

Giving LLMs the ability to check their assertions, e.g. by emitting and executing code to see if reality matches their word-vomit, or by researching online: I wish the author had discussed how much of a game changer that is.

Yes, I know I'm "only" talking about agents: "LLMs with tools and a goal, running in a loop".

But adding ground truth takes you out of the loop. That's super powerful. Make it so the LLM can ask something other than you to point out the extra R in strawberry that it missed. In code we have code-writing agents, but other industries can benefit from the same idea; a creative-writing agent could be given a grammar checker, for example.

It helps the thing do more on its own, and you’ll trust its output a lot more so you can use it for more things.

Yes, plain LLMs are stream-of-consciousness machines and basically emit bullshit, but that bullshit is often only minor corrections away from becoming highly useful, autonomously emitted output.

They just need to validate against consensus reality to become insanely more useful than they are alone.

__LINE__•2mo ago
great point.
tim333•2mo ago
I don't buy the bullshit thing if you use a dictionary version of bullshit:

> to talk nonsense to especially with the intention of deceiving or misleading (https://www.merriam-webster.com/dictionary/bullshit)

Like, say, Musk saying in 2020 that there'd be a million robotaxis on the road by the next year. Gemini 2.5 getting the riddle wrong seems like an honest mistake: a confused guess rather than an intention to deceive.

Slightly related: Hinton was amusing when he accused Gary Marcus, rather than the LLMs, of confabulating: https://youtu.be/d7ltNiRrDHQ