frontpage.

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
250•theblazehen•2d ago•82 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
22•AlexeyBrin•1h ago•1 comment

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
705•klaussilveira•15h ago•206 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
967•xnx•21h ago•558 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
66•jesperordrup•5h ago•28 comments

Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501
7•onurkanbkrc•42m ago•0 comments

Making geo joins faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
135•matheusalmeida•2d ago•35 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
42•speckx•4d ago•34 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
68•videotopia•4d ago•6 comments

ga68, the GNU Algol 68 Compiler – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
13•matt_d•3d ago•2 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
39•kaonwarb•3d ago•30 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
45•helloplanets•4d ago•46 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
237•isitcontent•16h ago•26 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
237•dmpetrov•16h ago•126 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
340•vecti•18h ago•147 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
506•todsacerdoti•23h ago•247 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
389•ostacke•21h ago•97 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
303•eljojo•18h ago•187 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
361•aktau•22h ago•186 comments

Cross-Region MSK Replication: K2K vs. MirrorMaker2

https://medium.com/lensesio/cross-region-msk-replication-a-comprehensive-performance-comparison-o...
3•andmarios•4d ago•1 comment

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
428•lstoll•22h ago•284 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
71•kmm•5d ago•10 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
25•1vuio0pswjnm7•2h ago•14 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
23•bikenaga•3d ago•11 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
96•quibono•4d ago•22 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
270•i5heu•18h ago•219 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
34•romes•4d ago•3 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1079•cdrnsf•1d ago•461 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
64•gfortaine•13h ago•30 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
304•surprisetalk•3d ago•44 comments

How to wrangle non-deterministic AI outputs into conventional software? (2025)

https://www.domainlanguage.com/articles/ai-components-deterministic-system/
42•druther•3w ago

Comments

ironbound•3w ago
[flagged]
dang•3w ago
"Don't be snarky."

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

ramity•3w ago
Let me first start off by saying that I, and many others, have stepped in this pitfall. This is not an attack, but a good-faith attempt to share painfully acquired knowledge. I'm actively using AI tooling, and this comment isn't a slight on the tooling, but rather on how we're all seemingly putting the circle in the square hole and declaring that it fits.

Querying an LLM to output its confidence in its own output is a misguided pattern, despite being commonly applied. LLMs are not good at classification tasks, as the author states. They can "do" it, yes; perhaps better than random sampling can, but random sampling can "do" it as well. Don't get too tied to that example. The idea here is that if you are okay with something getting the answer wrong every so often, LLMs might be your solve, but this is a post about conforming non-deterministic AI to classical systems. Are you okay if your agent picks the red tool instead of the blue tool 1%, 10%, etc. of the time? If so, you're never not going to be wrangling, and that's the reality often left unspoken when integrating these tools.
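To put those per-call error rates in perspective, a back-of-the-envelope check (the 50-step count is made up for illustration):

    # Python: if each tool choice is right 99% of the time, a 50-step
    # agentic run finishes without a single wrong pick only ~60% of the time.
    p_correct, steps = 0.99, 50
    print(p_correct ** steps)  # ~0.605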

While tangential to this article, I believe it's worth stating that when interacting with an LLM in any capacity, you should remember your own cognitive biases. You often want the response to work, and while generated responses may look good and fit your mental model, it takes increasingly obscene levels of critical evaluation to see through the fluff.

For some, there will be inevitable dissonance reading this, but consider that these experiments are local examples: their lack of robustness only becomes apparent with large-scale testing. The data spaces these models have been trained on are unfathomably large in both quantity and depth, but under/over-sampling bias (just to name one) will be ever present.

Consider the following thought experiment: you are an applicant submitting your resume for a job, knowing it will be fed into an LLM. Let's confine your goal to something very simple: make it say something. Let's oversimplify for the sake of the example and say complete words are tokens. Consider "collocations": [Bated] breath, [batten] down, [diametrically] opposed, [inclement] weather, [hermetically] sealed. Extend this to contexts: [Oligarchy] government, [Chromosome] biology, [Paradigm] technology, [Decimate] to kill. With this in mind, consider how each word of your resume "steers" the model's subsequent response, and how the data each model was trained on can subtly influence that response.

Now let's bring it home and tie the thought experiment to confidence scoring in responses. Let's say it's reasonable to assume that results from low-accuracy/low-confidence models are less commonly found on the internet than those from higher-performing ones. If that can be entertained, extend the argument to confidence responses: maybe the term "JSON", or any other term in the model input, is associated with high confidences.

Alright, wrapping it up. The end point here is that a confidence value provided in the model's output is not the likelihood that the answer in the response is correct, but rather the most likely value to follow the stream of tokens in the combined input and output. The real sampled confidence values exist closer to the code, but they are limited to individual tokens, not series of tokens.
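To make that concrete, here is a minimal numpy sketch of where the real per-token "confidence" lives; the logit values are made up:

    import numpy as np

    def token_probs(logits):
        # The model's actual per-token confidence: a softmax over the vocabulary.
        z = np.asarray(logits, dtype=float)
        z -= z.max()              # stabilize the exponentials
        p = np.exp(z)
        return p / p.sum()

    # Hypothetical logits for one decoding step over a 5-token vocabulary.
    p = token_probs([2.1, 0.3, -1.0, 4.0, 0.0])
    print(p.argmax(), float(p.max()))  # most likely next token and its probability

    # Multiplying per-token probabilities along a generation gives a sequence
    # likelihood -- still not the probability that the *answer* is correct, and
    # certainly not whatever number the model prints when asked how confident it is.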

arduanika•3w ago
"when interacting with an LLM in any capacity, remember your own cognitive biases. You often want the response to work, and while generated responses may look good and fit your mental model, it requires increasingly obscene levels of critical evaluation to see through the fluff."

100% this.

Idk about the far-out takes where "AI is an alien lifeform that has arrived in our present", but the first thing we know about how humans relate to extraterrestrials is: "I want to believe".

an0malous•3w ago
Aren’t transformers intrinsically deterministic? I thought the randomness was intentional, to make chatbots seem more natural, and OpenAI used to have a seed parameter you could set for deterministic output. I don’t know why that feature isn’t more popular, given the reasons this article outlines.
ares623•3w ago
Maybe it allowed spitting out copyrighted works verbatim
jkaptur•3w ago
(I'm not an expert. I'd love to be corrected by someone who actually knows.)

Floating-point arithmetic is not associative: (A+B)+C does not necessarily equal A+(B+C), and you can get a performance improvement by calculating A, B, and C in parallel, then adding together whichever two finish first. So, in theory, transformers can be deterministic, but in a real system they almost always aren't.
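A two-line Python illustration of the non-associativity:

    a, b, c = 1e20, -1e20, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0 -- the 1.0 is absorbed by -1e20 before a can cancel it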

10000truths•3w ago
Not an expert either, but my understanding is that large models use quantized weights and tensor inputs for inference. Multiplication and addition of fixed-point values is associative, so unless there's an intermediate "convert to/from IEEE float" step (activation functions, maybe?), you can still build determinism into a performant model.
kimixa•3w ago
Fixed-point arithmetic isn't truly associative unless it has infinite precision. The second you hit a limit or saturate/clamp a value, the result very much depends on the order of operations.
10000truths•3w ago
Ah yes, I forgot about saturating arithmetic. But even for that, you wouldn't need infinite precision for all values, you'd only need "enough" precision for the intermediate values, right? E.g. for an inner product of two N-element vectors containing M-bit integers, an accumulator with at least ceil(log2(N))+2*M bits would guarantee no overflow.
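A quick numpy sketch of that bound, with N and M picked for illustration:

    import numpy as np

    N, M = 1024, 8                               # vector length, element bit width
    acc_bits = int(np.ceil(np.log2(N))) + 2 * M  # 10 + 16 = 26, fits in int32

    rng = np.random.default_rng(0)
    x = rng.integers(-128, 128, size=N, dtype=np.int8)
    y = rng.integers(-128, 128, size=N, dtype=np.int8)

    # Widening to int32 before the multiply-accumulate means nothing can
    # overflow or saturate, so every summation order yields identical bits.
    acc = (x.astype(np.int32) * y.astype(np.int32)).sum(dtype=np.int32)
    print(acc_bits, acc)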
kimixa•3w ago
True, you can increase the bit width to guarantee you never hit those issues, but right now saturating arithmetic on types that pretty commonly hit those limits is the standard. Guaranteeing determinism that way would mean a significant performance drop and/or memory-use increase with current techniques, to the point that it would significantly affect availability and cost compared to what people expect.

Similarly, you could disallow reordering of operations and the like, so that results are guaranteed to be deterministic (even if still "not correct" compared to infinite-precision arithmetic), but that would also carry a big performance cost.

amelius•2w ago
I don't think the order of operations is non-deterministic between different runs. That would make programming and researching these systems more difficult than necessary.
saagarjha•2w ago
It would be if you used atomics.
amelius•2w ago
I said: I don't think it's non-deterministic; two negations -> deterministic.
saagarjha•2w ago
It’s usually not too difficult or expensive to avoid doing this.
Const-me•2w ago
> you can get a performance improvement by calculating A, B, and C in parallel, then adding together whichever two finish first

Technically possible, but I think unlikely to happen in practice.

On the higher level, these large models are sequential and there's nothing to parallelize: inference is a continuous chain of data dependencies between temporary tensors, which makes it impossible to compute different steps in parallel.

On the lower level, each step is a computationally expensive operation on a large tensor/matrix. These tensors often hold millions of numbers, the problem is very parallelizable, and the tactics for doing that efficiently are well researched, because matrix linear algebra has been in wide use for decades. However, it's both complicated and slow to implement fine-grained parallelism like "adding together whichever two finish first" on modern GPUs: with many thousands of active threads, the synchronization is just too expensive. Instead, operations like matrix multiplication typically assign one thread per output element (or per fixed count of output elements), and reductions like softmax or vector dot products use a series of exponentially decreasing reduction steps, i.e. the order is deterministic.

However, that order may change with even a minor update to any part of the software, including opaque low-level pieces like GPU drivers and firmware. Library developers update GPU kernels, and the drivers, firmware, and OS kernel collectively implement the scheduler that assigns work to cores; both may affect the order of these arithmetic operations.
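A toy Python illustration of that kind of fixed-shape reduction:

    def tree_sum(xs):
        # Pairwise reduction whose combination order depends only on len(xs),
        # never on which partial result "finishes first" -- so for the same
        # inputs the rounding, and the answer, is identical on every run.
        xs = list(xs)
        while len(xs) > 1:
            paired = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
            if len(xs) % 2:              # odd element carries over unchanged
                paired.append(xs[-1])
            xs = paired
        return xs[0]

    print(tree_sum([1e20, -1e20, 1.0, 2.0]))  # always 3.0, run after run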

janalsncm•3w ago
Transformers are just a special kind of binary, run by inference code. Where the rubber meets the road is whether the inference setup is deterministic. There’s some literature on this: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

I don’t think the issue is determinism per se, but chaotic predictions that are difficult to rely on.

an0malous•3w ago
I agree they could be chaotic but I think that’s an important distinction
bpodgursky•3w ago
Strict deterministic output for a given prompt prevents the use of RAG, which increasingly limits the relative utility of an LLM within an organization.
qayxc•2w ago
How so? RAG is just a mechanism for querying external data sources. I don't see any need for non-determinism there.
solsane•3w ago
Well, you could say that about computers in general. I'm assuming you're referring to temperature (or something similar), which can be set to always pick the most probable token. Floats aside, this should be deterministic. But practically I don't think that changes much, since adjusting the input slightly can lead to very different output. Also, back in the day, temperature helped models avoid cyclic loops.
an0malous•3w ago
Yes, but chaotic is very different from non-deterministic, and not just in an academic way: e.g., I can write tests against chaotic outputs but not really against non-deterministic outputs.
esafak•3w ago
The models generate a token distribution. Which one to pick is a choice. One can sample from the distribution, hence the randomness.
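A minimal sketch of that choice, with a made-up distribution for one step:

    import numpy as np

    probs = [0.83, 0.13, 0.03, 0.01]        # hypothetical token distribution

    print(int(np.argmax(probs)))            # greedy: always token 0

    rng = np.random.default_rng()           # unseeded: a different draw each run
    print(rng.choice(len(probs), p=probs))  # sampled: usually 0, sometimes not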
clarionbell•2w ago
No, not unless you have a very specific notion of determinism. Some basic operations use arithmetic with finite precision in a way that isn't associative and therefore isn't reproducible. And CUDA introduces its own set of problems[1].

[1] https://docs.nvidia.com/cuda/cublas/index.html#results-repro...

johndough•2w ago
Determinism of LLMs has often been discussed on HN, for example here:

https://news.ycombinator.com/item?id=37006224

https://news.ycombinator.com/item?id=45200925

The TL;DR is that LLMs are often not deterministic because GPUs compute submatrices in parallel and sum them up in different orders, depending on which finish first. This is maybe a few percent faster than always using the same order, but it absolutely could be made deterministic if people cared enough; CUDA even provides deterministic primitives if desired. You would of course also need to use the same random seed for the sampler, but that part is trivial.
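For what it's worth, on a PyTorch stack the opt-in looks roughly like this (see PyTorch's reproducibility notes for the caveats):

    import torch

    torch.manual_seed(0)                      # fix the sampler's seed
    torch.use_deterministic_algorithms(True)  # raise if an op lacks a deterministic kernel
    torch.backends.cudnn.benchmark = False    # don't pick kernels by timing races
    # Some cuBLAS ops additionally require an env var set before process start:
    #   CUBLAS_WORKSPACE_CONFIG=:4096:8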

ironbound•3w ago
Use one of these structured output libraries:

https://github.com/outlines-dev/outlines

https://github.com/jxnl/instructor

https://github.com/guardrails-ai/guardrails

https://www.askmarvin.ai/docs/text/transformation/

Some of them allow a JSON schema, others a Pydantic model (which you can transform to/from JSON).
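A minimal sketch of the validate-and-retry pattern some of these implement, using Pydantic directly (call_llm is a hypothetical stand-in for whatever client you use):

    from pydantic import BaseModel, ValidationError

    class Verdict(BaseModel):
        label: str       # e.g. "spam" / "not_spam"
        rationale: str

    def structured_call(prompt: str, retries: int = 3) -> Verdict:
        schema = Verdict.model_json_schema()
        for _ in range(retries):
            # call_llm: hypothetical helper that sends text to your LLM.
            raw = call_llm(f"{prompt}\nReturn ONLY JSON matching: {schema}")
            try:
                return Verdict.model_validate_json(raw)
            except ValidationError:
                continue             # re-prompt and hope for conforming JSON
        raise RuntimeError("no schema-conforming output after retries")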

zby•2w ago
Yeah - the author seems oblivious to that option. But to be fair, this post is more about the more basic step of choosing the schema; he argues that that is still a task for a human. In his next post (https://www.domainlanguage.com/articles/context-mapping-an-a...) applying some output schemas would be more useful.

By the way, you don't need to use a library to get output schemas: https://platform.openai.com/docs/guides/structured-outputs , https://platform.claude.com/docs/en/build-with-claude/struct...

iLoveOncall•2w ago
All those do is retry the generation if it fails and then give up.

Calling them structured output is a lie, it's structured validation.

If the LLM persists in generating bad JSON, those are useless.

johndough•2w ago
To add to that, llama.cpp supports GBNF grammars, which allow constrained generation of JSON or even programming languages:

https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...

galaxyLogic•3w ago
I wonder if the same problem exists with AI-based code development more specifically.

To produce an application you should have some good unit tests, so the AI produces some unit tests for us. Then we ask it again, and the unit tests come out different. Where does that leave us? Can we be confident that the generated unit tests are in some sense "correct"? How can they be correct if they are different every time we ask?

gulugawa•3w ago
This page has some advice: https://p-nand-q.com/programming/languages/java2k/
Legend2440•3w ago
The real issue here is that conventional software fundamentally lacks the expressiveness to process the kind of data that LLMs can.

That’s why you’re using an LLM in the first place.

qayxc•2w ago
How does this relate to the need for deterministic and consistent output?
noreplydev•2w ago
this page discusses the same thing with a somewhat different approach: https://selfvm.run/research