
How to wrangle non-deterministic AI outputs into conventional software? (2025)

https://www.domainlanguage.com/articles/ai-components-deterministic-system/
11•druther•13h ago

Comments

ironbound•13h ago
[flagged]
dang•1h ago
"Don't be snarky."

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

ramity•38m ago
Let me start by saying that I, and many others, have stepped into this pitfall. This is not an attack but a good-faith attempt to share painfully acquired knowledge. I'm actively using AI tooling, and this comment isn't a slight on the tooling but rather on how we're all seemingly putting the circle in the square hole and declaring that it fits.

Asking an LLM to report its confidence in its own output is a misguided pattern, despite being commonly applied. LLMs are not good at classification tasks, as the author states. They can "do" it, yes, perhaps better than random sampling can, but random sampling can "do" it as well. Don't get too tied to that example. The idea here is that if you are okay with something getting the answer wrong every so often, LLMs might be your solution, but this is a post about fitting non-deterministic AI into classical systems. Are you okay if your agent picks the red tool instead of the blue tool 1%, 10%, etc. of the time? If so, you're never going to stop wrangling, and that's the reality often left unspoken when integrating these tools.
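To make the pattern concrete, here's a minimal sketch in Python. Everything in it is hypothetical: call_llm stands in for whatever chat API you use, and the prompt and labels are made up. The point is that the "confidence" field is just another generated token sequence, while the second function at least measures something, even if it still isn't calibration.

    import json
    from collections import Counter

    def classify_with_confidence(call_llm, ticket_text):
        # The pattern in question: the model is asked to grade itself, and the
        # "confidence" field comes back as just another generated value.
        prompt = (
            "Classify this support ticket as 'billing' or 'technical'. "
            'Respond only with JSON: {"label": "...", "confidence": 0.0}.\n\n'
            "Ticket: " + ticket_text
        )
        return json.loads(call_llm(prompt))

    def empirical_agreement(call_llm, ticket_text, n=10):
        # A cruder but measured alternative: resample n times and report how
        # often the modal label appears. Observed rather than asserted.
        labels = [classify_with_confidence(call_llm, ticket_text)["label"]
                  for _ in range(n)]
        return Counter(labels).most_common(1)[0][1] / n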

While tangential to this article, I believe it's worth stating that when interacting with an LLM in any capacity, you should remember your own cognitive biases. You often want the response to work, and while generated responses may look good and fit your mental model, it takes increasingly obscene levels of critical evaluation to see through the fluff.

For some, there will be inevitable dissonance reading this, but consider that these experiments are local examples. Their lack of robustness will become apparent with large-scale testing. The data spaces these models have been trained on are unfathomably large in both quantity and depth, but under-/over-sampling bias will be ever present (to name just one issue).

Consider the following thought experiment: you are a job applicant submitting your resume, knowing it will be fed into an LLM. Let's confine your goal to something very simple: make it say something. Let's oversimplify for the sake of the example and say complete words are tokens. Consider "collocations": [bated] breath, [batten] down, [diametrically] opposed, [inclement] weather, [hermetically] sealed. Extend this to contexts: [oligarchy] government, [chromosome] biology, [paradigm] technology, [decimate] to kill. With this in mind, consider how each word of your resume "steers" the model's subsequent response, and how the data each model was trained on can subtly influence that response.
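A toy probe of that steering effect, assuming the Hugging Face transformers library with GPT-2 as a stand-in model (the prompts are illustrative, not from the article):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def top_next_tokens(prefix, k=5):
        # Distribution over the next token, given only the prefix.
        ids = tokenizer(prefix, return_tensors="pt")["input_ids"]
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        top = torch.topk(probs, k)
        return [(tokenizer.decode(i), round(p.item(), 3))
                for i, p in zip(top.indices, top.values)]

    # A single collocating word pulls the distribution hard toward its partner.
    print(top_next_tokens("He waited with bated"))       # expect " breath" to dominate
    print(top_next_tokens("They braced for inclement"))  # expect " weather" to dominate

The same mechanism, applied over every token of a resume, is the "steering" described above.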

Now let's bring it home and tie the thought experiment to confidence scoring in responses. Let's say it's reasonable to assume that results from low-accuracy/low-confidence models are less commonly found on the internet than those from higher-performing ones. If that can be entertained, extend the argument to confidence responses: maybe the term "JSON", or any other term in the model input, is associated with high confidence values.

Alright, wrapping it up. The point here is that the confidence value the model provides in its output is not the likelihood that the answer in the response is correct, but rather the most likely value to follow the stream of tokens in the combined input and output. The real sampled confidence values exist closer to the code (the token probabilities at sampling time), but they are limited to individual tokens, not series of tokens.
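For what "closer to the code" looks like in practice, here is a minimal sketch, again assuming transformers with GPT-2 as a stand-in, that scores a fixed answer string token by token under a prompt. Each number is the probability of one token given everything before it; their product is a sequence likelihood, which is still not the probability that the answer is correct.

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def per_token_probs(prompt, answer):
        # Probability of each answer token, conditioned on the prompt plus the
        # answer tokens before it (assumes a clean token boundary at the join).
        full_ids = tokenizer(prompt + answer, return_tensors="pt")["input_ids"][0]
        prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
        with torch.no_grad():
            logits = model(full_ids.unsqueeze(0)).logits[0]
        probs = torch.softmax(logits, dim=-1)
        return [(tokenizer.decode(full_ids[i]), probs[i - 1, full_ids[i]].item())
                for i in range(prompt_len, len(full_ids))]

    scores = per_token_probs("The capital of France is", " Paris.")
    for tok, p in scores:
        print(f"{tok!r}: {p:.3f}")
    print("sequence likelihood:", math.exp(sum(math.log(p) for _, p in scores)))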

Cloudflare acquires Astro

https://astro.build/blog/joining-cloudflare/
523•todotask2•5h ago•274 comments

6-Day and IP Address Certificates Are Generally Available

https://letsencrypt.org/2026/01/15/6day-and-ip-general-availability
221•jaas•4h ago•129 comments

Michelangelo's first painting, created when he was 12 or 13

https://www.openculture.com/2026/01/discover-michelangelos-first-painting.html
216•bookofjoe•6h ago•127 comments

Just the Browser

https://justthebrowser.com/
389•cl3misch•8h ago•211 comments

STFU

https://github.com/Pankajtanwarbanna/stfu
369•tanelpoder•2h ago•255 comments

Cursor's latest "browser experiment" implied success without evidence

https://embedding-shapes.github.io/cursor-implied-success-without-evidence/
164•embedding-shape•5h ago•74 comments

Lock-Picking Robot

https://github.com/etinaude/Lock-Picking-Robot
180•p44v9n•4d ago•80 comments

Launch HN: Indy (YC S21) – A support app designed for ADHD brains

https://www.shimmer.care/indy-redirect
44•christalwang•3h ago•50 comments

Elasticsearch Was Never a Database

https://www.paradedb.com/blog/elasticsearch-was-never-a-database
37•jamesgresql•4d ago•38 comments

Read_once(), Write_once(), but Not for Rust

https://lwn.net/SubscriberLink/1053142/8ec93e58d5d3cc06/
82•todsacerdoti•5h ago•23 comments

Dev-owned testing: Why it fails in practice and succeeds in theory

https://dl.acm.org/doi/10.1145/3780063.3780066
69•rbanffy•6h ago•89 comments

Zep AI (Agent Context Engineering, YC W24) Is Hiring Forward Deployed Engineers

https://www.ycombinator.com/companies/zep-ai/jobs/
1•roseway4•3h ago

Earth from Space: The Fate of a Giant

https://www.esa.int/ESA_Multimedia/Images/2026/01/Earth_from_Space_The_fate_of_a_giant
12•geox•1h ago•2 comments

Dell UltraSharp 52 Thunderbolt Hub Monitor

https://www.dell.com/en-us/shop/dell-ultrasharp-52-thunderbolt-hub-monitor-u5226kw/apd/210-bthw/m...
81•cebert•2h ago•90 comments

Show HN: 1Code – Open-source Cursor-like UI for Claude Code

https://github.com/21st-dev/1code
24•Bunas•1d ago•15 comments

Emoji Use in the Electronic Health Record is Increasing

https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2843883
15•giuliomagnifico•2h ago•1 comments

Why DuckDB is my first choice for data processing

https://www.robinlinacre.com/recommend_duckdb/
116•tosh•9h ago•48 comments

OpenBSD-current now runs as guest under Apple Hypervisor

https://www.undeadly.org/cgi?action=article;sid=20260115203619
380•gpi•17h ago•51 comments

The Alignment Game

https://dmvaldman.github.io/alignment-game/
12•dmvaldman•20h ago•1 comments

Training my smartwatch to track intelligence

https://dmvaldman.github.io/rooklift/
113•dmvaldman•1d ago•51 comments

Feature Selection: A Primer

https://ikromshi.com/2025/12/30/feature-selection-primer.html
8•ikromshi•4d ago•0 comments

List of individual trees

https://en.wikipedia.org/wiki/List_of_individual_trees
320•wilson090•20h ago•102 comments

psc: The ps utility, with an eBPF twist and container context

https://github.com/loresuso/psc
57•tanelpoder•6h ago•22 comments

Interactive eBPF

https://ebpf.party/
182•samuel246•12h ago•8 comments

Zorgdomein Integration: A Guide to Secure .NET and Azure Architecture

https://plakhlani.in/healthcare/bidirectional-patient-data-exchange-with-zorgdomein/
12•prashantl•4d ago•8 comments

How to wrangle non-deterministic AI outputs into conventional software? (2025)

https://www.domainlanguage.com/articles/ai-components-deterministic-system/
11•druther•13h ago•3 comments

Can You Disable Spotlight and Siri in macOS Tahoe?

https://eclecticlight.co/2026/01/16/can-you-disable-spotlight-and-siri-in-macos-tahoe/
76•chmaynard•5h ago•61 comments

Our approach to advertising and expanding access to ChatGPT

https://openai.com/index/our-approach-to-advertising-and-expanding-access/
92•rvz•2h ago•65 comments

The spectrum of isolation: From bare metal to WebAssembly

https://buildsoftwaresystems.com/post/guide-to-execution-environments/
85•ThierryBuilds•10h ago•30 comments

Canada slashes 100% tariffs on Chinese EVs to 6%

https://electrek.co/2026/01/16/canada-breaks-with-us-slashes-100-tariffs-chinese-evs/
319•1970-01-01•3h ago•385 comments