LLM Output Drift in Financial Workflows: Validation and Mitigation (arXiv)

https://arxiv.org/abs/2511.07585
16•raffisk•2h ago

Comments

raffisk•2h ago
Empirical study on LLM output consistency in regulated financial tasks (RAG, JSON, SQL). Governance focus: smaller models (Qwen2.5-7B, Granite-3-8B) hit 100% determinism at T=0.0, passing audits (FSB/BIS/CFTC), while larger ones like GPT-OSS-120B reach only 12.5%. The gap is huge (87.5 percentage points, p<0.0001, n=16) and survives multiple-testing corrections.

Caveat: this measures reproducibility (edit distance), not full accuracy. Determinism is necessary for compliance but still needs semantic checks (e.g., embeddings compared against ground truth). The paper includes a harness, invariants (±5%), and attestation.

Thoughts on the inverse size-reliability relationship? I'm planning a follow-up with accuracy metrics rather than just reproducibility.
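
For readers who want to try the reproducibility idea themselves, here is a minimal sketch of one way to score repeated runs of the same prompt: the fraction of runs that match the most common output exactly, plus the worst-case edit distance to it. The function names and the choice of "most common output" as the reference are assumptions for illustration; the paper's harness may define its metric differently.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def consistency_report(outputs: list[str]) -> dict:
    """Score repeated runs against the most common output (hypothetical metric)."""
    modal, count = Counter(outputs).most_common(1)[0]
    distances = [levenshtein(modal, o) for o in outputs]
    return {
        "identical_fraction": count / len(outputs),
        "max_edit_distance": max(distances),
    }


# 16 runs where 14 agree exactly and 2 drift by one character.
runs = ["a"] * 14 + ["b", "ab"]
report = consistency_report(runs)
```

A score like this only says whether the runs agree with each other, not whether they are right, which is the caveat above.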

colechristensen•1h ago
Outputs not being deterministic at temperature = 0 doesn't match my understanding of what "temperature" means. I thought T=0 was deterministic by definition.

Is this perhaps inference implementation details somehow introducing randomness?
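
As a reference point for the question, here is a minimal sketch of how temperature is conventionally applied to logits, assuming the common convention that T=0 is treated as greedy argmax (the helper names are hypothetical and not taken from any particular inference engine). It also shows why "deterministic in theory" can still be fragile: when two logits are nearly tied, a tiny numerical difference upstream flips the greedy choice.

```python
import math


def greedy_index(logits: list[float]) -> int:
    """Index of the largest logit (first one wins on exact ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Token probabilities after temperature scaling; T=0 is handled as argmax."""
    if temperature == 0.0:
        best = greedy_index(logits)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# At T=0 all probability mass sits on the top token.
probs_zero = softmax_with_temperature([2.0, 1.0, 0.5], 0.0)

# A perturbation of 1e-7 in the logits is enough to change the greedy pick.
pick_a = greedy_index([5.0 + 1e-7, 5.0])
pick_b = greedy_index([5.0, 5.0 + 1e-7])
```

So the sampling step at T=0 is deterministic given identical logits; any run-to-run variation has to come from the logits themselves differing slightly.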

kakugawa•50m ago
Defeating Nondeterminism in LLM Inference

https://news.ycombinator.com/item?id=45200925

https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

> As it turns out, our request’s output does depend on the parallel user requests. Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.

tl;dr: the way inference is batched introduces non-determinism.
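
A toy illustration of why grouping can matter at all, assuming the usual explanation that floating-point addition is not associative, so a reduction split into different chunk sizes can round differently. The `chunked_sum` helper is hypothetical and only stands in for a kernel whose reduction layout depends on batch size; it is not how any real inference kernel is written.

```python
def chunked_sum(values: list[float], chunk_size: int) -> float:
    """Sum values chunk by chunk, then add the partial sums left to right."""
    partials = []
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial += v
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total


# Same three numbers, different grouping, different result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Same four numbers reduced with two different chunk sizes.
values = [1e17, 1.0, -1e17, 1.0]
one_chunk = chunked_sum(values, 4)   # the first 1.0 is absorbed, the last survives
two_chunks = chunked_sum(values, 2)  # both 1.0s are absorbed into +/-1e17
```

Here `one_chunk` is 1.0 and `two_chunks` is 0.0. Real logits differ by far less than that, but per the T=0 discussion above, a tiny difference is all it takes to flip a near-tied token.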

doctorpangloss•48m ago
“Determinism is necessary for compliance”

Says who?

The stuff you comply with changes in real time. How’s that for determinism?

raffisk•8m ago
Author here. Fair point, regs are a moving target. But FSB/BIS/CFTC explicitly require reproducible outputs for audits (no random drift in financial reports). At the very least, determinism = traceability, even when the rules update.

Most groups I work with stick to traditional automation/rules systems, but top-down mandates are pushing them toward frontier models for general tasks, which then get plugged into these workflows. A lot stays in sandbox, but you'd be surprised what's already live in fin services.

The authorities I cited (FSB/BIS/CFTC) literally just said last month that AI monitoring is "still at early stage", cc https://www.fsb.org/2024/11/the-financial-stability-implicat...

Curious how you'd tackle regs that change in real time?

throwdbaaway•20m ago
It is the reasoning. During the reasoning process, the top few tokens have very similar or even identical logprobs. With gpt-oss-120b, you should be able to get deterministic output by turning off reasoning, e.g. by appending:

    {"role": "assistant", "content": "<think></think>"}

Of course, the model will be less capable without reasoning.

measurablefunc•1h ago
This is b/c these things are Markov chains. You cannot expect consistent results & outputs.

SrslyJosh•31m ago
Using an LLM for a "financial workflow" makes as much sense as integrating one with Excel. But who needs correct results when you're just working with money, right? ¯\_(ツ)_/¯

mirekrusin•28m ago
Humans are nondeterministic, yet they use Excel, work with financial workflows and deal with money.

ACCount37•30m ago
Did you actually read what the paper was about before leaving a low quality comment?
