What do you call the fallacy where the universe is imperfect, therefore nobody can have higher standards for anything?
Mankind has spent literal centuries observing deficiencies and faults in human bookkeeping and calculation, constantly trying to improve them with processes and machinery. There's no good reason to suddenly stop caring about those issues simply because the latest proposal is marketed as "AI".
raffisk•2mo ago
Caveat: this measures reproducibility (edit distance), not full accuracy. Determinism is necessary for compliance, but it still needs semantic checks on top (e.g., embeddings compared to ground truth). The repo includes the harness, invariants (±5%), and attestation.
Thoughts on the inverse size-reliability relationship? Planning a follow-up with accuracy metrics rather than just repro.
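For anyone wondering what a repro check like this looks like in practice, here is a minimal sketch, not the actual harness. It takes N outputs for the same prompt, finds the most common one, and reports the share of runs that match it exactly plus the mean normalized edit distance to it. The function names (`edit_distance`, `repro_report`, `within_tolerance`) are hypothetical, and reading the ±5% invariant as a relative tolerance on a numeric field is an assumption.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def repro_report(outputs: list[str]) -> dict[str, float]:
    """Summarize how reproducible a set of runs is (hypothetical helper)."""
    modal, count = Counter(outputs).most_common(1)[0]
    dists = [
        edit_distance(o, modal) / max(len(o), len(modal), 1)
        for o in outputs
    ]
    return {
        "modal_share": count / len(outputs),          # exact-match rate
        "mean_norm_distance": sum(dists) / len(dists),
    }


def within_tolerance(value: float, reference: float, tol: float = 0.05) -> bool:
    """Assumed reading of the ±5% invariant: relative tolerance on a number."""
    return abs(value - reference) <= tol * abs(reference)
```

Usage would be something like `repro_report(outputs_from_16_runs)`. A `modal_share` of 1.0 means every run produced the same text. Note that this says nothing about whether that text is correct, which is the caveat above.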
colechristensen•2mo ago
Could this be inference implementation details somehow introducing randomness?
kakugawa•2mo ago
https://news.ycombinator.com/item?id=45200925
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
> As it turns out, our request’s output does depend on the parallel user requests. Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.
tl;dr: the way inference is batched introduces non-determinism.
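A toy illustration of why grouping matters, assuming nothing about real GPU kernels: floating-point addition is not associative, so summing the same numbers in different chunk sizes can give different results. `sequential_sum` and `chunked_sum` are hypothetical helpers, and the chunk size here is only a stand-in for how a reduction might get split differently under different batch sizes.

```python
def sequential_sum(values: list[float]) -> float:
    """Plain left-to-right accumulation (avoids the built-in sum on purpose,
    since newer Python versions use compensated summation for floats)."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values: list[float], chunk_size: int) -> float:
    """Sum each chunk first, then sum the partial results."""
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]

print(chunked_sum(values, 4))  # one chunk, same as left-to-right: 1.0
print(chunked_sum(values, 2))  # two chunks: 0.0
```

Same inputs, same math on paper, different answer depending on how the work was split. In a real model the differences are tiny per operation, but they can be enough to flip a token choice and make outputs diverge from there.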
doctorpangloss•2mo ago
Says who?
The stuff you comply with changes in real time. How’s that for determinism?
raffisk•2mo ago
Most groups I work with stick to traditional automation/rules systems, but top-down mandates are pushing them toward frontier models for general tasks—which then get plugged into these workflows. A lot stays in sandbox, but you'd be surprised what's already live in fin services.
The authorities I cited (FSB/BIS/CFTC) literally just said last month AI monitoring is "still at early stage" cc https://www.fsb.org/2024/11/the-financial-stability-implicat...
Curious how you'd tackle regulation that changes in real time?
raffisk•2mo ago
This was the link I meant from Oct '25, reiterating that AI monitoring is still at an early stage.
nomel•2mo ago
raffisk•2mo ago
ulrashida•2mo ago
That's not the way regulations work. Your compliance is measured against a fixed version of legislation.
raffisk•2mo ago
doctorpangloss•2mo ago
My bro, the tariffs. The first table of tariffs was written by ChatGPT!
> That's not the way regulations work.
Whatever regulations you are thinking of, they are myths now. I'm not saying deregulation - that isn't happening. In every industry - I know more about healthcare than finance - clear, complex, well specified regulations are being replaced by vague, mercurial ones. The SEC has changed many things too.
throwdbaaway•2mo ago
raffisk•2mo ago
Also, the Mistral Medium model we tested had ~70% deterministic outputs across the 16 runs for the text-to-SQL generation and JSON summarization tasks, and it had reasoning on. Llama 3.3 70B started to degrade and doesn't have reasoning. But it's a relevant variable to consider.