Reasoning models are risky. Anyone else experiencing this?

4•lebonnnn•14h ago
I'm building interviuu (a job application tool) and have been testing pretty much every LLM out there for different parts of the product. One thing that's been driving me crazy: reasoning models seem particularly dangerous for business applications that need to go from A to B in a somewhat rigid way.

I wouldn't call it "deterministic output" because that's not really what LLMs do, but there are definitely use cases where you need a certain level of consistency and predictability, you know?

Here's what I keep running into with reasoning models:

During the reasoning process (and I know Anthropic has shown that what we read isn't the "real" reasoning happening), the LLM tends to ignore guardrails and specific instructions I've put in the prompt. The output becomes way more unpredictable than I need it to be.

Sure, I can define the format with JSON schemas (or objects) and that works fine. But the actual content? It's all over the place. Sometimes it follows my business rules perfectly, other times it just doesn't. And there's no clear pattern I can identify.

For example, I need the model to extract specific information from resumes and job posts, then match them according to pretty clear criteria. With regular models, I get consistent behavior most of the time. With reasoning models, it's like they get "creative" during their internal reasoning and decide my rules are more like suggestions.
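
To make the distinction concrete: a schema check only guards the shape of the output, while the business rules need their own deterministic check on the content. Here is a minimal sketch of that second check. The field names and the two rules (minimum years of experience, all required skills present) are hypothetical placeholders, not interviuu's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical shape of one model answer after schema validation."""
    candidate_years: int
    required_years: int
    matched_skills: list[str]
    required_skills: list[str]
    verdict: str  # expected to be "match" or "no_match"


def rule_violations(m: Match) -> list[str]:
    """Return the business rules the model's verdict breaks (empty if none)."""
    problems = []
    missing = set(m.required_skills) - set(m.matched_skills)
    # Assumed rule: a match needs enough experience and every required skill.
    should_match = m.candidate_years >= m.required_years and not missing
    if m.verdict not in ("match", "no_match"):
        problems.append(f"unknown verdict: {m.verdict!r}")
    elif (m.verdict == "match") != should_match:
        problems.append(
            f"verdict {m.verdict!r} contradicts the rules "
            f"(missing skills: {sorted(missing)})"
        )
    return problems
```

An answer that passes the schema but returns a non-empty list here could be retried or flagged, which at least turns "sometimes it doesn't follow the rules" into something measurable per model.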

I've tested almost all of them (from Gemini to DeepSeek) and honestly, none have convinced me for this type of structured business logic. They're incredible for complex problem-solving, but for "follow these specific steps and don't deviate" tasks? Not so much.

Anyone else dealing with this? Am I missing something in my prompting approach, or is this just the trade-off we make with reasoning models? I'm curious if others have found ways to make them more reliable for business applications.

What's been your experience with reasoning models in production?

Comments

techpineapple•14h ago
Don't have an answer to your exact question, but waxing philosophical for a moment: it's interesting that a person you were talking to would use their experience to redirect you, subtly or overtly, if they thought your intentions were wrong. It's probably really hard to get all those biases right in an LLM in a casual way. When should it listen to you directly, and when should it say, "no, you're wrong"? There are probably a ton of use cases where you want the "wrong" answer; almost anything innovative is going to go against the grain of what a system would think is "right" by definition. "Be 77% confident in your answer" is hard to tune and a bad UX.
PaulHoule•14h ago
The difference is that in one case you care if the result is right and in another case you don't. If your "complex problem solving" was being used to "design a manufacturing process" or "diagnose a disease and come up with a treatment plan" or something else that has consequences if it is wrong, you're going to be unhappy.

On the other hand there is a lot of high social-status work that has no consequences like "write a college commencement address" and people will be impressed no matter what comes out.

Perhaps the likes of Sam Altman and Satya Nadella are so excited for AI because it can do their job but it can't fold towels or be the final assembly programmer for software that users actually use.

deepsiml•13h ago
How simply can you break down the process?

So, I did a dream analysis agent setup, and had to break every piece down. One focused on symbols, one focused on memory retrievals, one focused on emotions.

It meant I needed far more agents, but I believe the result was much better.
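
A rough sketch of what that kind of split could look like. This is not the actual setup described above: the prompts, the `STEPS` table, and the `model` callable (a stand-in for whatever LLM client you use) are all assumptions for illustration.

```python
from typing import Callable

# Stand-in for any LLM client: takes a prompt string, returns the reply text.
Model = Callable[[str], str]

# One narrow prompt per concern, so no single call has to juggle every rule.
# The wording of these prompts is made up for this example.
STEPS = {
    "symbols": "List only the symbols that appear in this text:\n{text}",
    "memories": "List only the memories referenced in this text:\n{text}",
    "emotions": "List only the emotions expressed in this text:\n{text}",
}


def run_steps(text: str, model: Model) -> dict[str, str]:
    """Run each focused step independently and collect the replies by name."""
    return {
        name: model(template.format(text=text))
        for name, template in STEPS.items()
    }
```

Because `model` is passed in, each step can be tested on its own with a fake, and a step that misbehaves can be re-prompted or swapped to a different model without touching the others.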
