frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: I built a clawdbot that texts like your crush

https://14.israelfirew.co
1•IsruAlpha•55s ago•0 comments

Scientists reverse Alzheimer's in mice and restore memory (2025)

https://www.sciencedaily.com/releases/2025/12/251224032354.htm
1•walterbell•3m ago•0 comments

Compiling Prolog to Forth [pdf]

https://vfxforth.com/flag/jfar/vol4/no4/article4.pdf
1•todsacerdoti•5m ago•0 comments

Show HN: Cymatica – an experimental, meditative audiovisual app

https://apps.apple.com/us/app/cymatica-sounds-visualizer/id6748863721
1•_august•6m ago•0 comments

GitBlack: Tracing America's Foundation

https://gitblack.vercel.app/
1•martialg•6m ago•0 comments

Horizon-LM: A RAM-Centric Architecture for LLM Training

https://arxiv.org/abs/2602.04816
1•chrsw•7m ago•0 comments

We just ordered shawarma and fries from Cursor [video]

https://www.youtube.com/shorts/WALQOiugbWc
1•jeffreyjin•8m ago•1 comments

Correctio

https://rhetoric.byu.edu/Figures/C/correctio.htm
1•grantpitt•8m ago•0 comments

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

https://chillphysicsenjoyer.substack.com/p/trying-to-make-an-automated-ecologist
1•crescit_eundo•12m ago•0 comments

Watch Ukraine's Minigun-Firing, Drone-Hunting Turboprop in Action

https://www.twz.com/air/watch-ukraines-minigun-firing-drone-hunting-turboprop-in-action
1•breve•13m ago•0 comments

Free Trial: AI Interviewer

https://ai-interviewer.nuvoice.ai/
1•sijain2•13m ago•0 comments

FDA Intends to Take Action Against Non-FDA-Approved GLP-1 Drugs

https://www.fda.gov/news-events/press-announcements/fda-intends-take-action-against-non-fda-appro...
8•randycupertino•14m ago•2 comments

Supernote e-ink devices for writing like paper

https://supernote.eu/choose-your-product/
3•janandonly•16m ago•0 comments

We are QA Engineers now

https://serce.me/posts/2026-02-05-we-are-qa-engineers-now
1•SerCe•17m ago•0 comments

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

https://arxiv.org/abs/2602.01465
2•NBenkovich•17m ago•0 comments

Adversarial Reasoning: Multiagent World Models for Closing the Simulation Gap

https://www.latent.space/p/adversarial-reasoning
1•swyx•17m ago•0 comments

Show HN: Poddley.com – Follow people, not podcasts

https://poddley.com/guests/ana-kasparian/episodes
1•onesandofgrain•25m ago•0 comments

Layoffs Surge 118% in January – The Highest Since 2009

https://www.cnbc.com/2026/02/05/layoff-and-hiring-announcements-hit-their-worst-january-levels-si...
8•karakoram•25m ago•0 comments

Papyrus 114: Homer's Iliad

https://p114.homemade.systems/
1•mwenge•26m ago•1 comments

DicePit – Real-time multiplayer Knucklebones in the browser

https://dicepit.pages.dev/
1•r1z4•26m ago•1 comments

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

https://arxiv.org/abs/2601.14340
2•PaulHoule•27m ago•0 comments

Show HN: AI Agent Tool That Keeps You in the Loop

https://github.com/dshearer/misatay
2•dshearer•29m ago•0 comments

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

https://drmowinckels.io/blog/2026/sitrep-functions/
1•todsacerdoti•29m ago•0 comments

Achieving Ultra-Fast AI Chat Widgets

https://www.cjroth.com/blog/2026-02-06-chat-widgets
1•thoughtfulchris•31m ago•0 comments

Show HN: Runtime Fence – Kill switch for AI agents

https://github.com/RunTimeAdmin/ai-agent-killswitch
1•ccie14019•33m ago•1 comments

Researchers surprised by the brain benefits of cannabis usage in adults over 40

https://nypost.com/2026/02/07/health/cannabis-may-benefit-aging-brains-study-finds/
2•SirLJ•35m ago•0 comments

Peter Thiel warns the Antichrist, apocalypse linked to the 'end of modernity'

https://fortune.com/2026/02/04/peter-thiel-antichrist-greta-thunberg-end-of-modernity-billionaires/
4•randycupertino•36m ago•2 comments

USS Preble Used Helios Laser to Zap Four Drones in Expanding Testing

https://www.twz.com/sea/uss-preble-used-helios-laser-to-zap-four-drones-in-expanding-testing
3•breve•41m ago•0 comments

Show HN: Animated beach scene, made with CSS

https://ahmed-machine.github.io/beach-scene/
1•ahmedoo•42m ago•0 comments

An update on unredacting select Epstein files – DBC12.pdf liberated

https://neosmart.net/blog/efta00400459-has-been-cracked-dbc12-pdf-liberated/
3•ks2048•42m ago•0 comments
Open in hackernews

Reasoning models are risky. Anyone else experiencing this?

4•lebonnnn•7mo ago
I'm building interviuu (a job application tool) and have been testing pretty much every LLM model out there for different parts of the product. One thing that's been driving me crazy: reasoning models seem particularly dangerous for business applications that need to go from A to B in a somewhat rigid way.

I wouldn't call it "deterministic output" because that's not really what LLMs do, but there are definitely use cases where you need a certain level of consistency and predictability, you know?

Here's what I keep running into with reasoning models:

During the reasoning process (and I know Anthropic has shown that what we read isn't the "real" reasoning happening), the LLM tends to ignore guardrails and specific instructions I've put in the prompt. The output becomes way more unpredictable than I need it to be.

Sure, I can define the format with JSON schemas (or objects) and that works fine. But the actual content? It's all over the place. Sometimes it follows my business rules perfectly, other times it just doesn't. And there's no clear pattern I can identify.

For example, I need the model to extract specific information from resumes and job posts, then match them according to pretty clear criteria. With regular models, I get consistent behavior most of the time. With reasoning models, it's like they get "creative" during their internal reasoning and decide my rules are more like suggestions.

I've tested almost all of them (from Gemini to DeepSeek) and honestly, none have convinced me for this type of structured business logic. They're incredible for complex problem-solving, but for "follow these specific steps and don't deviate" tasks? Not so much.

Anyone else dealing with this? Am I missing something in my prompting approach, or is this just the trade-off we make with reasoning models? I'm curious if others have found ways to make them more reliable for business applications.

What's been your experience with reasoning models in production?

Comments

techpineapple•7mo ago
Don't have an answer to your exact question, but waxing philosophical for a moment, it's interesting that if you were talking to a person they would use their experience to maybe redirect you if they thought your intentions were wrong - subtly or overtly. It's probably really hard, in a casual way to get all the biases right in an LLM. When should it listen to you directly, and when should it say, no your wrong? There are probably a ton of use cases where you want the "wrong" answer, almost anything innovative is going to go against the grain of what a system would think is "right" by definition. Be 77% confident in your answer is hard to tune and a bad UX.
PaulHoule•7mo ago
The difference is that in one case you care if the result is right and in another case you don't. If your "complex problem solving" was being used to "design a manufacturing process" or "diagnose a disease and come up with a treatment plan" or something that has consequences if it is wrong you're going to be unhappy.

On the other hand there is a lot of high social-status work that has no consequences like "write a college commencement address" and people will be impressed no matter what comes out.

Perhaps the likes of Sam Altman and Satya Nadella are so excited for AI because it can do their job but it can't fold towels or be the final assembly programmer for software that users actually use.

deepsiml•7mo ago
How simple can you break down the process?

So, I did a dream analysis agent setup, and had to break every piece down. One focused on symbols, one focused on memory retrievals, one focused on emotions.

It meant I needed far more agents, but it was a much better result I believe.