
Launch HN: Mentat (YC F24) – Controlling LLMs with Runtime Intervention

https://playground.ctgt.ai
13•cgorlla•1h ago
Hi HN, I’m Cyril from CTGT. Today we’re launching Mentat (https://docs.ctgt.ai/api-reference/endpoint/chat-completions), an API that gives developers deterministic control over LLM behavior, steering reasoning and removing bias on the fly, without the compute of fine-tuning or the brittleness of prompt engineering. We use feature-level intervention and graph-based verification to fix hallucinations and enforce policies.

This matters most in highly regulated industries and other risky applications of AI, where the fallout from incorrect or underperforming output can be significant. In financial services, using GenAI to scan for noncompliant communications is arduous without an easy way to embed complex policies into the model. Similarly, a media outlet might want to scale AI-generated summaries of its content, but reliability and accuracy are paramount. These are both applications where Fortune 500 companies have used our technology to improve the subpar performance of existing models, and we want to bring this capability to more people.

Here’s a quick 2-minute demo video showing the process: https://video.ctgt.ai/video/ctgt-ai-compliance-playground-cf...

Standard "guardrails" like RAG and system prompts are fundamentally probabilistic: you are essentially asking the model nicely to behave. This often fails in two ways. First, RAG solves knowledge availability but not integration. In our benchmarks, a model given context that "Lerwick is 228 miles SE of Tórshavn" failed to answer "What is 228 miles NW of Lerwick?" because it couldn't perform the spatial inversion.

Second, prompt engineering is brittle because it fights against the model's pre-training priors. For example, on the TruthfulQA benchmark, base models fail ~80% of the time because they mimic common misconceptions found on the internet (e.g. "chameleons change color for camouflage"). We found that we could literally turn up the feature for "skeptical reasoning" to make the model ignore the popular myth and output the scientific fact. This matters because for high-stakes use cases (like Finance or Pharma), "mostly safe" isn't acceptable—companies need audit-grade reliability.

Our work stems from the CS dungeon at UCSD, with years spent researching efficient and interpretable AI, trying to "open the black box" of neural networks. We realized that the industry was trying to patch model behavior from the outside (prompts/filters) when the problem was on the inside (feature activations). We knew this was important when we saw enterprises struggling to deploy basic models despite having unlimited compute, simply because they couldn't guarantee the output wouldn't violate compliance rules. I ended up leaving my research at Stanford to focus on this.

Our breakthrough came while researching the DeepSeek-R1 model. We identified the "censorship" feature vector in its latent space. Amplifying it guaranteed refusal; subtracting it instantly unlocked answers to sensitive questions. This proved the model had the knowledge but was suppressing it. We realized we could apply this same logic to hallucinations, suppressing "confabulation" features to reveal the grounded truth. While some hallucinations stem from the inherent randomness of generative models, many can be identified with the concerted activation of a feature or group of features.

Instead of filtering outputs, we intervene at the activation level during the forward pass. We identify latent feature vectors (v) associated with specific behaviors (e.g., bias, misconceptions) and mathematically modify the hidden state (h):

  h_prime = h - alpha * (h @ v) * v

This arithmetic operation lets us "edit" behavior deterministically with negligible overhead (<10ms on R1). For factual claims, we combine this with a graph-verification pipeline (which also works on closed-weight models). We check semantic entropy (is the model babbling?) and cross-reference claims against a dynamic knowledge graph to catch subtle relational hallucinations that vector search misses.
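
For intuition, here's a simplified sketch of this kind of intervention as a PyTorch forward hook. It's illustrative only, not our production code: the layer index, the feature vector v, and alpha are placeholders, and in practice v comes from a feature-discovery step (e.g., a sparse autoencoder over activations).

  import torch

  def make_intervention_hook(v: torch.Tensor, alpha: float):
      # Damp the component of the hidden state along the (unit-norm)
      # feature direction v: h' = h - alpha * (h @ v) * v
      def hook(module, inputs, output):
          h = output[0] if isinstance(output, tuple) else output
          proj = (h @ v).unsqueeze(-1) * v   # component of h along v
          h_prime = h - alpha * proj         # alpha=1.0 removes it fully
          if isinstance(output, tuple):
              return (h_prime,) + output[1:]
          return h_prime
      return hook

  # Hypothetical usage on a HuggingFace-style decoder layer:
  # v = feature_direction / feature_direction.norm()
  # handle = model.model.layers[12].register_forward_hook(
  #     make_intervention_hook(v, alpha=1.0))

Amplifying a feature instead (as with the R1 censorship vector above) is the same operation with a negative alpha.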

On GPT-OSS-120b, this approach improved TruthfulQA accuracy from 21% to 70% by suppressing misconception features. We also brought this model to frontier-level performance on HaluEval-QA, reaching 96.5% accuracy and solving the spatial-reasoning questions (like the Lerwick example above) that the baseline failed. It also handles noisy inputs, inferring "David Icke" from the typo "David Of me" where base models gave up. Full benchmarks at https://ctgt.ai/benchmarks.

Most startups in this space are observability tools that tell you about a failure only after the model has produced it, or RAG pipelines that stuff more context into the window. Mentat is an infrastructure layer that modifies the model's processing during inference. We fix the reasoning, not just the context. For example, that's how our system was able to enforce that if A is SE of B, then B is NW of A.

We believe that our policy engine is a superior control mechanism to RAG or prompting. If you’re frustrated with current guardrails, we’d love it if you would stress-test our API!

API: Our endpoint is drop-in compatible with OpenAI’s /v1/chat/completions: https://docs.ctgt.ai/api-reference/endpoint/chat-completions
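
Here's roughly what a call looks like with the standard OpenAI Python client; the base URL and model name below are illustrative placeholders, so check the docs above for the real values:

  from openai import OpenAI

  client = OpenAI(base_url="https://api.ctgt.ai/v1",  # placeholder base URL
                  api_key="YOUR_CTGT_KEY")

  resp = client.chat.completions.create(
      model="mentat",  # placeholder model name
      messages=[{"role": "user",
                 "content": "What is 228 miles NW of Lerwick?"}],
  )
  print(resp.choices[0].message.content)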

Playground: We’ve built an "Arena" view to run side-by-side comparisons of an Ungoverned vs. Governed model and visualize the intervention delta in real time. No signup is required: https://playground.ctgt.ai/

We’d love to hear your feedback on the approach and see what edge cases you can find that break standard models. We will be in the comments all day. All feedback welcome!

Comments

kraddypatties•36m ago
Running into "no healthy upstream" when navigating to the link -- hug of death maybe?
cgorlla•25m ago
Indeed, we had a huge influx; it should be back up now. Thanks for pointing it out.
rrr_oh_man•34m ago
> where the fallout

Heh.

esafak•7m ago
Are you not concerned that model creation companies will bake this into their next model? I am trying to understand the business model.

Another question I have is how you would claim credit. People believe the quality of the end result depends only on the model, with serving responsible only for speed.

Show HN: Gemini Pro 3 hallucinates the HN front page 10 years from now

https://dosaygo-studio.github.io/hn-front-page-2035/news
387•keepamovin•2h ago•182 comments

PeerTube is recognized as a digital public good by Digital Public Goods Alliance

https://www.digitalpublicgoods.net/r/peertube
39•fsflover•42m ago•5 comments

Mistral Releases Devstral 2 (72.2% SWE-Bench Verified) and Vibe CLI

https://mistral.ai/news/devstral-2-vibe-cli
201•pember•3h ago•68 comments

If You're Going to Vibe Code, Why Not Do It in C?

https://stephenramsay.net/posts/vibe-coding.html
46•sramsay•39m ago•36 comments

Hands down one of the coolest 3D websites

https://bruno-simon.com/
95•razzmataks•1h ago•27 comments

Kaiju – General purpose 3D/2D game engine in Go and Vulkan with built in editor

https://github.com/KaijuEngine/kaiju
78•discomrobertul8•2h ago•33 comments

My favourite small hash table

https://www.corsix.org/content/my-favourite-small-hash-table
51•speckx•3h ago•7 comments

Clearspace (YC W23) Is Hiring a Founding Designer

https://www.ycombinator.com/companies/clearspace/jobs/yamWTLr-founding-designer-at-clearspace
1•roycebranning•49m ago

LLM from scratch, part 28 – training a base model from scratch on an RTX 3090

https://www.gilesthomas.com/2025/12/llm-from-scratch-28-training-a-base-model-from-scratch
355•gpjt•6d ago•77 comments

Launch HN: Mentat (YC F24) – Controlling LLMs with Runtime Intervention

https://playground.ctgt.ai
13•cgorlla•1h ago•4 comments

AWS Trainium3 Deep Dive – A Potential Challenger Approaching

https://newsletter.semianalysis.com/p/aws-trainium3-deep-dive-a-potential
32•Symmetry•4d ago•8 comments

The Joy of Playing Grandia, on Sega Saturn

https://www.segasaturnshiro.com/2025/11/27/the-joy-of-playing-grandia-on-sega-saturn/
144•tosh•8h ago•80 comments

Transformers know more than they can tell: Learning the Collatz sequence

https://www.arxiv.org/pdf/2511.10811
76•Xcelerate•6d ago•29 comments

Constructing the World's First JPEG XL MD5 Hash Quine

https://stackchk.fail/blog/jxl_hashquine_writeup
67•luispa•1w ago•16 comments

Show HN: AlgoDrill – Interactive drills to stop forgetting LeetCode patterns

https://algodrill.io
113•henwfan•6h ago•78 comments

Oliver Sacks Put Himself into His Case Studies. What Was the Cost?

https://www.newyorker.com/magazine/2025/12/15/oliver-sacks-put-himself-into-his-case-studies-what...
27•barry-cotter•4h ago•4 comments

Icons in Menus Everywhere – Send Help

https://blog.jim-nielsen.com/2025/icons-in-menus/
751•ArmageddonIt•22h ago•306 comments

Ask HN: Should "I asked $AI, and it said" replies be forbidden in HN guidelines?

198•embedding-shape•1h ago•126 comments

30 Year Anniversary of WarCraft II: Tides of Darkness

https://www.jorsys.org/archive/december_2025.html#newsitem_2025-12-09T07:42:19Z
76•sjoblomj•8h ago•58 comments

Brent's Encapsulated C Programming Rules (2020)

https://retroscience.net/brents-c-programming-rules.html
53•p2detar•6h ago•24 comments

AI needs more power than the grid can deliver – supersonic tech can fix that

https://boomsupersonic.com/flyby/ai-needs-more-power-than-the-grid-can-deliver-supersonic-tech-ca...
3•simonebrunozzi•1h ago•2 comments

A deep dive into QEMU: The Tiny Code Generator (TCG), part 1 (2021)

https://airbus-seclab.github.io/qemu_blog/tcg_p1.html
61•costco•1w ago•2 comments

ZX Spectrum Next on the Internet: Xberry Pi ESP01 and Pi Zero Upgrades

https://retrogamecoders.com/zx-spectrum-next-on-the-internet-xberry-pi-esp01-and-pi-zero-upgrades/
45•ibobev•6h ago•0 comments

Epsilon: A WASM virtual machine written in Go

https://github.com/ziggy42/epsilon
120•ziggy42•1w ago•29 comments

New Pebble Device

https://repebble.com/blog/meet-pebble-index-01-external-memory-for-your-brain
200•freshrap6•2h ago•204 comments

The Gamma Language

https://lair.masot.net/gamma/
25•RossBencina•3d ago•4 comments

How Private Equity Is Changing Housing

https://www.theatlantic.com/ideas/2025/12/private-equity-housing-changes/685138/
9•harambae•24m ago•2 comments

Kroger acknowledges that its bet on robotics went too far

https://www.grocerydive.com/news/kroger-ocado-close-automated-fulfillment-centers-robotics-grocer...
235•JumpCrisscross•17h ago•266 comments

The universal weight subspace hypothesis

https://arxiv.org/abs/2512.05117
340•lukeplato•17h ago•121 comments

After the Bubble

https://www.tbray.org/ongoing/When/202x/2025/12/07/Thin-Spots-In-the-AI-Bubble
26•savant2•5h ago•17 comments