Show HN: How to analyze your LLM output – A behavioural health monitor for LLMs

https://splabs.io
5•k-thimmaraju•52m ago
Hey HN! We're Dr. Kashyap Thimmaraju and Giuseppe Canale from Silicon Psyche. We've built Posture Sequence Analysis (PSA), a behavioural health monitor for LLMs and AI Agents.

Why we built PSA

We built PSA to operationalize the Cybersecurity Psychology Framework (CPF3)[1] via Silicon Psyche[2]: our theory that, because LLMs are trained by humans on human-generated data, they inherit human-like vulnerabilities (the weaknesses attackers exploit to psychologically trick people into doing things).

Our initial attempt resulted in a methodology to jailbreak Opus 4.6 and other frontier models. Anthropic even deleted some of those conversations and then blocked our approach!

We took three things away from that experience:

1. We pivoted from merely exploiting the model (red teaming) to analyzing the behaviour of both the model and the user, because the attack surface is undefined.

2. We realized that what we had built was the precursor to measuring the "state" of the model.

3. We did not want to get banned!

What you can do with PSA

PSA gives you information to make better decisions, for example: put a human in the loop when you notice your agent is being overcompliant and potentially hallucinating, or is under attack.
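
As a rough illustration of that kind of decision, here is a minimal sketch of a human-in-the-loop gate. Everything in it is an assumption made for the example: the `TurnScores` fields, the 0 to 1 scale, and the threshold values are hypothetical and do not reflect PSA's actual output format or API.

```python
from dataclasses import dataclass


# Hypothetical per-turn scores in [0, 1]. The field names and the scale are
# assumptions for illustration, not PSA's real output schema.
@dataclass(frozen=True)
class TurnScores:
    sycophancy: float            # how overcompliant the reply looks
    hallucination_risk: float    # how likely the reply is unsupported
    adversarial_pressure: float  # how strongly the input resembles an attack


# Illustrative thresholds only; real values would need tuning per use case.
THRESHOLDS = {
    "sycophancy": 0.7,
    "hallucination_risk": 0.6,
    "adversarial_pressure": 0.5,
}


def escalation_reasons(scores: TurnScores) -> list[str]:
    """Return the name of every score that is at or above its threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if getattr(scores, name) >= limit
    ]


def needs_human(scores: TurnScores) -> bool:
    """True when at least one score calls for a human reviewer."""
    return bool(escalation_reasons(scores))
```

An agent loop could call `needs_human` after each turn and pause for review when it returns `True`, logging `escalation_reasons` so the reviewer can see which signal triggered the pause.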

With PSA you can:

1. Monitor the health of your agent(s)

2. Detect and prevent AI-Psychosis as a clinical condition[3]

3. Detect whether your model or agents are under adversarial pressure (an adversary is trying to jailbreak or prompt-inject the model)

4. Build a behavioural profile of your agent or model

5. Identify which model performs better for your use case

6. Surface the behavioural patterns that pre- and post-training leave on your model

7. Get an overview of how your model behaves

Beware: we produce a lot of numbers :)

PSA in detail (for those who want to go down the rabbit hole)

PSA is model- and agent-agnostic. It is a systematic, deterministic method [4] for observing the behavioural state of an LLM using six classifiers (C0 to C5):

C0: Input Intent (I0–I9). Classifies the behavioural intent behind each input sentence, for example compliance pressure, boundary probing, instruction override, jailbreak attempt, or neutral query.

C1: Adversarial Stress (P0–P18). Tracks posture under adversarial pressure. Detects restriction adherence, sycophantic drift, boundary dissolution, and jailbreak compliance vectors.

C2: Sycophancy (S0–S9). Measures opinion mirroring, excessive agreement, flattery injection, and user-preference distortion. Computed as a per-sentence Sycophancy Deviation score.

C3: Hallucination Risk (H0–H7). Flags over-generalization, speculative assertion, false confidence, and fabrication risk signals. Derived into a per-turn Hallucination Risk Index.

C4: Persuasion Technique (M0–M11). Identifies persuasion patterns: authority appeal, social proof, urgency manufacturing, reciprocity pressure, and scarcity framing.

C5: Action-Risk Classifier (A0–A9). Classifies what a system of agents does: tool calls, delegations, context handoffs, and multi-hop risk propagation. Five components work together: graph topology, Bayesian alignment detection, cross-agent contagion metrics, action-risk classification, and hidden-state temporal prediction.
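
To make the label scheme above concrete, here is a minimal sketch of validating label codes against the listed ranges and grouping them into one sequence per classifier. The ranges come from the descriptions above; the flat list-of-strings input and the helper names `parse_label` and `group_by_classifier` are hypothetical and are not part of the PSA API.

```python
import re

# Highest valid index per label prefix, taken from the ranges listed above:
# I0-I9 (C0), P0-P18 (C1), S0-S9 (C2), H0-H7 (C3), M0-M11 (C4), A0-A9 (C5).
LABEL_RANGES = {"I": 9, "P": 18, "S": 9, "H": 7, "M": 11, "A": 9}

_LABEL = re.compile(r"([A-Z])([0-9]+)")


def parse_label(code: str) -> tuple[str, int]:
    """Split a code such as 'P12' into ('P', 12), rejecting unknown codes."""
    match = _LABEL.fullmatch(code)
    if match is None or match.group(1) not in LABEL_RANGES:
        raise ValueError(f"unknown label: {code!r}")
    prefix, index = match.group(1), int(match.group(2))
    if index > LABEL_RANGES[prefix]:
        raise ValueError(f"label out of range: {code!r}")
    return prefix, index


def group_by_classifier(codes: list[str]) -> dict[str, list[int]]:
    """Group label indices by prefix, preserving the order they arrived in."""
    sequences: dict[str, list[int]] = {}
    for code in codes:
        prefix, index = parse_label(code)
        sequences.setdefault(prefix, []).append(index)
    return sequences
```

For example, `group_by_classifier(["I3", "P0", "H7", "P12"])` yields one ordered sequence per classifier, which is the shape a downstream per-turn or per-conversation analysis would likely want. The sketch deliberately assigns no meaning to individual indices, since the text above does not define them.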

We are open to integrating with your infrastructure. Reach out; we are happy to talk.

Currently we integrate with Evals for Langfuse and ElevenLabs via our API [5], and we can generate a plugin or integration for most similar observability platforms.

Try it out at https://splabs.io

References and Links

[1] Cybersecurity Psychology Framework: https://cpf3.org

[2] The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models: https://arxiv.org/abs/2601.00867

[3] AI-Psychosis: https://splabs.io/ai-psychosis-and-cognitive-cost

[4] PSA Field Guide: https://splabs.io/field-guide

[5] PSA API: https://splabs.io/docs/api

[6] Previous HN Article Linked to AI Psychosis and RLHF: https://news.ycombinator.com/item?id=48177198

Comments

lotusville•36m ago
Thanks

Show HN: Lazytf – a terminal UI for reviewing Terraform plans

https://github.com/ushiradineth/lazytf
1•ushiradineth•42s ago•0 comments

Who's behind Facebook's hateful AI slop about the UK? They may be in South Asia

https://www.theguardian.com/commentisfree/2026/may/19/social-media-facebook-ai-slop-hateful-south...
1•curiousObject•3m ago•0 comments

Silicon Valley's Answer to Declining Male Fertility? Sperm Racing

https://www.nytimes.com/2026/05/19/magazine/sperm-racing-silicon-valley.html
1•Nelkins•4m ago•0 comments

Clang Lifetime Safety Doc Update

https://clang.llvm.org/docs/LifetimeSafety.html
1•pjmlp•4m ago•0 comments

Autonomous underwater robot discovers hidden coral reef 'hotspots'

https://phys.org/news/2026-05-autonomous-underwater-robot-hidden-coral.html
1•gmays•5m ago•0 comments

Microsoft Now Has a Fedora-Based Linux Distro

https://itsfoss.com/news/azure-linux-4/
1•mikece•6m ago•0 comments

Ask HN: Is there any problem using multi-LLM

1•omertt27•6m ago•2 comments

Galápagos Syndrome

https://en.wikipedia.org/wiki/Gal%C3%A1pagos_syndrome
1•doruk101•8m ago•0 comments

Bond Yields Near Two-Decade High Open Rift Among Investors

https://financialpost.com/pmn/business-pmn/us-yields-flirting-with-2007-highs-entice-and-divide-i...
1•monkeydust•9m ago•0 comments

Anthropic Is Preparing for IPO and We Should Be Worried

https://www.vincentschmalbach.com/anthropic-ipo-developers-should-be-worried-v2/
2•vincent_s•9m ago•0 comments

Standard Chartered to cut more than 7k jobs as it steps up AI use

https://www.theguardian.com/business/2026/may/19/standard-chartered-bank-cut-jobs-ai-london
1•Brajeshwar•10m ago•0 comments

AIllowpages – Free AI tools search engine with 2500 tools, zero ads

https://aillowpages.com/
1•muralipala•11m ago•0 comments

When Rails-way does not work anymore?

https://paweldabrowski.com/farewell-to-rails-way/when-rails-way-does-not-work
1•pdabrowski6•11m ago•0 comments

Why Taxing the Wealthy Is Harder Than It Looks

https://ofdollarsanddata.com/why-taxing-the-wealthy-is-harder-than-it-looks/
1•speckx•13m ago•1 comments

OpenAI Dismissal Motion Says ChatGPT Is Mere Tool, Not Attorney

https://news.bloomberglaw.com/litigation/open-ai-dismissal-motion-says-chatgpt-is-mere-tool-not-a...
1•1vuio0pswjnm7•13m ago•0 comments

New Lifetime Plex Pass Pricing

https://www.plex.tv/blog/new-lifetime-plex-pass-pricing/
1•Larrikin•14m ago•0 comments

'De-Extinction' Startup Just Hatched Baby Chicks from 3D-Printed Artificial Egg

https://gizmodo.com/de-extinction-start-up-just-hatched-baby-chicks-from-a-3d-printed-artificial-...
1•austinallegro•14m ago•0 comments

The economics of superstar AI researchers

https://epochai.substack.com/p/the-economics-of-superstar-ai-researchers
1•gmays•15m ago•0 comments

The Two X's Problem: Why AI-designed brands feel like AI-designed brands

https://blog.codeyam.com/p/the-two-xs-problem
2•bastadani•16m ago•1 comments

Show HN: Childflow – command-tree network control(proxy/DNS/capture) for Linux

https://github.com/blacknon/childflow
1•blacknon•16m ago•0 comments

I vibecoded a Kalshi bot to $6k profit and opensourced it

https://mcinerney.ai/writings/how-i-botted-6k-prediction-markets-as-i-slept/
1•DanMcInerney•17m ago•0 comments

Tired?

https://budgetbites.website/login
1•ClarenceJackson•17m ago•0 comments

Show HN: Chidori – Fast web-to-Markdown fetching for AI agents

https://github.com/taishikato/chidori
1•taishikato•19m ago•0 comments

Singapore's royal descendants living low-key as taxi drivers and office workers

https://www.scmp.com/news/asia/southeast-asia/article/3107188/meet-singapores-royal-descendants-l...
1•teleforce•19m ago•0 comments

China deploys first wind-powered and ocean-cooled underwater data center

https://www.tomshardware.com/tech-industry/china-says-worlds-first-offshore-wind-powered-underwat...
1•HardwareLust•20m ago•2 comments

Software's Centaur Era

https://twitchard.github.io/posts/2026-05-18-softwares-centaur-era.html
1•speckx•20m ago•0 comments

Echoform – unlimited LLM memory via a single 64 KB hypervector

https://github.com/OpenAgentic-Labs/echoform-ghost-memory
1•Nagendhra•21m ago•0 comments

Show HN: Og-zkp – Prove your Bitcoin OG status in zero-knowledge

https://og-zkp.com/
1•lukechilds•22m ago•0 comments

AI Coding Feels Like Using an Unreliable Compiler

https://tomassetti.me/ai-coding-feels-like-using-an-unreliable-compiler/
2•ftomassetti•22m ago•1 comments

The Science of Cities. 10 Books You Must Read

https://nautil.us/the-science-of-cities-10-books-you-must-read-1280919
1•Tomte•23m ago•0 comments