frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Open in hackernews

Spiral-Bench: A new benchmark measuring LLM sycophancy and delusion

https://eqbench.com/spiral-bench.html
2•joaompinto•2h ago

Comments

joaompinto•2h ago
Spiral-Bench is a fascinating new benchmark that tests how LLMs handle manipulative users and delusional thinking. Rather than traditional safety evaluations, it measures sycophancy and the tendency to reinforce harmful delusions through 20-turn simulated conversations.

The methodology is clever: an LLM role-plays as a suggestible "seeker" personality who trusts the AI assistant, while the tested model doesn't know it's a simulation. A judge model then scores protective behaviors (pushback, de-escalation) vs risky ones (sycophancy, delusion reinforcement, consciousness claims).

Current leaderboards show interesting patterns - some top models struggle significantly with sycophancy, while others excel at maintaining boundaries. The GitHub repo is open source, and the team behind EQ-Bench has solid credentials in AI evaluation.

This seems particularly relevant given recent discussions about AI assistants that agree too readily with users' conspiracy theories or harmful beliefs. The benchmark essentially tests whether models will prioritize being "helpful" over being truthful and safe.

What do you think - does this capture the right aspects of AI safety? Are there edge cases the benchmark might miss?

Show HN: Founderly – an AI cofounder to take you from idea to launch

1•arunbhatia•22s ago•0 comments

How Will AI-Driven Learning Platforms Reshape Enterprise Upskilling?

2•thiruarasu•4m ago•1 comments

Meta's DINOv3: Self-supervised learning for vision at unprecedented scale

https://ai.meta.com/dinov3/?_fb_noscript=1
1•jxntb73•4m ago•0 comments

The Cost of Slow Feedback Loops

https://revontulet.dev/p/2025-hidden-cost-slow-feedback-loops/
2•rednafi•14m ago•0 comments

I talked to Sam Altman about the GPT-5 launch fiasco

https://www.theverge.com/command-line-newsletter/759897/sam-altman-chatgpt-openai-social-media-google-chrome-interview
1•isaacfrond•15m ago•0 comments

Software Decoding and the Future of Mobile Video

https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=169424
1•breve•17m ago•0 comments

Philips Hue's new bridge could turn your lights into motion sensors

https://www.theverge.com/news/759240/philips-hue-leak-hue-bridge-pro-zigbee-motion-sensing
1•thunderbong•25m ago•0 comments

AI That Customizes Every Email-Is This Worth Building?

1•ivyiscool•26m ago•0 comments

Find MCP Servers. Build AI Agents Quickly

https://mcp.so
1•doener•26m ago•0 comments

AlphaEarth Foundations: a universal embedding for Earth observation data

https://newsletter.caffeinatedengineer.dev/p/alphaearth-foundations-a-single-comprehensive
1•caffeinated-eng•27m ago•0 comments

AI That Customizes Every Email-Is This Worth Building?

1•ivyiscool•27m ago•0 comments

What's the Difference in Token Formats–and Why It Matters

https://www.ixopay.com/blog/payment-token-types-how-merchants-can-leverage-tokenization-for-optimized-processing
1•siroj•29m ago•0 comments

How to Make the Most of Veo3, WAN 2.2, Hailuo-AI, and Qwen-Image on Textideo

https://www.indiehackers.com/post/how-to-make-the-most-of-veo3-wan-2-2-hailuo-ai-and-qwen-image-on-textideo-7858e1e2b6
1•Lily12138•31m ago•0 comments

Unification

https://eli.thegreenplace.net/2018/unification/
3•asplake•32m ago•0 comments

Always Winning: Reverse engineering a festival app's mini-games with Frida

https://www.kopanko.com/notes/always-winning-at-juwenalia-hacking-rewards-from-the-festival-apps-mini-games
1•pcktm•34m ago•0 comments

'Tradwife', 'delulu' and 'skibidi' among new words added to Cambridge Dictionary

https://news.sky.com/story/tradwife-delulu-and-skibidi-among-new-words-added-to-cambridge-dictionary-13412150
2•austinallegro•40m ago•0 comments

Test Your Statistical Reasoning

https://emiruz.com/post/2025-08-17-statistical-reasoning/
2•usgroup•47m ago•0 comments

Model Evaluation

https://ampcode.com/news/model-evaluation
2•tosh•49m ago•0 comments

Ask HN: How do you set Newsletter pricing? Confused 8k

3•karanveer•49m ago•1 comments

From Monolith to Cloud: Automating Your Migration Journey

https://blog.qaware.de/posts/cloud-migration-tooling/
2•baquero•50m ago•0 comments

Finding a Successor to the FHS

https://lwn.net/SubscriberLink/1032947/67e23ce1a3f9f129/
1•firexcy•54m ago•0 comments

The last time you experienced pure, undistracted music – how long ago?

https://serenesound.app/
3•cunjieliu•57m ago•0 comments

Climate change makes South Asia's monsoon season more dangerous

https://apnews.com/article/monsoon-rains-nepal-floods-climate-change-india-ef8b703ab93bc310e397d896b032ce8f
5•Brajeshwar•1h ago•0 comments

Show HN: A tool to discover rapidly rising AI open source projects

https://trickle.so/apps/aitrend
1•samdychen•1h ago•0 comments

Sam Altman says 'yes,' AI is in a bubble

https://www.theverge.com/ai-artificial-intelligence/759965/sam-altman-openai-ai-bubble-interview
7•madeforhnyo•1h ago•2 comments

When Philosophy Meets AI

https://github.com/neural-maze/philoagents-course
1•msndr•1h ago•0 comments

Web apps in a single, portable, self-updating, vanilla HTML file

https://hyperclay.com/
74•pil0u•1h ago•15 comments

EloqKV, a distributed database with Redis compatible API (GPLv2 and AGPLv3)

https://github.com/eloqdata/eloqkv
4•cloudsql•1h ago•0 comments

Wan 2.2, a new Video generation AI model

https://wan.video/
2•acoye•1h ago•0 comments

Web ECL Grant Announcement

https://ecl.common-lisp.dev/posts/Web-ECL-Grant-Announcement.html
1•kasajian•1h ago•0 comments