Show HN: Phare: A Safety Probe for Large Language Models

https://arxiv.org/abs/2505.11365
3•dberenstein1957•6h ago
We've just published a benchmark and accompanying paper on arXiv that challenges conventional leaderboard-driven LLM evaluation.

Phare focuses on factual reliability, prompt sensitivity, multilingual support, and how models handle false premises: issues that actually matter when you're building serious applications.

Some insights:

- Preference scores ≠ factual correctness.

- Framing effects can cause models to miss obvious falsehoods.

- Safety metrics like sycophancy and stereotype reproduction show surprising results across popular models.

Would love feedback from the community.

Show HN: Yet Another Todo App

https://yata.vijayp.dev
1•arnath•22s ago•0 comments

Enhancing Messaging on Reddit: A simpler, faster, and easier way to communicate

https://support.reddithelp.com/hc/en-us/articles/34720093903764-Enhancing-Messaging-on-Reddit-A-simpler-faster-and-easier-way-to-communicate
1•marktangotango•43s ago•0 comments

Apollo for Reddit dev Christian Selig to join Digg as an advisor

https://techcrunch.com/2025/05/21/apollo-for-reddit-dev-christian-selig-to-join-digg-as-an-advisor/
2•CharlesW•3m ago•0 comments

Russian GRU Targeting Western Logistics Entities and Technology Companies

https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Cyber-Security/GRU_Western_Logistics.pdf?__blob=publicationFile&v=3
3•doener•3m ago•0 comments

Lidar maker Luminar lays off more workers following CEO exit

https://www.theverge.com/news/671804/luminar-lidar-layoffs-ceo-exit
1•swesnow•4m ago•0 comments

JavaScript Ecosystem Performance

https://e18e.dev/
1•brianzelip•4m ago•0 comments

By Default, Signal Doesn't Recall

https://signal.org/blog/signal-doesnt-recall/
2•feross•7m ago•0 comments

Show HN: I made a tool to extract data from thousands of PDFs in minutes

https://pdfdino.com
2•adyaxenie•7m ago•0 comments

Too much sitting increases risk of future health problems in chest pain patients

https://theconversation.com/too-much-sitting-increases-risk-of-future-health-problems-in-chest-pain-patients-new-research-257089
2•rntn•10m ago•0 comments

Journey to 1000 models: Scaling Instagram's recommendation system

https://engineering.fb.com/2025/05/21/production-engineering/journey-to-1000-models-scaling-instagrams-recommendation-system/
1•mfiguiere•11m ago•0 comments

The Era of the Business Idiot

https://www.wheresyoured.at/the-era-of-the-business-idiot/
3•dvt•11m ago•0 comments

React, Visualized – A visual exploration of core React concepts

https://react.gg/visualized
2•benadam11•12m ago•0 comments

The Accuracy of On-Device LLMs

https://medium.com/@aazo11/on-the-accuracy-of-on-device-llms-34fd6cc420b5
1•aazo11•15m ago•1 comment

Drones, New Sensors, and AI Fill in Species Gaps on the Global Map of Life

https://www.esri.com/about/newsroom/blog/drones-fill-map-of-life-gaps
1•johnshades•15m ago•0 comments

Swisscom introduces Cybersecurity-focused Internet service beem

https://www.swisscom.ch/en/about/news/2025/05/21-beem.html
2•theanonymousone•15m ago•0 comments

Lokas: Record and transcribe your meetings in complete privacy

https://lokas.app/en/home/
1•simonebrunozzi•16m ago•0 comments

Single atom acts as a quantum computer and simulates molecules

https://www.nature.com/articles/d41586-025-01591-1
1•johnshades•17m ago•0 comments

Zoo Design Studio v1: A New Stack for Mechanical CAD

https://zoo.dev/blog/zoo-design-studio-v1
3•jessfraz•18m ago•0 comments

The AI Safety Risk Is a Conceptual Exploit

https://www.lesswrong.com/posts/czdhfaoHKn52DHRbC/the-real-ai-safety-risk-is-a-conceptual-exploit
2•foxanthony•22m ago•0 comments

1Password Is Down

https://status.1password.com
6•Geenirvana•22m ago•3 comments

Kroger's Shopper Profiles: Why You May Be Paying More Than Your Neighbors

https://www.consumerreports.org/money/questionable-business-practices/kroger-secret-grocery-shopper-loyalty-profiles-unfair-a1011215563/
2•johnshades•22m ago•4 comments

Show HN: Flutter Project That Displays Photos and Videos

https://github.com/tataDan/photos_videos_pexels_api
1•tataDan•22m ago•0 comments

Sizing Big Data Workloads: Key Numbers to Know

https://blog.twingdata.com/p/sizing-big-data-workloads-key-numbers
1•dangoldin•22m ago•0 comments

Into the Horrid Depths of Instagram Reels Music

https://pitchfork.com/thepitch/into-the-horrid-depths-of-instagram-reels-music/
3•pentagrama•23m ago•0 comments

Zoo CAD Engine Overview

https://zoo.dev/research/zoo-cad-engine-overview
2•jessfraz•23m ago•0 comments

Why Use Bayesian Methods for A/B Testing

https://briefer.cloud/blog/posts/abtesting/
2•thaisstein•24m ago•0 comments

Show HN: I vibe coded a complex trading app

https://apps.apple.com/us/app/tradofire/id6615085924
4•sumeruchat•24m ago•6 comments

Obsidian Bases

https://mas.to/@obsidian/114546538858212821
1•todsacerdoti•27m ago•0 comments

Team Penske fires senior leadership team in wake of cheating scandal

https://www.nytimes.com/athletic/6372661/2025/05/21/penske-fired-indycar-indianapolis-500/
3•ChrisArchitect•27m ago•1 comment

May 2025 – Housing / Credit Market Analysis, August 2024 Follow Up

https://wordsofwill.com/2025/05/21/q1-2025-housing-credit-market-analysis-august-2024-follow-up/
3•wowohwow•29m ago•4 comments