I've been building AI agents lately and kept running into the same problem: how do you test them?
Manually prompting the agent before each release is tedious and doesn't scale, and existing solutions for testing agents are often complex to integrate.
To help with this, I built a simple open-source testing framework that uses AI to validate AI: you define the expected behavior and let an LLM judge whether the output is semantically correct.
The LLMJudge returns a score (0-1) and its reasoning for why the test passed or failed.
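To give a concrete idea of the pattern, here's a minimal sketch of LLM-as-judge testing in Python. This is an illustration of the general idea, not the framework's actual API: the `judge` function, the `JudgeResult` dataclass, the prompt, and the test below are assumptions I'm making for the example.

```python
# Illustrative sketch of LLM-as-judge semantic testing (not the real semantictest API).
# A judge model scores the agent's output against a plain-language expectation
# and returns a 0-1 score plus reasoning, which the test asserts on.

import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class JudgeResult:
    score: float      # 0.0 (fails the expectation) to 1.0 (fully meets it)
    reasoning: str    # the judge model's explanation

def judge(output: str, expectation: str, llm: Callable[[str], str]) -> JudgeResult:
    """Ask a judge LLM whether `output` semantically satisfies `expectation`."""
    prompt = (
        "You are a strict test judge. Score how well the OUTPUT satisfies the "
        "EXPECTATION on a scale from 0 to 1 and explain why.\n"
        f"EXPECTATION: {expectation}\n"
        f"OUTPUT: {output}\n"
        'Reply as JSON: {"score": <float>, "reasoning": "<string>"}'
    )
    raw = llm(prompt)  # any chat-completion call can be plugged in here
    data = json.loads(raw)
    return JudgeResult(score=float(data["score"]), reasoning=data["reasoning"])

# Hypothetical usage in a test: fail the build if the agent's answer drifts semantically.
def test_refund_policy_answer(agent, llm):
    answer = agent("What is your refund policy?")
    result = judge(answer, "Mentions the 30-day refund window and how to request one", llm)
    assert result.score >= 0.8, result.reasoning
```

The point is that the expectation is semantic rather than an exact string match, so the test still passes when the agent phrases a correct answer differently.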
You can try it live here (no signup required): https://semantictest.dev
The playground runs real LLMJudge validation, so you can see how the semantic testing works.
The code is fully open source, and you can find extensive documentation here: https://docs.semantictest.dev
Would love feedback from you guys!
Thank you!