I Reduced 5 hours of Testing my Agentic AI application to 10 mins

https://github.com/onepaneai/mantis

1•ashmil•1h ago

Comments

ashmil•1h ago

Hi HN,

I was spending over 5 hours manually testing my Agentic AI application before every patch and release. While automating my API and backend tests was straightforward, testing the actual chat UI was a massive bottleneck. I had to sit there, type out prompts, wait for the AI to respond, read the output, and ask follow-up questions. As the app grew, releases started taking longer just because of manual QA.

To solve this, I built Mantis. It’s an automated UI testing tool designed specifically to evaluate LLM and Agentic AI applications right from the browser.

Here is how it works under the hood:

Define Cases: You define the use cases and specific test cases you want to evaluate for your LLM app.

Browser Automation: A Chrome agent takes control of your application's UI in a tab.

Execution: It simulates a real user by typing the test questions into the chat UI and clicking send.

Evaluation: It waits for the response, analyzes the LLM's output, and can even ask context-aware follow-up questions if the test case requires it.

Reporting: Once a sequence is complete, it moves to the next test case. Everything is logged and aggregated into a dashboard report.
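To make the loop above concrete, here is a minimal sketch in Python. It is not Mantis's actual code or API: `ChatCase`, `ChatDriver`, `FakeDriver`, `run_case` and `run_suite` are hypothetical names, the pass/fail check is a plain substring match standing in for a real LLM-output evaluator, and the browser is replaced by a canned-reply driver so the example runs without Chrome.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Protocol, Tuple

Transcript = List[Tuple[str, str]]  # (message sent, reply received) pairs


class ChatDriver(Protocol):
    """Hypothetical interface for whatever drives the chat UI."""

    def send(self, message: str) -> str:
        """Type the message, click send, wait, and return the reply text."""
        ...


@dataclass
class ChatCase:
    name: str
    prompt: str
    # Returns True if a single reply is acceptable.
    check: Callable[[str], bool]
    # Given the transcript so far, returns the next question or None to stop.
    follow_up: Callable[[Transcript], Optional[str]] = lambda transcript: None
    max_turns: int = 3  # hard cap so a follow-up function cannot loop forever


@dataclass
class Result:
    name: str
    passed: bool
    transcript: Transcript = field(default_factory=list)


def run_case(driver: ChatDriver, case: ChatCase) -> Result:
    """Run one test case: send the prompt, check each reply, ask follow-ups."""
    transcript: Transcript = []
    message: Optional[str] = case.prompt
    passed = True
    while message is not None and len(transcript) < case.max_turns:
        reply = driver.send(message)
        transcript.append((message, reply))
        passed = passed and case.check(reply)
        message = case.follow_up(transcript)
    return Result(case.name, passed, transcript)


def run_suite(driver: ChatDriver, cases: List[ChatCase]) -> Dict[str, object]:
    """Run every case in order and aggregate the results into a report."""
    results = [run_case(driver, case) for case in cases]
    return {
        "total": len(results),
        "passed": sum(1 for r in results if r.passed),
        "failed": [r.name for r in results if not r.passed],
        "results": results,
    }


class FakeDriver:
    """Stand-in for a real browser driver; answers from a canned lookup table."""

    def __init__(self, replies: Dict[str, str]) -> None:
        self.replies = replies

    def send(self, message: str) -> str:
        return self.replies.get(message, "Sorry, I don't know.")


driver = FakeDriver({
    "What is your refund policy?": "Refunds are accepted within 30 days.",
    "Does that include digital goods?": "Yes, 30 days for digital goods too.",
})
cases = [
    ChatCase(
        name="refund policy",
        prompt="What is your refund policy?",
        check=lambda reply: "30 days" in reply,
        # Ask one follow-up after the first reply, then stop.
        follow_up=lambda t: "Does that include digital goods?" if len(t) == 1 else None,
    ),
    ChatCase(
        name="cancel order",
        prompt="Cancel my order 123",
        check=lambda reply: "cancelled" in reply.lower(),
    ),
]
report = run_suite(driver, cases)
print(report["passed"], "of", report["total"], "passed; failed:", report["failed"])
```

In a real setup the `FakeDriver` would be swapped for something that controls a browser tab, and `check` would likely be a richer evaluator; keeping the driver behind a one-method interface is what lets the same cases run against either.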

The biggest win for me is that I can now just kick off a test run in a background Chrome tab and get back to writing code while Mantis handles the tedious chat testing.

I’d love to hear your thoughts. How are you all handling end-to-end UI testing for your chat apps and AI agents? Any feedback or questions on the approach are welcome!

AI-powered apps struggle with long-term retention, new report shows

https://techcrunch.com/2026/03/10/ai-powered-apps-struggle-with-long-term-retention-new-report-sh...

1•pseudolus•49s ago•0 comments

PEP 827 – Type Manipulation

https://peps.python.org/pep-0827/

1•EvgeniyZh•1m ago•0 comments

NASA's Van Allen Probe A to re-enter atmosphere

https://phys.org/news/2026-03-nasa-van-allen-probe-atmosphere.html

2•bookmtn•1m ago•0 comments

How age standardization makes health metrics comparable

https://ourworldindata.org/age-standardization

1•sohkamyung•3m ago•0 comments

Discovering Little Worlds (2020)

https://dmitrybrant.com/2020/08/01/discovering-little-worlds

1•wonger_•3m ago•0 comments

Ukraine Reaches a Milestone: Making ‘China-Free’ Drones

https://www.nytimes.com/2026/03/11/world/europe/ukraine-drones-china.html

1•giuliomagnifico•4m ago•0 comments

Simple-Git NPM package has CVSS 9.8 RCE; 5M+ weekly downloads–check lockfiles

https://www.codeant.ai/security-research/simple-git-remote-code-execution-cve-2026-28292

1•birdculture•6m ago•0 comments

Automatic Pronunciation Error Detection and Correction of the Holy Quran

https://arxiv.org/abs/2509.00094

1•handfuloflight•9m ago•0 comments

Show HN: A simple hardened AI Docker cluster

https://github.com/kummahiih/secure-mcp/

1•kummap•12m ago•0 comments

Astro 6.0 Is Released

https://astro.build/blog/astro-6/

2•mariuz•12m ago•0 comments

A new model defines an upper limit to planetary radiation belt intensity

https://phys.org/news/2026-03-upper-limit-planetary-belt-intensity.html

3•bookmtn•12m ago•0 comments

VoltRN CLI for React Native/Expo Scaffolding, Generators

https://github.com/IronTony/voltrn-cli

1•IronTony•13m ago•1 comments

Analect – AST and LLM Code Summary and Navigation

https://analect.dev

1•ascent817•15m ago•0 comments

Show IH: I built a runtime control plane to stop AI agents from burning money

https://github.com/vijaym2k6/SteerPlane

1•vijaym2k6•16m ago•0 comments

Show HN: Free API toolkit – cron, webhooks, DNS, hashing, regex

https://frog03-20494.wykr.es/devtools/

1•patchnull•20m ago•1 comments

Fooling Go's X.509 Certificate Verification

https://danielmangum.com/posts/fooling-go-x509-certificate-verification/

1•hasheddan•20m ago•0 comments

Some relationships deepen when you tell the truth and some end

https://www.henrikkarlsson.xyz/p/going-your-own-way

1•squirrel•20m ago•0 comments

Open Source Masterclass – Learn to Contribute Upstream

https://opensourcemasterclass.org/

3•antoviaque•21m ago•0 comments

Tell HN: Moltbook was running in my browser

2•ramon156•27m ago•1 comments

As AI data centers scale, investigating their impact becomes its own beat

https://www.niemanlab.org/2026/03/as-ai-data-centers-scale-investigating-their-impact-becomes-its...

2•giuliomagnifico•28m ago•0 comments

Claude Skills: The Complete Guide

https://aistaffkit.com/claude-skills-guide

1•modestpacket•30m ago•0 comments

Get 500 credits for Manus registration

https://manus.im/invitation/ZQFSZCXJGQKD1GP

1•doener•30m ago•0 comments

Show HN: Colab pipeline for auto-labeling datasets with prompt and training YOLO

https://github.com/useful-ai-tools/detect-anything

2•eyasu6464•32m ago•1 comments

Windows 12 could be the tipping point that pushes you to Linux

https://www.zdnet.com/article/windows-12-rumors-linux-migration/

1•robtherobber•32m ago•0 comments

Ask HN: What starts to break down as your notes grow?

2•vajafafa•38m ago•1 comments

AI research paper – IEEE open access journal

https://ieeexplore.ieee.org/document/11424402

2•funnyguy678•42m ago•0 comments

C++26 safety features won't save you

https://lucisqr.substack.com/p/c26-safety-features-wont-save-you

2•todsacerdoti•42m ago•0 comments

What Is Zensical?

https://zensical.org/about/

1•Tomte•43m ago•0 comments

Ask HN: Should we add game as product type for 3DIMLI?

1•arpit077•44m ago•0 comments

Fixing Programmatic Tool Calling with Types

https://blog.coldboot.org/fixing-programmatic-tool-calling-with-types

1•matchcase•46m ago•0 comments