OSUniverse: Building a Better OSWorld

5•mountainriver•2d ago
Hey all,

We are happy to release a new benchmark for computer use. We didn’t set out to build a benchmark, but we found OSWorld in its current state very challenging to work with, and many of its tests were faulty.

OSUniverse aims to be dead simple to use: it requires only Docker and runs with a single command. It offers test levels that increase in complexity and are easy to extend.

We have benchmarked all the top agents. As new GUI agents are released, we will keep the results up to date.

Enjoy!

Ask HN: If 1 person can control 10 AI agents, why would we still need that person?

2•flornt•1h ago•4 comments

Ask HN: What are good high-information density UIs (screenshots, apps, sites)?

492•troupo•2d ago•359 comments

Ask HN: RAG or shared memory for task planning across physical agents?

10•mbbah•22h ago•1 comment

Ask HN: How much better are AI IDEs vs. copy pasting into chat apps?

134•lopatin•2d ago•132 comments

Ask HN: Escaping a Low-Paying Nepali IT Job and Ineffective Learning Cycle

7•shivajikobardan•12h ago•2 comments

Ask HN: What would you do with the #manga chat channel in Libera IRC network?

4•babuloseo•15h ago•1 comment

Ask HN: Is there a service that offers Common Crawl as an API?

6•georgehill•15h ago•2 comments

Ask HN: Anyone using knowledge graphs for LLM agent memory/context management?

9•mbbah•22h ago•1 comment

Blazeio.SharpEvent: A Python Async Primitive That Scales to 1M Waiters with O(1)

6•anonyxbiz•1d ago•0 comments

Ask HN: AI-Filtering Browser Extension?

7•v-yanakiev•1d ago•3 comments

AI Summarizer: Summarize Web, YouTube and PDFs in Seconds–Free

10•huizhu•2d ago•2 comments

Ask HN: How to get good at marketing your product and SEO?

4•flashblaze•1d ago•4 comments

Ask HN: Hackathons feel fake now

211•sepidy•5d ago•128 comments

Ask HN: How could vibe coding show the code at a high level to non-programmers?

6•amichail•1d ago•8 comments

Ask HN: Are you using AI coding assistance?

8•cloudking•1d ago•13 comments

Ask HN: Nvidia GeForce RTX 5060 arrives May 19 at $299 revive PC builds?

10•byte-bolter•2d ago•11 comments

Ask HN: How do you obtain software development contracts?

30•codingclaws•3d ago•17 comments

Getting tired of Helm – any better way to handle deployments in Kubernetes?

22•DeborahEmeni_•4d ago•21 comments

Ask HN: Did Aliexpress stop shipping to US?

27•olalonde•4d ago•17 comments

Why do websites prevent pasting via onpaste="return false;"

5•gleenn•1d ago•3 comments

Ask HN: Help us validate our idea of an administrative app for small businesses

3•Kuyawa•1d ago•1 comment

Ask HN: How are you managing LLM inference at the edge?

7•gray_amps•2d ago•1 comment

We built an AI-powered voice tool to boost sales

2•Artjoker•2d ago•1 comment

Ask HN: Why is the sender chat box always on the right?

5•bdhe•2d ago•8 comments

Is a Smaller Internet Better?

3•sawyersweet•3d ago•0 comments

Ask HN: Which Firefox add-ons are you using in 2025?

6•vintageclothldn•2d ago•11 comments

Ask HN: Have you used Claude Code? Is it any good?

8•mbm•3d ago•9 comments

Ask HN: What's the best framework for building Mac/Windows desktop apps in 2025?

7•anoojb•20h ago•6 comments

Ask HN: Has anyone managed to pass Meta's Access Verification?

24•hipgrave•5d ago•11 comments