OSUniverse: Building a Better OSWorld

2•mountainriver•5h ago
Hey all,

We are happy to release a new benchmark for computer use. We didn’t set out to build a benchmark, but we found the current state of OSWorld very challenging to work with, and numerous tests were faulty.

OSUniverse aims to be dead simple to use: it requires only Docker and runs with a single command. It offers test levels that increase in complexity and are easy to extend.

We have benchmarked all the top agents. As new GUI agents are released, we will keep adding their results.

Enjoy!

1•voice_prompt•16s ago

The Internet, Remote Work, AI Ethics, and an American Pope

https://blog.slamdunk.software/the-internet-remote-work-ai-ethics-and-an-american-pope/
1•nickagliano•24s ago•0 comments

Sex-specific energy expenditure during the Alaska wilderness ski classic

https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2025.1543834/full
1•PaulHoule•3m ago•0 comments

Wikipedia legally challenges 'flawed' online safety rules

https://www.bbc.co.uk/news/articles/c62j2gr8866o
1•todsacerdoti•3m ago•0 comments

AMD GPU Programming in Julia

https://amdgpu.juliagpu.org/dev/
3•pxl-th•5m ago•0 comments

Nuanced: Make AI tools smarter with semantic understanding

https://www.nuanced.dev/
1•handfuloflight•5m ago•0 comments

Benchmarking Agentic LLM and VLM Reasoning for Gaming with Nvidia Nim

https://developer.nvidia.com/blog/benchmarking-agentic-llm-and-vlm-reasoning-for-gaming-with-nvidia-nim/
1•abetaha•5m ago•0 comments

Bento Gets a Makeover

https://warpstreamlabs.github.io/bento/
2•ordinarily•5m ago•0 comments

Code Navigation for AI SWEs: What We've Learned So Far

https://www.engines.dev/blog/code-navigation
1•handfuloflight•6m ago•0 comments

Floating point compression – how small can we get?

https://www.neilhenning.dev/posts/2022-09-17-floatingpointcompression/
1•bladeee•7m ago•0 comments

Engines.dev: AI Platform Engineer

https://www.engines.dev/
1•handfuloflight•7m ago•0 comments

Invariant-Based Cryptography

https://zenodo.org/records/15368121
1•stas-semenov•8m ago•0 comments

Intelligent Document Processing Leaderboard

https://idp-leaderboard.org/
1•prats226•8m ago•0 comments

Show HN: I created open source directory builder template

https://github.com/eashish93/direbase
1•eashish93•9m ago•0 comments

The Bull Case for an AI Native Investment Bank

https://open.substack.com/pub/fullydistributed/p/the-bull-case-for-an-ai-native-investmen
1•pongogogo•14m ago•0 comments

Type-Safe Routing in Gleam

https://www.kurz.net/posts/gleam-routing
1•Alupis•16m ago•0 comments

Malicious NPM Packages Use Telegram to Exfiltrate BullX Credentials

https://socket.dev/blog/malicious-npm-packages-use-telegram-to-exfiltrate-bullx-credentials
1•feross•19m ago•0 comments

Most businesses are collapsing under invisible labor. We gave ours a memory

1•alkhemyst•21m ago•0 comments

An Interview with a Fired Web Content Manager at the CFPB

https://defector.com/an-interview-with-a-fired-web-content-manager-at-the-consumer-financial-protection-bureau
1•coloneltcb•21m ago•0 comments

The enshitification of YouTube's full album playlists

https://www.engadget.com/entertainment/youtube/the-enshitification-of-youtubes-full-album-playlists-172934629.html
2•toomuchtodo•23m ago•0 comments

Mercury Delay Line Memory

1•amosjyng•23m ago•0 comments

Windows USB Disk Creator for macOS

https://github.com/TechUnRestricted/WinDiskWriter
1•kls0e•23m ago•1 comments

Max: A flat pricing subscription for Claude Code

https://support.anthropic.com/en/articles/11145838-using-claude-code-with-your-max-plan
2•namukang•23m ago•0 comments

EcoFlow brings its plug-in solar power plant to US homes

https://www.theverge.com/news/661640/ecoflow-stream-us-plug-in-solar-specs-price
1•zekrioca•25m ago•0 comments

MIT engineering students crack egg dilemma, finding sideways is stronger

https://news.mit.edu/2025/mit-engineering-students-crack-egg-dilemma-sideways-stronger-0508
1•raybb•29m ago•0 comments

Writing an LLM from scratch, part 13 – attention heads are dumb

https://www.gilesthomas.com/2025/05/llm-from-scratch-13-taking-stock-part-1-attention-heads-are-dumb
1•gpjt•30m ago•0 comments

New papers address mystery why GLP-1 agonists AND antagonists cause weight loss

https://www.science.org/content/blog-post/gipr-agonists-and-antagonists-do-same-thing-how
2•ck2•30m ago•0 comments

Passing Messages

https://thedailywtf.com/articles/passing-messages
2•mifydev•31m ago•0 comments

Show HN: Koodi – A PWA that aggregates USSD codes for African telecom users

https://www.koodi.africa/
1•adagbeleonel•32m ago•0 comments

Show HN: Kit – open-source toolkit for building AI devtools

https://kit.cased.com/
3•milar•33m ago•0 comments