UpBench: Dynamically Evolving Real-World Labor-Market Agentic Benchmark [pdf]

https://www.upwork.com/static/webflow/assets/webflow-human-agent-productivity-index/upbench_paper_v6.pdf
2•pablomendes•1h ago

Comments

pablomendes•1h ago
As large language model (LLM) agents increasingly undertake digital work, reliable frameworks are needed to evaluate their real-world competence, adaptability, and capacity for human collaboration. Existing benchmarks remain largely static, synthetic, or domain-limited, providing limited insight into how agents perform in dynamic, economically meaningful environments. We introduce UpBench, a dynamically evolving benchmark grounded in real jobs drawn from the global Upwork labor marketplace. Each task corresponds to a verified client transaction, anchoring evaluation in genuine work activity and financial outcomes. UpBench employs a rubric-based evaluation framework, in which expert freelancers decompose each job into detailed, verifiable acceptance criteria and assess AI submissions with per-criterion feedback. This structure enables fine-grained analysis of model strengths, weaknesses, and instruction-following fidelity beyond binary pass/fail metrics. Human expertise is integrated throughout the data pipeline (from job curation and rubric construction to evaluation), ensuring fidelity to real professional standards and supporting research on human-AI collaboration. By regularly refreshing tasks to reflect the evolving nature of online work, UpBench provides a scalable, human-centered foundation for evaluating agentic systems in authentic labor-market contexts, offering a path toward a collaborative framework where AI amplifies human capability through partnership rather than replacement.

The New 2025 OWASP Top Ten

https://owasp.org/Top10/2025/0x00_2025-Introduction/
1•shehackspurple•3m ago•0 comments

All Praise to the Lunch Ladies

https://bittersoutherner.com/issue-no-12/all-praise-to-the-lunch-ladies
1•gmays•4m ago•0 comments

What's the difference between an artist and a creator?

https://www.ystrickler.com/whats-the-difference-between-an-artist-and-a-creator/
1•NaOH•5m ago•0 comments

Dredger-IoT: Ruby at the Edge – Open-Source Industrial Telemetry

https://dominickm.com/dredger-iot-ruby-at-the-edge-open-source-industrial-telemetry/
1•Kerrick•6m ago•0 comments

Why Your Audiobook Habit Might Be Sabotaging Deep Learning

https://zoia.org/posts/why-your-audiobook-habit-might-be-sabotaging-deep-learning/
1•freediver•7m ago•0 comments

I won't work for Google, Twitter, or Facebook (meta)

https://naildrivin5.com/blog/2011/08/01/why-i-wont-work-for-google-twitter-facebook.html
4•dzonga•8m ago•1 comments

Show HN: Ouverture.py – Content-addressed storage for multilingual functions

https://github.com/amirouche/ouverture.py
1•amirouche•10m ago•0 comments

Upwork warned me to stop using browser extensions

https://chromewebstore.google.com/detail/upwork-search-enhancement/pgpkjpoepjjbamidgffmedelnpiiinkk
2•riamuu•11m ago•1 comments

Better pre-commit, re-engineered in Rust

https://prek.j178.dev/
1•nikolay•13m ago•1 comments

The Push for Better Evidence on Microplastics and Health

https://www.medscape.com/viewarticle/push-better-evidence-microplastics-and-health-2025a1000vbd
1•wjb3•14m ago•0 comments

An Italian Company Builds the First Known Propellantless Space-Propulsion System

https://www.satcom.digital/news/genergo-an-italian-company-builds-the-worlds-first-known-propella...
1•maremmano•16m ago•0 comments

"Learning how to learn" via distance running

https://the-nerve-blog.ghost.io/learning-from-running/
1•mprast•17m ago•0 comments

The Orgasm Cure

https://aeon.co/essays/delayed-orgasm-the-sexual-technique-thats-better-than-sex
2•Eridanus2•18m ago•0 comments

Show HN: Four Solutions to Valid Parentheses (LeetCode #20)

https://medium.com/@mcsimpson/solving-leetcode-0020-valid-parentheses-in-four-different-ways-bf3c...
1•smatthewaf•19m ago•0 comments

Hermes – A self-hosted video downloader for 1000 sites

https://github.com/TechSquidTV/Hermes
2•handystudio•19m ago•1 comments

'GoldenEye' at 30

https://variety.com/2025/film/news/goldeneye-at-thirty-1236581765/
1•birriel•19m ago•0 comments

Is the Gut-Autism Hypothesis a 'Dead End'?

https://www.medscape.com/viewarticle/gut-autism-hypothesis-dead-end-2025a1000vjo
1•wjb3•20m ago•1 comments

Keeplinker – 1-click link saving with drag-drop collections and public sharing

https://keeplinker.com/
2•intotheabyss999•22m ago•0 comments

A Rigorous Approach to the Algorithmic Composition of Iannis Xenakis (2009) [pdf]

https://monoskop.org/images/3/38/Hoffmann_Peter_Music_Out_of_Nothing_A_Rigorous_Approach_to_Algor...
1•ofalkaed•23m ago•0 comments

StutterZero: Speech Conversion for Stuttering Transcription and Correction

https://arxiv.org/abs/2510.18938
1•internetguy•25m ago•0 comments

Lessons introductory to the modern higher algebra (1876)

https://archive.org/details/3rdedlessonintro00salmuoft
2•nigelvr•26m ago•0 comments

Nintendo Gamecube Controller Protocol

https://www.int03.co.uk/crema/hardware/gamecube/gc-control.html
2•xeonmc•27m ago•0 comments

Kitten Space Agency pre-alpha release

https://ahwoo.com/store/KPbAA1Au/kitten-space-agency
2•tomsto•28m ago•0 comments

Raptor mini is rolling out in public preview for GitHub Copilot

https://github.blog/changelog/2025-11-10-raptor-mini-is-rolling-out-in-public-preview-for-github-...
3•nateb2022•28m ago•0 comments

Comet sends all your URLs to Perplexity servers and there's no way to stop it

https://shivankaul.com/blog/comet-privacy
1•skaul•28m ago•2 comments

HipKittens: Fast and Furious AMD Kernels

https://hazyresearch.stanford.edu/blog/2025-11-09-hk
1•mpweiher•30m ago•0 comments

Jimmy Wales, Wikipedia's founder/co-founder – Jung and Naiv: Episode 792

https://www.youtube.com/watch?v=uswRbWyt_pg
2•LauraMedia•31m ago•1 comments

Diabetes: The Silent Killer Across India

https://pharmeasy.in/research/diabetes
3•evanjrowley•31m ago•1 comments

Allowing Failure

https://hollisrobbinsanecdotal.substack.com/p/allowing-failure
1•HR01•32m ago•0 comments

Show HN: Open-Source GoLang SDK for Multi-Tenant Agents

https://github.com/Ingenimax/agent-sdk-go
1•tech-aguirre•32m ago•0 comments