UpBench: Dynamically Evolving Real-World Labor-Market Agentic Benchmark [pdf]

https://www.upwork.com/static/webflow/assets/webflow-human-agent-productivity-index/upbench_paper_v6.pdf

2•pablomendes•2mo ago

Comments

pablomendes•2mo ago

As large language model (LLM) agents increasingly undertake digital work, reliable frameworks are needed to evaluate their real-world competence, adaptability, and capacity for human collaboration. Existing benchmarks remain largely static, synthetic, or domain-limited, providing limited insight into how agents perform in dynamic, economically meaningful environments. We introduce UpBench, a dynamically evolving benchmark grounded in real jobs drawn from the global Upwork labor marketplace. Each task corresponds to a verified client transaction, anchoring evaluation in genuine work activity and financial outcomes. UpBench employs a rubric-based evaluation framework, in which expert freelancers decompose each job into detailed, verifiable acceptance criteria and assess AI submissions with per-criterion feedback. This structure enables fine-grained analysis of model strengths, weaknesses, and instruction-following fidelity beyond binary pass/fail metrics. Human expertise is integrated throughout the data pipeline (from job curation and rubric construction to evaluation), ensuring fidelity to real professional standards and supporting research on human-AI collaboration. By regularly refreshing tasks to reflect the evolving nature of online work, UpBench provides a scalable, human-centered foundation for evaluating agentic systems in authentic labor-market contexts, offering a path toward a collaborative framework in which AI amplifies human capability through partnership rather than replacement.
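To make the contrast between per-criterion rubric scoring and a binary pass/fail metric concrete, here is a minimal Python sketch. It is not the paper's scoring method: the `CriterionResult` type, the unweighted `fraction_met` score, and the sample criteria are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """One acceptance criterion and an expert's judgment of it (hypothetical shape)."""
    description: str    # the acceptance criterion as written by the expert
    met: bool           # whether the submission satisfied it
    feedback: str = ""  # per-criterion feedback from the expert


def fraction_met(results: list[CriterionResult]) -> float:
    """Fine-grained score: share of criteria met (unweighted, an assumption)."""
    if not results:
        raise ValueError("a rubric needs at least one criterion")
    return sum(r.met for r in results) / len(results)


def passes_all(results: list[CriterionResult]) -> bool:
    """Binary pass/fail view: true only if every criterion is met."""
    return all(r.met for r in results)


def unmet_feedback(results: list[CriterionResult]) -> list[tuple[str, str]]:
    """Collect (criterion, feedback) pairs for the criteria that were not met."""
    return [(r.description, r.feedback) for r in results if not r.met]


# Invented example rubric for a single job.
example = [
    CriterionResult("Logo delivered as SVG", True),
    CriterionResult("Uses the client's brand colors", True),
    CriterionResult("Includes a monochrome variant", False, "No monochrome file provided"),
]

print(fraction_met(example))
print(passes_all(example))
print(unmet_feedback(example))
```

In this sketch the binary view collapses the submission to a single failure, while the per-criterion view records that two of three criteria were met and which one needs rework.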