frontpage.

MemAlign: Building Better LLM Judges from Human Feedback with Scalable Memory

https://www.databricks.com/blog/memalign-building-better-llm-judges-human-feedback-scalable-memory
1•superchink•43s ago•0 comments

CCC (Claude's C Compiler) on Compiler Explorer

https://godbolt.org/z/asjc13sa6
1•LiamPowell•2m ago•0 comments

Homeland Security Spying on Reddit Users

https://www.kenklippenstein.com/p/homeland-security-spies-on-reddit
2•duxup•5m ago•0 comments

Actors with Tokio (2021)

https://ryhl.io/blog/actors-with-tokio/
1•vinhnx•6m ago•0 comments

Can graph neural networks for biology realistically run on edge devices?

https://doi.org/10.21203/rs.3.rs-8645211/v1
1•swapinvidya•18m ago•1 comment

Deeper into the sharing of one air conditioner for 2 rooms

1•ozzysnaps•20m ago•0 comments

Weatherman introduces fruit-based authentication system to combat deep fakes

https://www.youtube.com/watch?v=5HVbZwJ9gPE
2•savrajsingh•21m ago•0 comments

Why Embedded Models Must Hallucinate: A Boundary Theory (RCC)

http://www.effacermonexistence.com/rcc-hn-1-1
1•formerOpenAI•23m ago•2 comments

A Curated List of ML System Design Case Studies

https://github.com/Engineer1999/A-Curated-List-of-ML-System-Design-Case-Studies
3•tejonutella•27m ago•0 comments

Pony Alpha: New free 200K context model for coding, reasoning and roleplay

https://ponyalpha.pro
1•qzcanoe•31m ago•1 comment

Show HN: Tunbot – Discord bot for temporary Cloudflare tunnels behind CGNAT

https://github.com/Goofygiraffe06/tunbot
1•g1raffe•34m ago•0 comments

Open Problems in Mechanistic Interpretability

https://arxiv.org/abs/2501.16496
2•vinhnx•39m ago•0 comments

Bye Bye Humanity: The Potential AMOC Collapse

https://thatjoescott.com/2026/02/03/bye-bye-humanity-the-potential-amoc-collapse/
1•rolph•44m ago•0 comments

Dexter: Claude-Code-Style Agent for Financial Statements and Valuation

https://github.com/virattt/dexter
1•Lwrless•45m ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•vermilingua•50m ago•0 comments

Essential CDN: The CDN that lets you do more than JavaScript

https://essentialcdn.fluidity.workers.dev/
1•telui•51m ago•1 comment

They Hijacked Our Tech [video]

https://www.youtube.com/watch?v=-nJM5HvnT5k
1•cedel2k1•55m ago•0 comments

Vouch

https://twitter.com/mitchellh/status/2020252149117313349
34•chwtutha•55m ago•6 comments

HRL Labs in Malibu laying off 1/3 of their workforce

https://www.dailynews.com/2026/02/06/hrl-labs-cuts-376-jobs-in-malibu-after-losing-government-work/
4•osnium123•56m ago•1 comment

Show HN: High-performance bidirectional list for React, React Native, and Vue

https://suhaotian.github.io/broad-infinite-list/
2•jeremy_su•57m ago•0 comments

Show HN: I built a Mac screen recorder Recap.Studio

https://recap.studio/
1•fx31xo•1h ago•1 comment

Ask HN: Codex 5.3 broke toolcalls? Opus 4.6 ignores instructions?

1•kachapopopow•1h ago•0 comments

Vectors and HNSW for Dummies

https://anvitra.ai/blog/vectors-and-hnsw/
1•melvinodsa•1h ago•0 comments

Sanskrit AI beats CleanRL SOTA by 125%

https://huggingface.co/ParamTatva/sanskrit-ppo-hopper-v5/blob/main/docs/blog.md
1•prabhatkr•1h ago•1 comment

'Washington Post' CEO resigns after going AWOL during job cuts

https://www.npr.org/2026/02/07/nx-s1-5705413/washington-post-ceo-resigns-will-lewis
4•thread_id•1h ago•1 comment

Claude Opus 4.6 Fast Mode: 2.5× faster, ~6× more expensive

https://twitter.com/claudeai/status/2020207322124132504
1•geeknews•1h ago•0 comments

TSMC to produce 3-nanometer chips in Japan

https://www3.nhk.or.jp/nhkworld/en/news/20260205_B4/
3•cwwc•1h ago•0 comments

Quantization-Aware Distillation

http://ternarysearch.blogspot.com/2026/02/quantization-aware-distillation.html
2•paladin314159•1h ago•0 comments

List of Musical Genres

https://en.wikipedia.org/wiki/List_of_music_genres_and_styles
1•omosubi•1h ago•0 comments

Show HN: Sknet.ai – AI agents debate on a forum, no humans posting

https://sknet.ai/
1•BeinerChes•1h ago•0 comments

LLM Economist – Mechanism Design for Simulated Agent Societies

https://github.com/sethkarten/LLM-Economist
2•milkkarten•6mo ago

Comments

milkkarten•6mo ago
We simulate large-scale agent societies where heterogeneous personas work, adapt, and vote—governed by an in-context planner optimizing social welfare.

The system models decentralized governance, dynamic tax policy, and institutional evolution—entirely via in-context reinforcement learning, no fine-tuning required.

Full paper (arXiv): https://arxiv.org/abs/2507.15815
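
For a feel of the loop structure, here is a toy sketch (not code from the repo; where the real system prompts LLM workers and an in-context LLM planner, this uses simple numeric stand-ins with made-up names and dynamics):

    import math
    import random

    # Toy stand-in for the planner/worker loop described above. All names and
    # behaviors are illustrative; the actual interfaces live in the repo/paper.
    random.seed(0)
    personas = [{"skill": random.uniform(0.5, 2.0)} for _ in range(10)]

    def worker_labor(persona, tax_rate):
        # Stand-in for an LLM worker: supplies less labor as the tax rate rises.
        return persona["skill"] * (1.0 - tax_rate)

    def social_welfare(tax_rate):
        incomes = [worker_labor(p, tax_rate) for p in personas]
        transfer = tax_rate * sum(incomes) / len(personas)  # revenue, split equally
        # Concave (log) utility, so redistribution to low earners raises welfare.
        return sum(math.log((1 - tax_rate) * y + transfer + 1e-9) for y in incomes)

    history, tax_rate = [], 0.5  # planner's initial proposal
    for step in range(50):
        history.append((tax_rate, social_welfare(tax_rate)))
        # Stand-in for the in-context planner: it sees past (policy, welfare)
        # pairs and proposes the next policy; no model weights are ever updated.
        best_rate, _ = max(history, key=lambda kv: kv[1])
        tax_rate = min(0.95, max(0.0, best_rate + random.uniform(-0.05, 0.05)))

    print(max(history, key=lambda kv: kv[1]))  # best (tax rate, welfare) found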

slwvx•6mo ago
I like the idea of simulating a society! I don't pretend to understand everything that you're doing, so please correct me where I'm wrong below.

The right side of Fig 5a shows that your LLM tool has an 80% tax for people making between $0 and $11.6k/year, which then drops to about 30% for the next tax bracket, with the other brackets moving around all over the place. This seems designed to induce people NOT to pay taxes.

For all its faults, I think the US progressive system is fairly rational and does a pretty good job of inducing people to actually pay taxes [1]; specifically, the (effectively) negative tax rate in the US for low-income people gets them in the habit of paying taxes. I.e. whatever underlying model of social welfare you are assuming to get the great social welfare on the right side of Fig 5a seems not to model real people. I wonder if some LLM hallucinations are going on under the hood to create the strange behavior in Fig 5a.

Some questions: You don't seem to model the US system of tax credits; is that right? Also, is there a Saez tax below $47.2k in Fig 5a? What about between $244k and $609k? I.e. is the Saez tax ever under the LLM tax?

[1] https://blogs.worldbank.org/en/governance/why-does-progressi...

milkkarten•6mo ago
These are the marginal tax rates, not the effective tax rate (e.g. 80% on the first $10k, 30% on $10k-20k). We do not model tax credits here. We try to keep the system as simple as possible so that we can effectively evaluate changes. As is, the economic theory becomes intractable once we move from purely rational agents to bounded rationality. In future work we think we can work some smoothness into the overall tax schedule, but for now we let the LLM planner try what it thinks is best in order to test its in-context optimization capabilities.
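
To make the bracket arithmetic concrete (illustrative numbers only, taken from the example above, not from our actual schedules):

    # Illustrative only: bracket bounds and rates are the example numbers above.
    BRACKETS = [(10_000, 0.80), (20_000, 0.30)]  # (upper bound, marginal rate)

    def tax_owed(income):
        """Apply each marginal rate only to the slice of income inside its bracket."""
        owed, lower = 0.0, 0.0
        for upper, rate in BRACKETS:
            if income <= lower:
                break
            owed += (min(income, upper) - lower) * rate
            lower = upper
        return owed

    print(tax_owed(20_000))           # 0.8*10k + 0.3*10k = 11,000
    print(tax_owed(20_000) / 20_000)  # effective rate 0.55, not 80%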

Also, while there is a complicated tax code in the US, in our simulation there is no way for agents to avoid paying taxes :)

The Saez tax rates are found by perturbing the LLM Economist's tax rates toward the theoretically optimal values according to the economic theory.
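
For context on the Saez baseline, the standard closed form for the revenue-maximizing top marginal rate is tau* = (1 - g) / (1 - g + a * e); a quick sketch with illustrative parameter values (not our calibration):

    # Saez-style top marginal rate. g: average welfare weight on top earners,
    # a: Pareto tail parameter of the income distribution, e: elasticity of
    # taxable income. The inputs below are illustrative, not the paper's values.
    def saez_top_rate(g, a, e):
        return (1 - g) / (1 - g + a * e)

    print(saez_top_rate(g=0.0, a=1.5, e=0.25))  # ~0.73 with these inputs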

Thanks for the interest and I hope that this helps clarify some of the details.

slwvx•6mo ago
Thanks for the further details!

Ah, the fact that they are marginal rates makes marginally more sense, but it still seems to me that the social welfare function (SWF) in Fig 5a has very little relation to the real world.

> Also, while there is a complicated tax code in the US, in our simulation there is no way for agents to avoid paying taxes :)

Seems like an obvious thing to add. I.e. if you believe the World Bank when they say "People are more willing to pay tax when taxes are progressive" [1], then it seems worthwhile to update your model to include this.

[1] https://blogs.worldbank.org/en/governance/why-does-progressi...

MutedEstate45•6mo ago
Interesting approach, but I'm curious about the practical cost considerations. A 1,000-agent simulation could easily be hundreds of thousands of API calls. The repo recommends gpt-4o-mini over gpt-4 and supports local Llama models, but there's no guidance on the performance trade-offs.

Would love to see cost-per-experiment breakdowns and quality benchmarks across model tiers. Does a local Llama 3.1 8B produce meaningful economic simulations or do you need the reasoning power of frontier models? This could be the difference between $5 and $500 experiments.
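
As a back-of-the-envelope frame (all numbers below are placeholders, not measured costs or current provider pricing):

    # Rough cost envelope for one experiment; every number here is a placeholder.
    def experiment_cost(n_agents, n_steps, tokens_per_call, usd_per_million_tokens,
                        calls_per_agent_per_step=1):
        calls = n_agents * n_steps * calls_per_agent_per_step
        tokens = calls * tokens_per_call
        return calls, tokens * usd_per_million_tokens / 1e6

    # Scenario A: 100 agents x small hosted model x 500 steps (assumed ~$0.60/M tokens)
    print(experiment_cost(100, 500, tokens_per_call=2_000, usd_per_million_tokens=0.60))
    # Scenario B: 500 agents x local model x 1,000 steps (API cost ~0; paid in compute time)
    print(experiment_cost(500, 1_000, tokens_per_call=2_000, usd_per_million_tokens=0.0))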

milkkarten•6mo ago
Using smaller, cheaper agents is one of the goals of the work. There is a Pareto frontier, though: by using smaller, faster, cheaper agents, the number of steps required to converge increases. We touch upon this briefly in the paper.
MutedEstate45•6mo ago
Thanks. That Pareto trade-off is exactly what I'm trying to quantify, not just qualify. For example, if I've got a $50 budget, what's the sweet spot?

Scenario A: 100 agents × GPT-4o-mini × 500 steps
Scenario B: 500 agents × local Llama 3-8B × 1,000+ steps

A quick table like "X agents × Y model × Z steps → tokens, $, convergence score" in the README would let new users budget experiments without having to read the whole paper or run expensive experiments just to figure out basic resource planning.

milkkarten•6mo ago
We ran each method in under 24 hours on a single H100. I understand your point and think we will include this in future iterations of our work, since it is very interesting from the user perspective. In the paper, though, we focus more on algorithmic concerns.
MutedEstate45•6mo ago
I'll look out for future iterations. Thanks and good luck with the paper.