
DeepConf: Scaling LLM reasoning with confidence, not just compute

https://arxiviq.substack.com/p/deep-think-with-confidence
98•che_shr_cat•5mo ago

Comments

yoouareperfect•5mo ago
What's the difference with lowering the temperature?
furyofantares•5mo ago
I think ideally you want the whole path to be the most probable path, which is not likely to be the same as taking the most probable token at each step.

It's not remotely practical to select the most probable path but you can do a little bit of search a few tokens at a time.
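To make the point above concrete, here is a minimal sketch on a made-up two-step model (the tokens and probabilities are invented for illustration and come from no real LLM). It shows that picking the most probable token at each step can miss the most probable whole path. Exhaustive search works here only because the toy model has four paths.

```python
# Hypothetical toy model: next-token probabilities depend only on the previous token.
START = "<s>"
PROBS = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.5, "y": 0.5},
    "B": {"x": 0.9, "y": 0.1},
}

def greedy_path():
    """Pick the single most probable token at each step."""
    path, prob, prev = [], 1.0, START
    while prev in PROBS:
        token, p = max(PROBS[prev].items(), key=lambda kv: kv[1])
        path.append(token)
        prob *= p
        prev = token
    return path, prob

def best_path():
    """Score every complete two-token path and keep the most probable one."""
    best, best_prob = None, 0.0
    for first, p1 in PROBS[START].items():
        for second, p2 in PROBS[first].items():
            if p1 * p2 > best_prob:
                best, best_prob = [first, second], p1 * p2
    return best, best_prob

if __name__ == "__main__":
    print("greedy:", greedy_path())
    print("best:  ", best_path())
```

Greedy commits to "A" because 0.6 beats 0.4, and ends with a path probability of about 0.3. The path starting with the less likely "B" reaches about 0.36.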

nowittyusername•5mo ago
Correct me if I am wrong, but judging by that chart, the reduction in token use and the better score both seem to come from the fact that this method used 512 samples. That doesn't seem useful for locally run agents or anything with severe VRAM restrictions, such as the models people can run at home. So this would only benefit enterprise-level systems, no?
Der_Einzige•5mo ago
This is inference-time scaling: when a sample being generated "looks wrong" according to its logprobs, it gets cut off early. It has a vLLM implementation which is easy to install and use. You can easily apply the technique to a 4-bit 7B model on an old laptop-tier NVIDIA GPU.

Well, the folks on this website think installing vLLM (pip install vLLM...) is hard and that ollama - a far slower and shittier inference engine - is better. Enormous damage has been done to the hobbyist LLM ecosystem due to folks not knowing what tools work on what platform.

The one exception is for mac peasants where llama.cpp is still probably the best implementation, but if you have nvidia and you're not using sglang or vLLM, you're doing it wrong.

But this is of ENORMOUS use for folks who want to run tiny models at home. Go to bed, wake up with a K=512 answer to your problem.
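A rough sketch of the "cut off early when the logprobs look wrong" idea described in this comment. It is not the vLLM implementation. It assumes confidence is the mean log-probability of the chosen tokens over a sliding window, and the function name, window size, and threshold are all made up for illustration.

```python
from collections import deque

def generate_with_early_stop(token_logprobs, window=4, threshold=-1.0):
    """Consume per-token logprobs and stop once the sliding-window mean
    drops below the threshold.

    Returns (number_of_tokens_consumed, stopped_early).
    """
    recent = deque(maxlen=window)
    for i, lp in enumerate(token_logprobs, start=1):
        recent.append(lp)
        # Only judge the trace once a full window of tokens is available.
        if len(recent) == window and sum(recent) / window < threshold:
            return i, True
    return len(token_logprobs), False

if __name__ == "__main__":
    confident = [-0.1] * 10
    shaky = [-0.1, -0.2, -0.1, -0.3, -2.5, -3.0, -2.8, -0.2]
    print(generate_with_early_stop(confident))
    print(generate_with_early_stop(shaky))
```

In a real decoder the logprobs would arrive one token at a time from the model, so an early stop saves the tokens that would otherwise have been generated.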

vlovich123•5mo ago
If you think getting vLLM working correctly is just a pip install vllm, you haven’t tried it in very many environments.
jxf•5mo ago
As someone who is operating an enterprise platform that uses vLLM in the stack, it's immensely harder than "pip install vllm" to have it working at scale and kept up to date.
genewitch•5mo ago
so uh, sglang recommends $330,000 worth of GPUs (minimum) per ebay prices.

I'm not sure about vLLM; I can't really tell what it does from the docs.vllm.ai site. So I'm not sure you conveyed what I, at least, thought you were trying to say, which is that llama.cpp isn't good enough for "home" use, like with a 3090 or a 4000-series consumer GPU.

If you want to donate some L40S to the cause, i'll send you my P.O. box info

nickandbro•5mo ago
Wonder what this means for the pelican riding on a bicycle test? Or will it just be good at strictly reasoning type problems.
TurboSkyline•5mo ago
Is this article written by an LLM?
vpribish•5mo ago
Sure looks like it was. If they can't be bothered to write it, I'm for sure not going to read it.
ChrisMarshallNY•5mo ago
I'm not sure I'd see things the same way. Lot of work went into it; even if the final was LLMed. The result is quite readable.

The authors seem to be Chinese, and may not be that confident in their English. I suspect that we'll be seeing a lot more of this kind of stuff, as time goes on.

carbocation•5mo ago
I don't think disclosure is necessary, but I think it can build trust in cases like this. "Please note that we used an LLM to rewrite our initial English draft." The reason to do this is that then people don't waste cycles wondering about the answer to this question.
ChrisMarshallNY•5mo ago
I agree. Their LLMed English is much better than my Chinese.

Also, some of the very worst English I've ever read, has been technical prose, written by born-and-bred native English speakers with very high educational credentials.

Clear communication is important. The best idea on Earth, is worthless, if it can't be articulated well.

cubefox•5mo ago
> Lot of work went into it; even if the final was LLMed.

No, it was fully or almost fully LLM generated. See: https://arxiviq.substack.com/p/coming-soon

ChrisMarshallNY•5mo ago
So the LLM did all the research? From that posting, it sounds like they accepted a human-made paper, and LLMed it, themselves. The authors are not to blame at all.

If otherwise, then it looks like The Singularity has arrived.

cubefox•5mo ago
No the LLM wrote the substack article.
ChrisMarshallNY•5mo ago
That’s what I was saying.

It’s a perfectly valid article; an AI-generated summary of a lot of work done by humans.

Not a paper that would be presented for peer review, but rather, to be consumed by regular mensch (like me).

That’s actually something that AI is pretty good at. I use it to summarize stuff for me, all the time.

It should probably have a disclaimer, somewhere, saying what it is, maybe with a link to the raw source, but it’s just another way of communicating.

I’ve been reading human-generated marketing drivel for decades. This is actually a lot better than that stuff.

cubefox•5mo ago
Summarizing some random text is a quite different task from writing an explainer for a cutting edge AI research paper.
ChrisMarshallNY•5mo ago
Ah...I don't think this conversation has a future, but I have found that I can use an LLM to give a pretty damn good summary of some fairly verbose and well-organized "random text."
NitpickLawyer•5mo ago
arxiviq is a project run by someone on Substack. It's not the paper's authors writing this. It's someone's project that takes papers from arXiv and posts them on their own Substack, probably with paid features later. A "don't forget to like and sub for my LLM" type of thing...

Careful where you place your anger. You should not be angry at the people writing the paper.

carbocation•5mo ago
One thing that is confusing about this write-up is that "DeepConf-low" is only mentioned once and in a screenshot, but it seems to outperform DeepConf-high for several tasks. I guess I'll need to read the underlying paper, but that seems troublesome.
cubefox•5mo ago
It's likely confusing because it was written by an LLM.
swores•5mo ago
The confusing thing mentioned by the person you replied to is the data and naming from the actual paper, so no it's nothing to do with how the article was written. (Unless you're suggesting that the research paper was also written by an LLM, but I don't think you are?)
cubefox•5mo ago
> The confusing thing mentioned by the person you replied to is the data and naming from the actual paper

No I think the confusing thing is that the LLM-written blog post doesn't adequately explain the screenshot.

swores•5mo ago
Copied from the paper (halfway down page 6: https://arxiv.org/pdf/2508.15260 )

> "Specifically, DeepConf-low uses top η= 10% (corresponding to the 90th percentile) and DeepConf-high uses top η = 90% (corresponding to the 10th percentile) uniformly across all settings. This threshold ensures that during online generation, traces are terminated when their confidence falls below the level that retains the top η% highest-confidence traces from the warmup phase."

I'm not sure if I'm parsing it right, but are they using "low" and "high" as descriptors of the number used as the %, meaning that the "low" 10 cuts anything outside the best 10%, while the "high" 90 leaves the best 90% ie high is less selective than low?
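If that reading of the quoted passage is right, the naming can be checked with a few lines. This sketch assumes the stopping threshold is simply the lowest confidence among the top η% of warmup traces. The function name and the ten confidence scores are invented for illustration and are not taken from the paper's code.

```python
def stop_threshold(warmup_confidences, eta_percent):
    """Return the lowest confidence among the top eta_percent of warmup traces.

    A trace whose confidence falls below this value would be terminated.
    """
    ranked = sorted(warmup_confidences, reverse=True)
    keep = max(1, round(len(ranked) * eta_percent / 100))
    return ranked[keep - 1]

if __name__ == "__main__":
    # Ten made-up warmup confidence scores, 1 (worst) to 10 (best).
    warmup = [float(i) for i in range(1, 11)]
    print("DeepConf-low  (eta=10):", stop_threshold(warmup, 10))
    print("DeepConf-high (eta=90):", stop_threshold(warmup, 90))
```

With these numbers, η = 10 puts the cutoff at the best warmup score and η = 90 puts it near the bottom. Under this reading, "low" is the more aggressive filter and "high" the more permissive one.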

carbocation•5mo ago
Thanks, this is a helpful breakdown.
cubefox•5mo ago
This article, like all articles on this substack, is LLM generated.

Source: https://arxiviq.substack.com/p/coming-soon

che_shr_cat•5mo ago
I'm the author of this blog. That's correct, the texts are generated and then validated manually by me.

I also do manual reviews (https://gonzoml.substack.com/), but there are many more papers for which I don't have time to write a review. So I created a multi-agentic system to help me, and I'm constantly iterating to improve it. And I like the result. It was also validated by the paper authors a couple of times, they agree the reviews are correct. So, if you see something is definitely wrong, please let me know.

As for me, I have become at least 10x more productive at reading papers and understanding what's happening. I hope it will also help some of you.

vpribish•5mo ago
hmm. the manual curation and format normalizing could be adding value, but it looks veeery close to a stolen content farm. I like the disclosure so i'll give them the benefit of the doubt until they start pushing clickbait.
GistNoesis•5mo ago
It looks like a variant of "beam search" (using top-k instead of top-1), but it's not mentioned anywhere. What am I not getting?
klintcho•5mo ago
I was thinking the exact same thing. If someone could explain why it's not just beam search, I would be grateful!
CMay•5mo ago
A consensus of confidence and self-consistency through consensus seem fine for certain kinds of tasks, especially ones that involve recalling training. This is a bit like scaling up determinism, obedience, collectivism vs individualism... which seems fine for many math problems. Neither confidence nor consensus is the best way to confirm truth or accuracy in a generalized way.

The previous self-consistency approach and this confidence pruning approach aren't really novel, but it's nice to see the numbers run. Fundamentally these approaches are about handling contradicting results, but not resolving the contradictions or increasing the quality of reasoning. What if the rare idea is the right answer? You can squeeze the training juice harder, but if you still get the wrong answer when it really really mattered, you're just left with a stress toy in your hand.
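For readers unfamiliar with the two voting schemes being compared here, a small sketch. It assumes each sampled trace yields a final answer plus a single confidence score. The answers and scores are invented, and the weighting rule (summing confidence per answer) is a simplification rather than the paper's exact procedure.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Plain self-consistency: the most frequent answer wins."""
    return Counter(answers).most_common(1)[0][0]

def weighted_vote(answers, confidences):
    """Confidence-weighted vote: each answer accumulates the confidence
    of the traces that produced it."""
    totals = defaultdict(float)
    for answer, conf in zip(answers, confidences):
        totals[answer] += conf
    return max(totals, key=totals.get)

if __name__ == "__main__":
    answers = ["42", "42", "42", "17", "17"]
    confidences = [0.2, 0.3, 0.2, 0.9, 0.8]
    print("majority:", majority_vote(answers))
    print("weighted:", weighted_vote(answers, confidences))
```

The two schemes can disagree, as they do on this input, but neither checks whether the winning answer is actually correct. That is the limitation raised above.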

evertedsphere•5mo ago
yet again i am asking for a mandatory "(LLM output)" label in the title like we do for pdf/video links
mdp2021•5mo ago
One question about how the proposed feature works: do I understand correctly, the NN remains the same but the code that handles it does the trick?

And we will be supposed to find said code at https://jiaweizzhao.github.io/deepconf at some point?

starchild3001•5mo ago
I love this direction of research.

Reducing the cost of reasoning is a huge ongoing challenge in LLMs. We're spending so much energy and compute on reasoning that today's consumption rates were unexpected (to me) just a year ago. We're literally burning forests, heating the atmosphere and making electricity expensive for everyone.

DeepThink v3.1 made a significant leap in this direction recently -- significantly shorter thinking tokens at the same quality. GPT5's router was also one (important) attempt to reduce reasoning costs and make o3-quality available in the free tier without breaking the bank. This is also why Claude 4 is winning the coding wars against its reasoning peers -- it provides great quality without all the added reasoning tokens.

Taking inspiration from the AlphaGo and MCMC literature -- applying tree weighting, prioritization and pruning feels extremely appropriate. (To improve the quality of Deep Think -- offered by Gemini & GPT-5 Pro today.)

So, yes, more of this please. Totally the right direction.