
Agentic Coding Tools – Not Skynet, Not a Stochastic Parrot

https://www.brethorsting.com/blog/2025/07/agentic-coding-tools-not-skynet,-not-a-stochastic-parrot/
1•aaronbrethorst•2m ago•0 comments

Listen to RFC 2119

https://ericwbailey.website/published/you-must-listen-to-rfc-2119/
1•bluGill•3m ago•1 comments

Armin Ronacher on Agentic Coding

https://www.youtube.com/watch?v=nfOVgz_omlU
1•paulsutter•8m ago•0 comments

Super Simple "Hallucination Traps" to detect interview cheaters

3•EliotHerbst•16m ago•0 comments

A customizable and extensible all-purpose diagrams library for Blazor

https://github.com/Blazor-Diagrams/Blazor.Diagrams
1•mountainview•18m ago•0 comments

Coinbase Acquires LiquiFi

https://www.coinbase.com/es-la/blog/Coinbase-acquires-LiquiFi-the-leading-token-management-platform
1•wslh•19m ago•0 comments

Trans-Taiga Road: The farthest you can get from a town on a road in North America

https://www.jamesbayroad.com/ttr/index.html
2•jason_pomerleau•23m ago•0 comments

Checklist Genie App – Last Call for Beta Testers

https://checklistgenie.app
1•alohaplannerapp•23m ago•0 comments

Show HN: I created a privacy respecting ad blocker for apps

https://www.magiclasso.co/insights/app-ad-blocking/
1•bentocorp•25m ago•0 comments

An Analysis of Links from the White House's "Wire" Website

https://blog.jim-nielsen.com/2025/links-from-whgov-wire/
1•OuterVale•32m ago•0 comments

Why are my Product Hunt upvotes delayed

https://www.ceresai.xyz/
1•Mahsanziak9•41m ago•2 comments

Qualcomm's Centriq 2400 and the Falkor Architecture

https://chipsandcheese.com/p/qualcomms-centriq-2400-and-the-falkor
1•brian_herman•41m ago•0 comments

Bridging Shopify and Shipstation on Heroku: A Story of Custom Fulfillment

https://kevinhq.com/shopify-shipstation-heroku-integration/
1•kevinhq•44m ago•0 comments

My official list of post-glitch.com hosting options

https://livelaugh.blog/posts/glitch-alternatives/
1•raybb•46m ago•1 comments

All high value work is deep work, and all motivation is based on belief

https://www.reddit.com/r/ExperiencedDevs/s/qV1w0XeFPw
2•Crier1002•47m ago•0 comments

'There is a problem': Meta users complain of being shut out of their accounts

https://www.bbc.com/news/articles/cvgnp9ykm3xo
4•mikece•48m ago•1 comments

Mount Everest's Trash-Covered Slopes Are Being Cleaned by Drones

https://www.bloomberg.com/news/features/2025-07-03/dji-drones-clean-up-mount-everest-trash-in-record-time-amid-climate-change
2•nharada•50m ago•2 comments

Gaming on a Medical Device [video]

https://www.youtube.com/watch?v=rf-efIZI_Dg
1•JKCalhoun•50m ago•1 comments

Open Source 1.7TB Dataset of What AI Crawlers Are Doing

https://huggingface.co/datasets/lee101/webfiddle-internet-raw-cache-dataset
3•catsanddogsart•57m ago•0 comments

Microsoft will lay off 9k employees, or less than 4% of the company

https://techcrunch.com/2025/07/02/microsoft-will-lay-off-9000-employees-or-less-than-4-of-the-company/
5•mrcsharp•57m ago•2 comments

Whole-genome ancestry of an Old Kingdom Egyptian

https://www.nature.com/articles/s41586-025-09195-5
6•A_D_E_P_T•1h ago•0 comments

NYT to start searching deleted ChatGPT logs after beating OpenAI in court

https://arstechnica.com/tech-policy/2025/07/nyt-to-start-searching-deleted-chatgpt-logs-after-beating-openai-in-court/
6•miles•1h ago•0 comments

AI virtual personality YouTubers, or 'VTubers,' are earning millions

https://www.cnbc.com/2025/07/02/ai-virtual-personality-youtubers-or-vtubers-are-earning-millions.html
3•pseudolus•1h ago•0 comments

US rural communities bearing the brunt of Bitcoin mining

https://www.dw.com/en/us-rural-communities-bearing-the-brunt-of-bitcoin-mining/a-72889383
4•musha68k•1h ago•1 comments

gmailtail: tail -f Your Gmail

https://github.com/c4pt0r/gmailtail
1•c4pt0r•1h ago•0 comments

A Non-Partisan U.S. Military Is Essential

https://time.com/7296041/non-partisan-military-is-essential/
5•herecomethefuzz•1h ago•0 comments

What to build instead of AI agents

https://decodingml.substack.com/p/stop-building-ai-agents
46•giuliomagnifico•1h ago•28 comments

Flint, Michigan replaces most lead pipes 10 years after Michigan water crisis

https://www.nbcnews.com/news/us-news/flint-replaces-lead-pipes-10-years-michigan-water-crisis-rcna216442
5•toomuchtodo•1h ago•2 comments

Nebius emerged from Russia as one of Nvidia's top-performing investments

https://sherwood.news/tech/nebius-nvidia-gpus-ai-startup/
2•gmays•1h ago•0 comments

One Life

https://thisisyouronelife.com/
1•tasshin•1h ago•0 comments

Dummy's Guide to Modern LLM Sampling

https://rentry.co/samplers
228•nkko•1mo ago

Comments

antonvs•1mo ago
This is great! “Sampling” covers much more than I expected.
blt•1mo ago
This is pretty interesting. I didn't realize so much manipulation was happening after the initial softmax temperature choice.
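
For concreteness, temperature is just a rescaling of the logits before the softmax, and everything else described in the guide is a further transform applied around that step. A minimal Python/NumPy sketch (names and defaults are illustrative, not from the guide):

    import numpy as np

    def sample_with_temperature(logits, temperature=1.0, rng=None):
        # Divide the logits by the temperature, softmax, then draw one token.
        # top-k/top-p/min-p and the various penalties would all be applied
        # to the logits or probabilities before the final draw.
        rng = rng or np.random.default_rng()
        z = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
        z -= z.max()  # subtract the max for numerical stability
        probs = np.exp(z)
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    print(sample_with_temperature([2.0, 1.0, 0.1], temperature=0.7))
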
minimaxir•1mo ago
It's worth noting that only some of these techniques are configurable via modern LLM APIs (usually only temperature/top-p/top-k, since the other penalties add overhead).
Der_Einzige•1mo ago
Most other penalties don't require much overhead (min_p is basically free).

Most techniques are not made available by API providers because they enable alignment breaking. It's the only explanation for why we are still stuck with only top_p, top_k, and temp of 0-2.

If you want proper sampler settings to be available, your options are oobabooga, SillyTavern (dependent on your backend, so the vLLM backend for example doesn't have top-n sigma yet), or directly running Hugging Face code. There might be some marginal options here too, but in general, sampling innovation is firmly in the hands of open-source coomers right now and not in the hands of academics.
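
For reference, the min_p rule amounts to one comparison against the top probability, which is why it's often described as basically free. A minimal sketch (the renormalization details vary by engine):

    import numpy as np

    def min_p_filter(probs, min_p=0.1):
        # Keep every token whose probability is at least `min_p` times the
        # probability of the single most likely token, zero out the rest,
        # and renormalize. The top token always survives.
        probs = np.asarray(probs, dtype=np.float64)
        keep = probs >= min_p * probs.max()
        filtered = np.where(keep, probs, 0.0)
        return filtered / filtered.sum()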

thegeomaster•1mo ago
Thank you for your contribution!!!

Does min_p help with non-creative-writing tasks as well, such as maths or coding? Is there any way to improve/tune the performance for these with sampling?

Do you have a recommendation/guide on tuning the sampling parameters with what little the frontier model providers expose?

All I have ever seen is the same very old advice of "t=0 for determinism, t=1 for creativity", with little treatment of top_k/top_p. But even temperature has become quite opaque nowadays. Is this really the softmax temperature that we're controlling, or some proxy for it? Claude 3.7 Sonnet doesn't allow me to use t != 1.0 in thinking mode, while Gemini 2.5 Pro, also a reasoner, will happily accept it. Does it apply only to the non-thinking tokens at the end?

If I get good results with t=0.0 (for e.g. coding), do I lose some "capability" by keeping it at 0.0?

Thanks in advance.

Der_Einzige•1mo ago
Related to this, our min_p paper was ranked #18 out of 12,000 submissions at ICLR and got an oral:

https://iclr.cc/virtual/2025/oral/31888

Our poster was popular:

poster: https://iclr.cc/media/PosterPDFs/ICLR%202025/30358.png?t=174...

oral presentation (watch me roast Yoshua Bengio on this topic and then have him be the first questioner; I'm the 2nd speaker, starting around the 19:30 mark. My slides for the presentation are there too and really funny): https://iclr.cc/virtual/2025/session/31936

paper: https://arxiv.org/abs/2407.01082

As one of the min_p authors, I can confirm that Top N sigma is currently the best general purpose sampler by far. Also, temperature can and should be scaled far higher than it is today. Temps of 100 are totally fine with techniques like min_p and top N sigma.

Also, the special case of top_k = 2 with ultra-high temperature (one thing the authors recommend against near the end) is very interesting in its own right. Doing it leads to spelling errors every ~10th word - but it also seems to have a certain creativity to it that's quite interesting.
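
For readers unfamiliar with top-n sigma: as described in the linked paper, the cutoff is computed on the raw logits, keeping anything within n standard deviations of the maximum, which is why it stays stable even at extreme temperatures. A rough sketch (parameter names are illustrative):

    import numpy as np

    def top_n_sigma_probs(logits, n=1.0, temperature=1.0):
        # Keep only tokens whose raw logit is within n standard deviations
        # of the maximum logit, then apply temperature and softmax to the
        # survivors. Because the cutoff is computed on logits rather than
        # probabilities, raising the temperature never lets junk tokens in.
        z = np.asarray(logits, dtype=np.float64)
        keep = z >= z.max() - n * z.std()
        z = np.where(keep, z / temperature, -np.inf)
        z -= z.max()            # exp(-inf) = 0 for the discarded tokens
        probs = np.exp(z)
        return probs / probs.sum()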

toxik•1mo ago
Are there any samplers that aren't basically greedy? I.e., ones that actually search the tree. I realize it's an absolutely insane branching factor, and quite expensive to expand nodes at that, but it has always seemed odd to me that we don't actually search.
Kubuxu•1mo ago
Beam search sampling is sometimes used
Der_Einzige•1mo ago
Besides beam search and its variants? (There are many, including the little-known but awesomely powerful constrained beam search: https://huggingface.co/blog/constrained-beam-search)

Does MBR (minimal bayes risk) sampling count?

Also there was this paper at ICLR which is relevant to this question: https://arxiv.org/abs/2410.03968

This paper basically claims that non-heuristic methods (like beam search) are harmful compared to the heuristic ones.
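
For the flavor of the tree search being discussed, here is a toy beam search; `step_logprobs` stands in for a model forward pass and is purely illustrative:

    import numpy as np

    def beam_search(step_logprobs, bos=0, beam_width=3, steps=5):
        # Keep the `beam_width` best-scoring partial sequences at each
        # step instead of committing to a single sampled path.
        beams = [([bos], 0.0)]
        for _ in range(steps):
            candidates = []
            for seq, score in beams:
                logp = step_logprobs(seq)  # log-probs over the vocabulary
                for tok in np.argsort(logp)[-beam_width:]:
                    candidates.append((seq + [int(tok)], score + logp[tok]))
            beams = sorted(candidates, key=lambda c: c[1])[-beam_width:]
        return max(beams, key=lambda b: b[1])[0]

    # Toy "model": a random distribution over a 5-token vocabulary.
    rng = np.random.default_rng(0)
    fake_model = lambda seq: np.log(rng.dirichlet(np.ones(5)))
    print(beam_search(fake_model))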

orbital-decay•1mo ago
One thing not said here is that samplers have no access to the model's internal state. It's basic math applied to the output distribution, which technically carries some semantics, but you can't decode it without being as smart as the model itself.

Certain samplers described here, like repetition penalty or DRY, are just like this - the model could repeat itself in a myriad of ways, and the only way to prevent all of them is better training, not n-gram search or other classic NLP methods. This is basically trying to plug every hole with a finger. How many fingers do you have?

Hacking the autoregressive process has some low-hanging fruit, like Min-P, that can make some improvement and certain nifty tricks possible, but if you're doing it to turn a bad model into a good one, you're doing it wrong.
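
For context, the classic repetition penalty being criticized here is about this simple: a surface-level rescaling of already-seen token IDs that never looks inside the model. A sketch of the common CTRL-style formulation (the penalty constant is illustrative):

    import numpy as np

    def repetition_penalty(logits, generated_ids, penalty=1.2):
        # Rescale the logit of every token already generated so it is
        # less likely to be picked again; the sampler has no idea *why*
        # the model wanted to repeat, it just punishes the surface form.
        z = np.asarray(logits, dtype=np.float64).copy()
        for tok in set(generated_ids):
            z[tok] = z[tok] / penalty if z[tok] > 0 else z[tok] * penalty
        return z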

Der_Einzige•1mo ago
No, it's done to turn an uncreative model into a creative model. This idea that sampling isn't that important or is some violation of the bitter lesson is exactly why I had to call out the whole academic field as having a giant blindspot for this kind of research in our oral presentation at ICLR!

Top-n sigma has been around since mid-2024, min_p since 2023, and we are still waiting for these innovations to be integrated outside of open-source stuff (i.e. outside of HF/vLLM). It's being done slowly on purpose by API providers because they don't want to deal with the risk of models being "too creative" (also, high temp likely breaks their watermarking)

One other thing - making models aware of their own sampling settings is super easy if you just feed them back to the model every token or generation (say, using structured generation). Models can control their own sampling settings and thus "have access to their internal states" with just a tiny bit of extra programming (the model can write that code for you now lol)

orbital-decay•1mo ago
I guess variance is a better word for this. Creativity is a pretty loose term; for example, most people will describe R1 as creative in RP/stories for its tendency to derail everything in an unhinged way, but it still lacks variance like every other modern model (kill the reasoning chain and look at the logprobs to get what I mean). The bitter lesson is not some threshold and can't be violated; it describes a curve of diminishing returns. As long as you're on the easy part, it's fine.

But the bigger problem is that the concepts are expressed before they're decoded into the output distribution. You can steer them to a degree by hacking the autoregressive transport, but if the model itself learned that this concept corresponds to that particular concept, not a set of concepts (and RL tends to do exactly that), fixing it with sampling is usually hard to impossible - you'll just lose accuracy/make it dumber as you basically force out-of-distribution outputs.

achierius•1mo ago
How is it not a violation of the bitter lesson? You're trying to correct the model after the fact using human logic, where the bitter lesson would want you to just train a better model.

Not that I think that goes against your point -- I think it's rather a problem with the bitter lesson.

Der_Einzige•1mo ago
The primary argument for why it's not a violation is that the heuristic is (almost) free. LLM-designed samplers are, and will probably continue to be, better - but in order to start the recursive self-improvement engine, a few free heuristics will be needed.

The bitter lesson critique is that the human designed heuristics were not free, and harmed the notion of "letting the computer figure it out" by slowing down training. High temp sampling is very important for half-way decent synthetic data generation and thus enabling "letting the computer figure it out" for natural language. Better sampling is the only way to make high temperature generations coherent.

NitpickLawyer•1mo ago
> No, it's done to turn an uncreative model into a creative model. This idea that sampling isn't that important or is some violation of the bitter lesson is exactly why I had to call out the whole academic field as having a giant blindspot for this kind of research in our oral presentation at ICLR!

I see this sentiment a lot; there are even people who swear by samplers like XTC (which sounds counterintuitive af), but it's always on "creative" tasks. On math tasks, with a clear correct/incorrect answer, none of the "creative" samplers come out on top, not even min_p (except at crazy temperatures, and even there the overall accuracy is still lower than normal temps with normal sampling)...

The main problem is that "creativity" is such a subjective measure that it's hard to score properly.

Der_Einzige•1mo ago
I think "crazy" temperatures start around 100, not 2-3 as folks commonly claim in the literature.

You're right in general in this post, but I think you underestimate how many coomers/erp folks there are and how much they use LLMs. XTC was made for them, to give some notion of slop removal. It's probably not quite as good at that task as the antislop sampler (from Sam Paech, the EQ-Bench creator) - but I find XTC to be quite good at adding "spice" to outputs.

Re: the difficulty of measuring "creativity" - especially true, especially around the difficulty of scoring it! We have some nitpickers of our own whispering into our ears about this. You don't happen to be at Stanford, do you? IYKYK...

neuroelectron•1mo ago
The primary concern here (in this guide) seems to be efficiency and preventing complexity explosions.
mdp2021•1mo ago
When the attempt, though, is to have the LLM output an "idea", not just a "next token", selection over the logits vector should break that original idea... If the idea is complete, there should be no need to use sampling over the logits.

Sampling, in this framework, should not happen near the output level ("what will the next spoken word be").

minimaxir•1mo ago
LLMs are trained to maximize the probability of correct guesses for the next token, not "ideas". You cannot define an idea as a training loss objective.
mdp2021•1mo ago
That is an architectural problem. If you want the post rephrased: it is paradoxical to have changes made near the output level ("changing words before it says them"), given that the expectation is to work with ideas. (And even then, selection would not be at the output level - it would be during the definition of the structure.)

So, articles like this submission - while interesting from many points of view - make the elephant in the room more evident.

> You cannot define an idea as a training loss objective

What tells you so? If you see a technical limit, note e.g. that sentences and paragraphs can have their own position in an embedding space.

orbital-decay•1mo ago
Interpretability studies offer several orthogonal ways to look at this, it's like Newtonian vs Lagrangian mechanics. Autoregressive token prediction, pattern matching, idea conceptualization, pathfinding in the extremely multidimensional space...
neuroelectron•1mo ago
Love this - the way everything is mapped out and explained simply really opens up the opportunity for trying new things, and shows where you can do that effectively.

For instance, why not use whole words as tokens? Make a "robot" with a limited "robot dialect". Yes, there'd be no capacity for new or rare words, but you could modify the training data and input data to translate those words into the existing vocabulary. Now you have a much smaller mapping that's literally robot-like, and it kind of gives the user an expectation of what kinds of answers the robot can answer well, like C-3PO.
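
A closed, word-level vocabulary like that fits in a few lines; in this sketch (names illustrative), the <unk> fallback is exactly the no-new-or-rare-words limitation described above:

    def word_tokenize(text, vocab, unk="<unk>"):
        # Whole words as tokens: anything outside the closed vocabulary
        # collapses to <unk>, so new or rare words are unrepresentable.
        return [vocab.get(word, vocab[unk]) for word in text.lower().split()]

    vocab = {"<unk>": 0, "the": 1, "robot": 2, "speaks": 3}
    print(word_tokenize("The robot speaks Esperanto", vocab))  # -> [1, 2, 3, 0]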

minimaxir•1mo ago
> For instance, why not use whole words as tokens?

Word-only tokenizers are what people used in the RNN/LSTM days. There's no functional improvement over tokenization schemes like BPE or even WordPiece/SentencePiece, and it results in worse quality since you can't use meaningful semantic hints such as punctuation.

neuroelectron•1mo ago
You can encode semantic hints in the layers instead. Admittedly, this is more expensive, which kind of runs counter to the words-as-tokens idea.
simonw•1mo ago
This is a really useful document - the explanations are very clear and it covers a lot of ground.

Anyone know who wrote it? It's not credited, and it's published on a free Markdown pastebin.

The section on DRY - "repetition penalties" - was interesting to me. I often want LLMs to deliberately output exact copies of their input. When summarizing a long conversation for example I tend to ask for exact quotes that are most illustrative of the points being made. These are easy to fact check later by searching for them in the source material.

It seems to me that the DRY penalty would run counter to my goal there.

nkko•1mo ago
I didn't realize that it wasn't attributed; it was written by @AlpinDale.
smcleod•1mo ago
I had a go at writing a bit of a sampling guide for Ollama/llama.cpp as well recently, open to any feedback / corrections - https://smcleod.net/2025/04/comprehensive-guide-to-llm-sampl...
ltbarcly3•1mo ago
Calling things "modern" when they are updates to techniques for technologies invented only a few years ago is borderline illiterate. Modern vs. what, classical LLM sampling?
eddyzh•1mo ago
LLMs are way older. The Nobel Prize for the field shows how many of the breakthroughs were made decades ago; ChatGPT was just the popular breakthrough. Even then, your smartphone keyboard has been using a language model for a decade.
Der_Einzige•1mo ago
Many of these algorithms were invented around 2019 (e.g. TFS) or even earlier (temperature)
ltbarcly3•1mo ago
Wow, so unmodern. Classic even.
antonvs•1mo ago
> Calling things modern that are updates to techniques to use technologies only invented a few years ago is borderline illiterate.

If you’re going to make a criticism like that, you might want to check a dictionary first:

> modern, adj. designed and made using the most recent ideas and methods

— https://dictionary.cambridge.org/us/dictionary/english/moder...

That’s exactly what this article is describing. There’s been a lot of development in this space over the last seven years or so, and e.g. GPT 1, 2, and 3 are certainly very outdated at this point, i.e. not modern in the above sense.

amelius•1mo ago
Would it be possible for the LLM to do the tokenization implicitly? So instead of building a separate tokenizer, you just allow any string of characters as input, then have a neural network that converts that into tokens, where the weights of that network are trained with the rest of the LLM.
kmeisthax•1mo ago
We already do this. Neural networks can't work with tokens directly - they only take real-valued vectors, and they need differentiable input[0]. So you don't give it token 123, 456, etc.; you have to turn each token into a "one-hot encoded" vector that's all zeroes except in the position indexed by the token ID, which gets set to one.

These one-hot encoded vectors are then fed through a linear layer that encodes the token vector down into the hidden state size of the model. e.g. you might have a token vocabulary of 10-100k but a hidden state size of 0.5-2k. Everything else in the model works in hidden state space[1], which has all sorts of higher-level concepts in it.

Now, if we were to remove tokenization, the encoder would need to do more work to get to the same hidden state space we're used to. It might be able to find a more efficient encoding from unpaired bytes to the hidden space, but that seems unlikely, given that the tokenization most models use is already based on the statistical properties of the training set. If we don't automatically pair "anti" or "ism" into a single token before handing text off to the model, then the attention heads in the model's lower layers have to do the same work.

Given that we used to train models on character sequences, and then moved to tokenization because it was more efficient, I suspect the trade-off is never going to be worth it.

[0] That is, you can't just give it a list of token IDs, because there's no mathematical meaning to token 123.25, nor any meaning to increasing or decreasing token IDs.

[1] This improves performance but makes interpretability harder. Most notably, the hidden space's basis vectors are not directly correlated to words or concepts; instead, all the concepts exist on a sort of N-dimensional ring.
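
The one-hot-then-linear step described above is mathematically just a row lookup, which is how real implementations handle it. A minimal sketch with illustrative sizes (real models are much larger):

    import numpy as np

    vocab_size, hidden_size = 1_000, 64
    rng = np.random.default_rng(0)
    W = rng.normal(size=(vocab_size, hidden_size))  # embedding/linear weights

    token_id = 123
    one_hot = np.zeros(vocab_size)
    one_hot[token_id] = 1.0

    # Multiplying the one-hot vector through the linear layer...
    via_matmul = one_hot @ W
    # ...selects exactly one row, so implementations just index instead.
    assert np.allclose(via_matmul, W[token_id])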

amelius•1mo ago
> If we don't automatically pair "anti" or "ism" into a single token before handing it off to the model, then the attention heads on the lower layers in the model have to do the same work.

What I mean is an extra neural network that comes before the input of the LLM, which converts characters (or simple one-hot vectors corresponding to characters) into tokens (or whatever you'd call the internal representation of the network). The advantage would be a more unified way of representing the LLM, and I guess one downside would be that you'd get a lot of replication in the NN, but perhaps those parts could be merged (have shared weights).

gitroom•1mo ago
Man, there's always way more to this stuff than I first guess. Makes me wonder - do you think better sampling really fixes model limits, or is it just kind of patching over deeper problems?
michaelgiba•1mo ago
This is much more thorough, but here is an interactive post covering the related topic of constrained sampling that I put together a few weeks back:

http://michaelgiba.com/grammar-based/index.html