
Understanding Transformers via N-gram Statistics

https://arxiv.org/abs/2407.12034
139•pona-a•8mo ago

Comments

justanotherjoe•8mo ago
Sounds regressive, and it feeds into the weird anti-intellectual narrative that LLMs are just like n-gram models (lol, lmao even).

The author submitted like 10 papers this May alone. Is that weird?

ninjin•8mo ago
These are different people:

https://arxiv.org/search/cs?searchtype=author&query=Nguyen,+...

Wikipedia mentions that up to ~40% of the Vietnamese population (~40,000,000 people) carries the name Nguyen:

https://en.wikipedia.org/wiki/Nguyen

For the paper itself, as someone working in the field, I find it interesting enough to consider reading at some point (I have not been reading many analysis papers recently, but this one looks better than most). As for your accusation that it claims large language models are simply n-gram models: read the abstract and you will realise that the accusation is very much unfair to the work.

ayhanfuat•8mo ago
> The author submitted like 10 papers this May alone. Is that weird?

Chances are, you just assumed all the search results for 'Nguyen, T' refer to the same author.

justanotherjoe•8mo ago
I did. My bad.

maz1b•8mo ago
How does this have 74 points and only one comment?

On topic: couldn't one, in theory, republish this kind of paper for different kinds of LLMs, since the textual corpus on which LLMs are built is ultimately based, at some level, on human effort and human input, whether writing or typing?

nickpsecurity•8mo ago
"How does this have 74 points and only one comment?"

I think one cause is hobbyists upvoting submissions that might be valuable to people in a specific field. We understand just enough to think it could be important but defer to subject matter experts on the rest. That's why I upvoted it.

gwern•8mo ago
https://en.wikipedia.org/wiki/Warnock%27s_dilemma

montebicyclelo•8mo ago
> The results we obtained in Section 7 imply that, at least on simple datasets like TinyStories and Wikipedia, LLM predictions contain much quantifiable structure insofar that they often can be described in terms of our simple statistical rules

> we find that for 79% and 68% of LLM next-token distributions on TinyStories and Wikipedia, respectively, their top-1 predictions agree with those provided by our N-gram rulesets

Two prediction methods may have completely different mechanisms, but agree sometimes, because they are both predicting the same thing.

It seems a fairly large proportion of language can be predicted by a simpler model. But the remaining percentage is the difficult part: it is what simple `n-gram` models are bad at and transformers are really good at.
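To make the "top-1 agreement" idea concrete, here is a minimal sketch. It is not the paper's method: the paper's rulesets are richer than a plain fixed-order n-gram table, and `other_predict` is a hypothetical stand-in for whatever model you compare against. It only shows how an agreement rate between two next-token predictors can be measured.

```python
from collections import Counter, defaultdict

def fit_ngram(tokens, n=2):
    """Count next-token frequencies for every (n-1)-token context."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table

def ngram_top1(table, context):
    """Most frequent continuation of the context, or None if unseen."""
    counts = table.get(tuple(context))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def top1_agreement(table, other_predict, tokens, n=2):
    """Fraction of positions where both predictors name the same top-1 token.

    other_predict(prefix) is an assumed callable (e.g. a wrapper around an
    LLM's argmax) that returns one token for a list-of-tokens prefix.
    """
    hits = total = 0
    for i in range(n - 1, len(tokens)):
        context = tokens[i - n + 1:i]
        total += 1
        if ngram_top1(table, context) == other_predict(tokens[:i]):
            hits += 1
    return hits / total if total else 0.0

corpus = "the cat sat on the mat the cat ran".split()
table = fit_ngram(corpus, n=2)
# Toy stand-in predictor that always answers "cat".
rate = top1_agreement(table, lambda prefix: "cat", corpus, n=2)
print(rate)
```

Note that this measures agreement between the two predictors, not accuracy against the true next token, which is the point made above: a high rate does not by itself say the mechanisms are alike.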

fennecbutt•8mo ago
I've always thought that LLMs are still just statistical machines and that their output is very similar to the superpermutation problem, though not exactly.

I just like to think of it as a high dimensional view of the relationships between various words and that the output is the result of continuing the path taken through that high dimensional space, where each point's probability of selection changes with each token in the sequence.

Unfortunately, as far as I can understand it, there's no real thought or logic going on there in the simplest cases. For more complex models or different architectures, though, anything that fundamentally changes the way the model explores a path through that space could be implementing thought/logic, I suppose.

It's why they need to outsource mathematics for the most part.

pona-a•8mo ago
I wonder if these N-gram reduced models, augmented with confidence measures, can act as a very fast speculative decoder. Or maybe the sheer number of explicit rules unfolded from the compressed latent representation will make it impractical.
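For readers unfamiliar with the term, here is a rough sketch of one greedy speculative-decoding step, assuming a cheap drafter (an n-gram model, say) has already proposed some tokens. `target_next` is a hypothetical callable standing in for the large model's greedy next-token choice. Real systems verify all drafted positions in one batched forward pass and often use probabilistic acceptance; this sequential version only illustrates the accept/reject logic.

```python
def speculative_step(prefix, draft, target_next):
    """Verify drafted tokens against the target model, greedily.

    prefix:      list of tokens generated so far
    draft:       list of tokens proposed by the cheap drafter
    target_next: assumed callable, target_next(seq) -> the target model's
                 greedy next token for the sequence seq
    Returns the tokens to append: the longest drafted prefix the target
    agrees with, followed by one token chosen by the target itself.
    """
    accepted = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First disagreement: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Every drafted token matched; the target adds one more token.
    accepted.append(target_next(prefix + accepted))
    return accepted

# Toy target model: the next token is simply the sequence length.
toy_target = lambda seq: len(seq)
print(speculative_step([0, 1, 2], [3, 4, 9, 6], toy_target))
```

The output always matches what the target model would have produced on its own, so the drafter only affects speed: the better its guesses, the more tokens are accepted per verification step.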
nickpsecurity•8mo ago
I'd also like to see a list of similarly simple techniques for extracting rules, so that ML researchers could automatically try them all. In this case, the N-gram rules would be the starting point. For the predictions that failed, they'd try the other techniques. Eventually most or all of the predictions should be captured by one or more simple rules. Some might be compound rules mixing techniques.

I think that would also bring benefits in both interpretability and hardware acceleration and, in time, maybe cheaper pretraining of useful models.

pona-a•8mo ago
I don't have a list, but another popular one was this [0]. They trained a one-layer attention-only transformer and could read its weights off as bigrams and skip-trigrams ("A… B C").

[0] https://transformer-circuits.pub/2021/framework/index.html

ggamecrazy•8mo ago
They literally can! This exact speculative method is supported in vLLM using `speculative_model="[ngram]"` [1].

[1] https://docs.vllm.ai/en/latest/features/spec_decode.html#spe...

pona-a•8mo ago
Not quite. The paper uses its own N-gram rules with positive/negative/invariant weights as a rudimentary attention, and these rules are distilled from the model itself.

The vLLM method, as I found out from this repo [0] linked in the Twitter thread in the documentation (which for some reason they didn't just link to directly), seems to be a regular Markov chain over the context, if it even builds a stochastic matrix. See the algorithm below.

  Current prompt
  "Article: (CNN)French striker Bafetimbi Gomis, who has a history of [...]
  Summary: French stri"

  Prompt lookup algorithm
  1. Get last few tokens from prompt - "French stri"
  2. Search for "French stri" in prompt
  3. Match found - return next k tokens after match as candidate completion - "ker Bafetimbi Gomis, who has"

  Candidate tokens
  "ker Bafetimbi Gomis, who has"

[0] https://github.com/apoorvumang/prompt-lookup-decoding
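The three steps above can be sketched in a few lines. This is my own minimal reading of the algorithm, not code from the linked repo: the function name, its parameters, and the use of single characters as "tokens" are all assumptions made for illustration.

```python
def prompt_lookup(tokens, ngram_size=3, num_candidates=5):
    """Propose draft tokens by matching the prompt's tail against itself.

    Takes the last ngram_size tokens, finds their most recent earlier
    occurrence, and returns up to num_candidates tokens that followed it.
    Returns [] when there is no earlier match.
    """
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan right to left over start positions before the tail itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            end = start + ngram_size
            return tokens[end:end + num_candidates]
    return []

# Characters stand in for tokens so the example mirrors the one above.
prompt = ("Article: (CNN)French striker Bafetimbi Gomis, who has a history of [...]\n"
          "Summary: French stri")
candidates = prompt_lookup(list(prompt), ngram_size=11, num_candidates=28)
print("".join(candidates))  # ker Bafetimbi Gomis, who has
```

There is no learned model here at all: the candidates are copied from the prompt, which is why it works best for tasks such as summarisation where the output repeats spans of the input.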
bilsbie•8mo ago
Interesting! Makes me wonder if you could replace transformers with some sort of fancy Markov chain. Maybe with a meta chain that acts as attention.

cschmidt•8mo ago
This paper was accepted as a poster to NeurIPS 2024, so it isn't just a pre-print. There is a presentation video and slides here:

https://neurips.cc/virtual/2024/poster/94849

The underlying data has been open-sourced, as discussed on his blog here: https://timothynguyen.org/2024/11/07/open-sourced-my-work-on...