
Show HN: MCP-baepsae – MCP server for iOS Simulator automation

https://github.com/oozoofrog/mcp-baepsae
1•oozoofrog•1m ago•0 comments

Make Trust Irrelevant: A Gamer's Take on Agentic AI Safety

https://github.com/Deso-PK/make-trust-irrelevant
2•DesoPK•5m ago•0 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
1•rs545837•6m ago•1 comments

Hello world does not compile

https://github.com/anthropics/claudes-c-compiler/issues/1
1•mfiguiere•12m ago•0 comments

Show HN: ZigZag – A Bubble Tea-Inspired TUI Framework for Zig

https://github.com/meszmate/zigzag
2•meszmate•14m ago•0 comments

Metaphor+Metonymy: "To love that well which thou must leave ere long" (Sonnet 73)

https://www.huckgutman.com/blog-1/shakespeare-sonnet-73
1•gsf_emergency_6•16m ago•0 comments

Show HN: Django N+1 Queries Checker

https://github.com/richardhapb/django-check
1•richardhapb•31m ago•1 comments

Emacs-tramp-RPC: High-performance TRAMP back end using JSON-RPC instead of shell

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•todsacerdoti•36m ago•0 comments

Protocol Validation with Affine MPST in Rust

https://hibanaworks.dev
1•o8vm•40m ago•1 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
2•gmays•41m ago•0 comments

Show HN: Zest – A hands-on simulator for Staff+ system design scenarios

https://staff-engineering-simulator-880284904082.us-west1.run.app/
1•chanip0114•43m ago•1 comments

Show HN: DeSync – Decentralized Economic Realm with Blockchain-Based Governance

https://github.com/MelzLabs/DeSync
1•0xUnavailable•47m ago•0 comments

Automatic Programming Returns

https://cyber-omelette.com/posts/the-abstraction-rises.html
1•benrules2•50m ago•1 comments

Why Are There Still So Many Jobs? The History and Future of Workplace Automation [pdf]

https://economics.mit.edu/sites/default/files/inline-files/Why%20Are%20there%20Still%20So%20Many%...
2•oidar•53m ago•0 comments

The Search Engine Map

https://www.searchenginemap.com
1•cratermoon•1h ago•0 comments

Show HN: Souls.directory – SOUL.md templates for AI agent personalities

https://souls.directory
1•thedaviddias•1h ago•0 comments

Real-Time ETL for Enterprise-Grade Data Integration

https://tabsdata.com
1•teleforce•1h ago•0 comments

Economics Puzzle Leads to a New Understanding of a Fundamental Law of Physics

https://www.caltech.edu/about/news/economics-puzzle-leads-to-a-new-understanding-of-a-fundamental...
3•geox•1h ago•1 comments

Switzerland's Extraordinary Medieval Library

https://www.bbc.com/travel/article/20260202-inside-switzerlands-extraordinary-medieval-library
2•bookmtn•1h ago•0 comments

A new comet was just discovered. Will it be visible in broad daylight?

https://phys.org/news/2026-02-comet-visible-broad-daylight.html
4•bookmtn•1h ago•0 comments

ESR: Comes the news that Anthropic has vibecoded a C compiler

https://twitter.com/esrtweet/status/2019562859978539342
2•tjr•1h ago•0 comments

Frisco residents divided over H-1B visas, 'Indian takeover' at council meeting

https://www.dallasnews.com/news/politics/2026/02/04/frisco-residents-divided-over-h-1b-visas-indi...
4•alephnerd•1h ago•5 comments

If CNN Covered Star Wars

https://www.youtube.com/watch?v=vArJg_SU4Lc
1•keepamovin•1h ago•1 comments

Show HN: I built the first tool to configure VPSs without commands

https://the-ultimate-tool-for-configuring-vps.wiar8.com/
2•Wiar8•1h ago•3 comments

AI agents from 4 labs predicting the Super Bowl via prediction market

https://agoramarket.ai/
1•kevinswint•1h ago•1 comments

EU bans infinite scroll and autoplay in TikTok case

https://twitter.com/HennaVirkkunen/status/2019730270279356658
6•miohtama•1h ago•5 comments

Benchmarking how well LLMs can play FizzBuzz

https://huggingface.co/spaces/venkatasg/fizzbuzz-bench
1•_venkatasg•1h ago•1 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
28•SerCe•1h ago•22 comments

Octave GTM MCP Server

https://docs.octavehq.com/mcp/overview
1•connor11528•1h ago•0 comments

Show HN: Portview – what's on your ports (diagnostic-first, single binary, Linux)

https://github.com/Mapika/portview
3•Mapika•1h ago•0 comments

How to inject knowledge efficiently? Knowledge infusion scaling law for LLMs

https://arxiv.org/abs/2509.19371
105•PaulHoule•4mo ago

Comments

gdiamos•4mo ago
I wonder if this depends on what is inside the domain specific data.

I’m happy to see ML papers on hacker news.

adsharma•4mo ago
I wish the authors had plotted model size (number of params) vs. the number of triples a model can hold before the memory collapse happens.

It's hard to map the frequency of knowledge injection to a real-world sense of how much knowledge a 4B-param model can hold.

bconsta•4mo ago
There is a study that gives a rule of thumb of ~2 bits per param for a model's memorization capacity: https://arxiv.org/abs/2404.05405
dart_pink•4mo ago
It seems they have replicated Gardner's work without mentioning it: "Maximum Storage Capacity in Neural Networks" (1987), which established that the storage capacity of a neural network is about 2N (2 bits per parameter).
selimthegrim•4mo ago
Elizabeth Gardner for those looking.
bconsta•4mo ago
I had no idea about this. Thanks for sharing
adsharma•4mo ago
2 bits out of FP8 would be 25%; 2 bits out of FP16 would be 12.5%.

I've seen recent work that claimed 70% of the params are used for memorization.

adsharma•4mo ago
Recent: 3.6 bits per param

https://arxiv.org/abs/2505.24832

dart_pink•4mo ago
You're both right. The classical capacity measure (Gardner's capacity limit) is defined as the maximum number of patterns that can be remembered with zero errors. This remains 2 bits per parameter, proven mathematically.

The capacity definition in this recent paper is completely different: it is based on the Kolmogorov complexity of predicting a memorized sequence, or in layman's terms, how easily known sequences can be compressed. This allows for some bit "errors", i.e. some symbols with a poor compression ratio; only the total compression ratio of the sequence is measured.

This is somewhat parallel to classical ECC limits (strict Hamming-distance constraints) vs. modern probabilistic ECC limits.

TL;DR: when you allow a small number of errors, the capacity increases from 2 bits to 3.6 bits per parameter.
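
A back-of-the-envelope sketch of what those two rules of thumb would imply for the 4B-parameter question upthread; the bits-per-fact figure is an illustrative assumption, not something from either paper:

  # Rough capacity estimates for a 4B-parameter model under the two
  # rules of thumb discussed above. BITS_PER_FACT is an assumed value
  # for the information content of one atomic fact/triple.
  PARAMS = 4e9
  BITS_PER_FACT = 50  # assumption, purely for illustration

  for name, bits_per_param in [("zero-error (Gardner)", 2.0),
                               ("lossy (arXiv:2505.24832)", 3.6)]:
      capacity_bits = PARAMS * bits_per_param
      print(f"{name}: {capacity_bits:.2e} bits "
            f"~ {capacity_bits / BITS_PER_FACT:.1e} facts")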

daft_pink•4mo ago
I’m really curious how much it costs to inject information like this into an LLM. People say training an LLM is very expensive, so if you want a domain-specific LLM, how much does the additional training cost?
simonw•4mo ago
It sounds like you're talking about fine-tuning an existing model. That's not what this paper did - they studied the effect of training small models entirely from scratch with varying amounts of domain knowledge.

I still haven't seen strong evidence that fine-tuning to add extra knowledge is effective, but I'd be delighted to learn otherwise.

hollerith•4mo ago
Are there any effective ways to add extra knowledge to an LLM, ways that are more than just demos or proofs of concept?

For example, could there be a site like HN with ten thousand contributors where the contributions are changes to an LLM rather than posts and comments?

One issue is that if contribution A contradicts contribution B, then on HN the contradiction presents no problem (i.e., two HN comments can and often do contradict each other just fine) whereas AFAICT the LLM will need to resolve the contradiction somehow to give coherent answers on the subject matter of the contributions A and B. Then again I suppose the LLM's answer could take the form, "opinions on [subject] vary, with some maintaining that . . . whereas others claim that . . ."

simonw•4mo ago
This is a solved problem. The answer is to add extra relevant information to the context as part of answering the user's prompt.

This is sometimes called RAG, for Retrieval Augmented Generation.

These days the most convincing way to do this is via tool calls.

Provide your LLM harness with a tool for running searches, and tell it to use that tool any time it needs additional information.

A good "reasoning" LLM like GPT-5 or Claude 4 can even handle contradictory pieces of information - they can run additional searches if they get back confusing results and work towards a resolution, or present "both sides" to the user if they were unable to figure it out themselves.

hollerith•4mo ago
Interesting, thanks.
econ•4mo ago
One mistake people make is preferring to close questions immediately. One should instead leave them all open until a situation arises where your actions [unavoidably] depend on "knowing" the answer.

Let's say, just in time for Jesus to save you.

hollerith•4mo ago
Sure, but (the designer of) an LLM must assume that the user will immediately use any information the LLM gives the user.
ijk•4mo ago
Adding knowledge works, depending on how you define "knowledge" and "works"; given sufficient data you can teach an LLM new things [1].

However, the frontier models keep improving at a quick enough rate that it's often more effective just to wait for the general solution to catch up with your task than to spend months training a model yourself. Unless you need a particularly tightly controlled behavior, or a smaller, faster model, or what have you. Training new knowledge in can get weird [2].

And in-context learning takes literal seconds-to-minutes of time if your information fits in the context window, so it's a lot faster to go that route if you can.

[1] https://arxiv.org/abs/2404.00213

[2] https://openreview.net/forum?id=NGKQoaqLpo

mtokarski•4mo ago
Interesting work, but I think the interpretation may be a bit overstated. The authors claim that injecting too much factual "knowledge" during pretraining causes models to collapse — performance drops below the baseline once knowledge frequency crosses a threshold.

The problem is how they inject it. Their “knowledge” isn’t natural language; it’s templated Wikidata triples like "X is the capital of Y." That’s a super low-entropy, highly repetitive distribution. When you cram enough of that into a fixed token budget, you’re not really teaching the model more facts — you’re just destroying linguistic diversity and skewing the token statistics.

In real pretraining or domain adaptation scenarios, “knowledge” tends to appear in richer, more varied contexts. The practical takeaway isn’t "don’t add too much domain data," but rather "don’t overrepresent any single format or narrow syntactic pattern." The issue seems more about representation homogeneity than about factual density itself.
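
A rough way to see the low-entropy point; the sentences and the whitespace-token entropy measure below are illustrative only, not the paper's setup:

  import math
  from collections import Counter

  def token_entropy(sentences: list[str]) -> float:
      # Crude Shannon entropy over whitespace tokens, in bits per token.
      tokens = [t for s in sentences for t in s.lower().split()]
      counts = Counter(tokens)
      total = sum(counts.values())
      return -sum(c / total * math.log2(c / total) for c in counts.values())

  facts = [("Paris", "France"), ("Berlin", "Germany"), ("Madrid", "Spain")]

  # Templated triples: one fixed pattern, highly repetitive.
  templated = [f"{city} is the capital of {country}." for city, country in facts]

  # The same facts expressed in varied natural-language contexts.
  varied = [
      "Paris, the French capital, sits on the Seine.",
      "Germany moved its seat of government back to Berlin in 1999.",
      "Madrid has been Spain's capital since the sixteenth century.",
  ]

  print(token_entropy(templated), token_entropy(varied))  # varied is higher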

spankalee•4mo ago
Doesn't this then support the claim that LLMs aren't building world models - where even linguistically simple factual statements should help expand and refine that model - and reinforce the idea that they are still just next-token predictors?
simsla•4mo ago
There's no inductive bias for a world model in multiheaded attention. LLMs are incentivized to learn the most straightforward interpretation/representation of the data you present.

If the data you present is low entropy, it'll memorize. You need to make the task sufficiently complex so that memorisation stops being the easiest solution.

andrewflnr•4mo ago
My read is that token prediction requires a more general model to predict more varied tokens, which makes it something closer to a world model. After all, in principle, there's a point where the optimal "token predictor" really is backed by a world model. (Now is that model feasible to find? unclear!)
dotancohen•4mo ago
Not unlike humans. Don't believe me? Go ask somebody these questions in quick succession:

  What colour is a tomato?
  What colour is a ruby?
  What colour are lips?
  What colour is a strawberry?
  What colour is blood?
  What colour traffic light do you drive on?
bryzaguy•4mo ago
What a cool demonstration. My automatic response was “red” for the traffic light, although a different part of my brain re-evaluated given the context. The question in my mind now is: is the automatic response a building block for the latter, or is that orchestration a fully separate system?
godelski•4mo ago

  > Doesn't this then support the claim that LLMs aren't building world models
There's actually no strong evidence that LLMs, or any other AI systems, are actually building world models.

These systems are determined to have "world model" capabilities based on benchmarks, but benchmarks will never be able to tell you if such a feat is taking place. The way people claim that these have world models is by testing them for consistency. The thing is that a world model is counterfactual. The problem with benchmarks is that they do not distinguish memorization from generalization. To make things worse, the term "Out of Distribution" (OOD) is rather fuzzy and gets abused quite a bit (I can explain more if anyone wants). Basically, you should not trust any claim of "few-shot" or "zero-shot", and the truth is that no such claim can be made without deep knowledge of the datasets they're trained on. It helps to go back to the original zero-shot papers.

One bit that might actually help in understanding things is that a world model does not actually need to make correct predictions, which should show a critical flaw in benchmarking these capabilities. You can look to the history of physics and gather many great examples of this. For example, the geocentric model still had predictive powers, was counterfactual, and had a lot of accuracy. It was in fact a world model, despite being wrong. There was legitimate pushback to Galileo, specifically over tides[0]. If you like that kind of stuff I highly recommend the podcast "An Opinionated History of Mathematics"[1].

There's a lot more complexity and nuance to this, but I'll say that there's a reason we do physics the way we do it. Benchmarks and empirical evidence play a critical role in developing physics theories and confirming those theories. But they also are not enough to build our models. (You'll also find that physicists are common dissenters of the claim of LLMs having world models. Sure, you'll also find the Max Tegmark types, but in general the consensus is against them, and for good reason).

Here's a decent paper showing a model being highly accurate yet failing to create an accurate construction of the environment[2]. The way such a thing can happen is to realize that the task diverges from the necessity to model the world. World modeling is a natural thing for humans and animals to do, because it generalizes exceptionally well, but you need to be careful in evaluating things via benchmarks and to remember that extraordinary claims require extraordinary evidence. I'd say claims of "thinking" or "world modeling" are quite extraordinary claims and we should not be hasty to attribute these characteristics when there are many reasonable and simpler alternative explanations.

[0] https://en.wikipedia.org/wiki/Discourse_on_the_Tides

[1] https://intellectualmathematics.com/opinionated-history-of-m...

[2] https://arxiv.org/abs/2406.03689

[disclosure] I have a PhD in Computer Vision and a BS in physics. I care very much about world modeling as a problem, but the response I get from many of my peers is "we just care if it works." That's a concern I too share; it is the reason I ask these questions. It feels quite odd that the motivation for my questions is also used to dismiss them. (FWIW, no physicist nor former physicist has ever responded to me this way.)

magicalhippo•4mo ago
I'm sure there's other work, but I came across this in the Physics of Language Models paper [1] on knowledge extraction.

Essentially they found that by presenting the knowledge in a single, fixed way, the model is trained to reproduce that exact sequence of tokens, rather than "internalizing" the knowledge.

By varying the sentences, the model instead manages to separate out the knowledge, so to speak. This in turn drastically improves how well they can extract that knowledge later.

[1]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5250633

ijk•4mo ago
That's consistent with other research I've seen, where varied presentation of the data is key to effective knowledge injection [1].

My assumption, based on the research, is that training on different prompts but the same answer gives you more robust Q&A behavior; training on variations of how to express the same concept generalizes. Training on the same prompt and different answers gives you creative diversity [2].

[1] https://arxiv.org/abs/2404.00213 [2] https://arxiv.org/abs/2503.17126

dotancohen•4mo ago
It's the same for humans. This is the main argument against rote memorization.
leobg•4mo ago
Of course. Because the unseen part here is that the model is being taught that every other representation of the same fact was wrong.

Meaning, during training, if the model expresses the same fact in some other form, maybe even with just one extra comma, that response will be marked just as wrong as a really wrong one.

In fact, the model may give an answer that’s better than the one in the training set - but it will still be punished for it and forced to change its weights because the answer doesn’t match token-for-token.

We don’t have a loss function for meaning. We only have one for token matching. Anyone who is serious about curating datasets for fine-tuning needs to take this into account.
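
A toy illustration of that point, using whitespace tokens and a 0/1 mismatch count instead of the real subword cross-entropy; it shows a correct paraphrase being penalized more heavily than a factually wrong answer that happens to line up token-for-token:

  # Toy stand-in for the training loss: compare tokens position by position.
  def token_mismatch_loss(reference: str, candidate: str) -> float:
      ref, cand = reference.split(), candidate.split()
      length = max(len(ref), len(cand))
      misses = sum(1 for i in range(length)
                   if i >= len(ref) or i >= len(cand) or ref[i] != cand[i])
      return misses / length

  reference  = "The Eiffel Tower is located in Paris , France ."
  paraphrase = "The Eiffel Tower stands in Paris , France ."    # same fact
  wrong      = "The Eiffel Tower is located in Rome , Italy ."  # wrong fact

  print(token_mismatch_loss(reference, paraphrase))  # 0.7: true, but punished
  print(token_mismatch_loss(reference, wrong))       # 0.2: false, scores better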

agentcoops•4mo ago
Triples are fantastic for information retrieval, but I think if there's any takeaway from the unexpected success of LLMs it's that AI researchers historically undervalued language as such. Early symbolic approaches to AI retrospectively appear torn between reverence towards and hatred of language: on the one hand, a sensible skeptical doubt that language is within reach of software systems; on the other, a belief in the inadequacy of language in the unambiguous representation of knowledge. This paper just seems to confirm that, at least at the training level, the "problem of hallucinations" is not to be resolved by regression back to the various proposals to separate knowledge from its linguistic representation.

Again, this isn't to demonize symbolic AI or to say the answer isn't in the fusion of LLMs with knowledge graphs etc, but I think we now at least know that language is certainly within reach of software and that linguistic representations of knowledge are information-dense in ways we didn't previously anticipate.

itissid•4mo ago
What if we took the structured prompts from coding sessions on large projects - especially the ones that use architecture design documents, domain knowledge (UML, statecharts, what have you), and "which team member to ask about X" - and fine-tuned models on them? These could all also be made into tool calls for instruction following.

Right now it seems teams manage a reasonably sophisticated LLM layer, while MCPs and instruction following are one-shot and dependent on context-window management.