frontpage.

Linux from Scratch

https://www.linuxfromscratch.org/lfs/view/stable/
147•Alupis•1h ago•28 comments

Show HN: ChartGPU – WebGPU-powered charting library (1M points at 60fps)

https://github.com/ChartGPU/ChartGPU
349•huntergemmer•5h ago•117 comments

TeraWave Satellite Communications Network

https://www.blueorigin.com/terawave
41•T-A•1h ago•9 comments

Show HN: Rails UI

https://railsui.com/
45•justalever•1h ago•36 comments

PicoPCMCIA – a PCMCIA development board for retro-computing enthusiasts

https://www.yyzkevin.com/picopcmcia/
79•rbanffy•3h ago•20 comments

Claude's New Constitution

https://www.anthropic.com/news/claude-new-constitution
113•meetpateltech•4h ago•59 comments

Waiting for dawn in search: Search index, Google rulings and impact on Kagi

https://blog.kagi.com/waiting-dawn-search
120•josephwegner•2h ago•74 comments

Skip Is Now Free and Open Source

https://skip.dev/blog/skip-is-free/
140•dayanruben•4h ago•42 comments

The WebRacket language is a subset of Racket that compiles to WebAssembly

https://github.com/soegaard/webracket
19•mfru•3d ago•1 comment

JPEG XL Test Page

https://tildeweb.nl/~michiel/jxl/
113•roywashere•3h ago•83 comments

Stanford scientists found a way to regrow cartilage and stop arthritis

https://www.sciencedaily.com/releases/2026/01/260120000333.htm
111•saikatsg•2h ago•28 comments

Autonomous (YC F25) is hiring – AI-native financial advisor at 0% advisory fees

https://atg.science/
1•dkobran•3h ago

Tell HN: Bending Spoons laid off almost everybody at Vimeo yesterday

205•Daemon404•3h ago•134 comments

Beowulf's opening "What" is no interjection

https://www.poetryfoundation.org/poetry-news/69208/new-research-opening-line-of-beowulf-is-not-wh...
43•gsf_emergency_6•2d ago•31 comments

Nested Code Fences in Markdown

https://susam.net/nested-code-fences.html
150•todsacerdoti•7h ago•44 comments

Can you slim macOS down?

https://eclecticlight.co/2026/01/21/can-you-slim-macos-down/
102•ingve•12h ago•143 comments

SmartOS

https://docs.smartos.org/
133•ofrzeta•4h ago•55 comments

How are you automating your coding work?

19•manthangupta109•56m ago•14 comments

Slouching Towards Bethlehem – Joan Didion (1967)

https://www.saturdayeveningpost.com/2017/06/didion/
11•jxmorris12•2h ago•0 comments

Show HN: Company hiring trends and insights from job postings

https://jobswithgpt.com/company-profiles/
12•sp1982•2h ago•1 comment

RTS for Agents

https://www.getagentcraft.com/
87•summoned•5d ago•37 comments

EU–INC – A new pan-European legal entity

https://www.eu-inc.org/
630•tilt•9h ago•599 comments

Without benchmarking LLMs, you're likely overpaying

https://karllorey.com/posts/without-benchmarking-llms-youre-overpaying
102•lorey•1d ago•61 comments

EmuDevz: A game about developing emulators

https://afska.github.io/emudevz/
167•ingve•3d ago•36 comments

Show HN: See the carbon impact of your cloud as you code

https://dashboard.infracost.io/
37•hkh•5h ago•10 comments

I Made Zig Compute 33M Satellite Positions in 3 Seconds. No GPU Required

https://atempleton.bearblog.dev/i-made-zig-compute-33-million-satellite-positions-in-3-seconds-no...
112•signa11•10h ago•13 comments

Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks

https://elliotarledge.com/blog/batmobile
76•ipnon•3d ago•11 comments

Show HN: yolo-cage – AI coding agents that can't exfiltrate secrets

https://github.com/borenstein/yolo-cage
35•borenstein•4h ago•53 comments

Ireland wants to give its cops spyware, ability to crack encrypted messages

https://www.theregister.com/2026/01/21/ireland_wants_to_give_police/
178•jjgreen•6h ago•76 comments

RSS.Social – the latest and best from small sites across the web

https://rss.social/
210•Curiositry•17h ago•49 comments

GenAI, the snake eating its own tail

https://www.ybrikman.com/blog/2026/01/21/gen-ai-snake-eating-its-own-tail/
47•brikis98•1h ago

Comments

mrcwinn•1h ago
Pay per crawl of StackOverflow wouldn't encourage me to post more on StackOverflow. (Not that I was anyway.) Presumably you'd need to pay content creators, but that seems quite inefficient:

1. I pay OpenAI
2. OpenAI rev-shares to StackOverflow
3. StackOverflow mostly keeps that money, but shares some with me for posting
4. I get some money back to help pay OpenAI?

This is nonsense. And if the frontier labs are right about simulated data, as Tesla seems to have been right with its FSD simulated visualization stack, does this really matter anyway? The value I get from an LLM far exceeds anything I have ever received from SO or an O'Reilly book (as much as I genuinely enjoy them collecting dust on a shelf).

If the argument is "fairness," I can sympathize but then shrug. If the argument is sustainability of training, I'm skeptical we need these payment models. And if the argument is about total value creation, I just don't buy it at all.

lbrito•1h ago
>If the argument is sustainability of training, I'm skeptical we need these payment models.

That seems to be the argument: LLM adoption leads to a drop in organic training data, causing LLMs to eventually plateau, and we'll be left without the user-generated content we relied on (like SO) and with subpar LLMs. That's what I'm getting from the article, anyway.

mapontosevenths•1h ago
The article gets the part about organic data dying off right. Look at Google SERPs for an example. Almost nobody clicks through to the source anymore, so ad revenue is drying up, and people are publishing less, or publishing in places that pay them directly and live behind a paywall, like Medium. Which means Google has less data to work with.

That said, what it misses is that the AI prompts themselves become a giant source of data. None of these companies are promising not to use your data, and even if you don't opt in, the person you sent the document/email/whatever to will, because they want it paraphrased or need help understanding it.

lbrito•1h ago
>AI prompts themselves become a giant source of data.

Good point, but can it match the old organic data? I'm skeptical. For one, the LLM environment lacks the truth or consensus mechanisms that the old SO-like sites had. 100s of users might have discussed the same/similar technical problem with an LLM, but there's no way (afaik) for the AI to promote good content and demote bad, as it (AI) doesn't have a concept of correctness/truth. Also, the old sites were two-sided, with humans asking _and_ answering questions, while humans are only on the asking side with AI.

cthalupa•1h ago
> 100s of users might have discussed the same/similar technical problem with an LLM, but there's no way (afaik) for the AI to promote good content and demote bad ones, as it (AI) doesn't have the concept of correctness/truth

The LLM doesn't, but reinforcement does. If someone keeps asking the model how to fix the problem after being given an answer, the answer is likely wrong. If someone deletes the chat after getting the answer, it was probably right.
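
A rough sketch of that kind of implicit-feedback scoring, with made-up event names and weights (illustrative only, not any vendor's actual pipeline):

  # Hypothetical sketch: score an answer from implicit chat signals.
  def implicit_feedback_score(events):
      score = 0.0
      for event in events:
          if event == "user_rephrased_same_question":
              score -= 1.0   # kept asking: the answer was likely wrong
          elif event == "user_reported_still_broken":
              score -= 2.0
          elif event == "chat_deleted_after_answer":
              score += 1.0   # got what they needed and moved on
          elif event == "code_block_copied":
              score += 0.5
      return score

  # High-scoring answers become positive reinforcement examples,
  # low-scoring ones become negatives.
  print(implicit_feedback_score(["code_block_copied", "chat_deleted_after_answer"]))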

_DeadFred_•59m ago
AI is an entropy machine.

Those AI prompts that become data for the AI companies are yet another thing human creators used to rely on to understand what people wanted: topics to explore, feedback on what they hadn't communicated well enough. That "value" is AI stealing yet more energy from the system, resulting in even less, and less valuable, human creation.

TeMPOraL•1h ago
There are so many things wrong with the points this article repeats, but those are soundbites at this point so I'm not sure one can even argue against them anymore.

Still, for the one about organic data (or "pre-war steel") drying up: it's not a threat to model development at all. People repeating this point don't realize that we already have way more data than we need. We got to where we are by brute-forcing the problem - throwing more data at a simple training process. If new "pristine" data were to stop flowing now, we still a) have decent pre-trained base models, and a dataset that's more than sufficient to train more of them, and b) lots of low-hanging fruit to pick in training approaches, architectures and data curation, which will let us get more performance out of the same base data.

That, and the fact that synthetic data turned out to be quite effective after all, especially in the later phases of training. No surprise there; for many classes of problems this is how we learn as well. Anyone who has studied math for a maturity exam or university entrance exams knows this: the best way to learn is to solve lots of variations of the same set of problems. These variations are all synthetic data, until recently generated by hand, but even their trivial nature doesn't make them less effective at teaching.
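
To make the exam-prep analogy concrete, here's a toy generator of "same problem, new numbers" variants; the template and code are purely illustrative, not any lab's actual data pipeline:

  # Toy sketch: synthetic variations of one problem template, where
  # the generator knows the ground-truth answer for free.
  import random

  def make_variant(rng):
      speed, hours = rng.randint(2, 50), rng.randint(2, 50)
      question = f"A train travels at {speed} km/h for {hours} hours. How far does it go?"
      return question, speed * hours

  rng = random.Random(42)
  for question, answer in (make_variant(rng) for _ in range(3)):
      print(question, "->", answer, "km")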

zzzeek•1h ago
> If the argument is sustainability of training,

that is the argument, yes.

Claude clearly got an enormous amount of its content from StackOverflow, which has mostly ceased to be a source of new content. However, unlike the author I don't see any way to fix this; StackOverflow was only there because people had technical questions that needed answers.

Maybe if the LLMs do indeed start going stale because there's not enough training data for new technologies, Q&A sites like StackOverflow would still have a place, since people would still resort to asking each other questions rather than LLMs that don't have training data for a newer technology.

znsksjjs•1h ago
We've seen decades of growing wage gaps and erosion of labor's strength. The current elites don't really care to enrich the people. Why would they care to do anything about this problem? They likely don't see it as a problem at all.

If they did actually stumble on AGI (assuming it didn’t eat them too) it would be used by a select few to enslave or remove the rest of us.

alfalfasprout•1h ago
Not sure why this is being downvoted. It's spot on. You see folks like Dario et al. raising the alarm bells about what they claim is coming... while working as hard as they can to bring that gloomy future to fruition.

No one in power is going to help unless there's money in it.

thatguy0900•1h ago
You can also see all of these people building survival bunkers.

lbrito•1h ago
It's being downvoted because HN has a very active billionaire-techbro fanbase.

Also who's this Dario?

mapontosevenths•1h ago
It's being downvoted because it's a ridiculous premise. "The Elites" are human too. This attitude is nonsensical and child-like. Nobody is out here trying to round up the hippies and force them to live in some kind of pods to be harvested for their nutrients or whatever.

This technology, like every prior technology, will cause some people to lose their jobs and some new jobs to be created. This will annoy people who have to learn new skills instead of coasting until retirement as they planned.

It is no different than the buggy whip manufacturers being annoyed at Henry Ford. They were right that it was bad for their industry, but wrong about it being the death of... well all the million things they claimed it would be the death of.

iwontberude•55m ago
And just like Henry Ford and the automobile, one of many externalities was the destruction of black communities: white flight that drained wealth, eminent domain for highways, and increased asthma incidence and other disease from concentrated pollution.

mapontosevenths•48m ago
Yet, overall it was a net positive for society... as almost every technological innovation in history has been.

Did you know that two thirds of the people alive today wouldn't be if it hadn't been for the invention of the Haber-Bosch process? Technology isn't just a toy, it's our life support mechanism. The only way our population gets to keep growing is if our technology continues to improve.

Will there be some unintended consequences? Absolutely. Does that mean we can (or even should) stop it? Hell no. Being pro-human requires you to be pro-technology.

discreteevent•24m ago
Henry Ford didn't make his cars out of buggy whips. He made a new industry. He didn't cannibalize an existing one. You cannot make an LLM without digesting the source material.

iwontberude•1h ago
It's because people rub shoulders with tech billionaires and they seem normal enough (e.g. kind to wait staff, friends and family). The billionaires, like anyone, protect their immediate relationships to insulate the air of normality and good health they experience personally. Those people who interact with billionaires then bristle at our dissonant point of view when we point at the externalities. Externalities that have been hand-waved away in the name of modernity.

Sycophancy is for more than just LLMs.

iwontberude•1h ago
According to Trump, "If it was up to Stephen [Miller], there would only be 100 million people in this country — and all of them would look like him."

furyofantares•1h ago
The article feels very confused to me.

Example 1 is bad: StackOverflow had clearly plateaued and was well into a downward freefall by the time ChatGPT was released.

Example 2 is apparently "open source" but it's actually just Tailwind which unfortunately had a very susceptible business model.

And I don't really think the framing here that it's eating its own tail makes sense.

It's also confusing to me why they're trying to solve the problem of it eating its own tail - there's a LOT of money being poured into the AI companies. They can try to solve that problem.

What I mean is - a snake eating its own tail is bad for the snake. It will kill it. But in this case the tail is something we humans valued and don't want eaten, regardless of the health of the snake. And the snake will probably find a way to become independent of the tail after it ate it, rather than die, which sucks for us if we valued the stuff the tail was made of, and of course makes the analogy totally nonsensical.

The actual solutions suggested here are not related to it eating its own tail anyway. They're related to the sentiment that the greed of AI companies needs to be reeled in, they need to give back, and we need solutions to the fact that we're getting spammed with slop.

I guess the last part is the part that ties into it "eating its own tail", but really, why frame it that way? Framing it that way means it's a problem for AI companies. Let's be honest and say it's a problem for us and we want it solved for our own reasons.

npinsker•1h ago
“Well, Reddit is growing, which contradicts my point, but I really feel like it’s not”

mapontosevenths•1h ago
Reddit is growing because they introduced automatic machine translation and Indians have been joining at an increasing rate. That content is mixed into the English language content, but is of very low quality and irrelevant to many native English speakers. Similarly they mix the English content in with the Indian content.

Essentially, Reddit is also eating its own tail to survive, as the flood of low-quality, irrelevant content is making the platform worse for speakers of all languages, but nobody cares because "line go up."

semiquaver•1h ago
The proposed solution is also pretty confused:

  > For each response, the GenAI tool lists the sources from which it extracted that content, perhaps formatted as a list of links back to the content creators, sorted by relevance, similar to a search engine

This literally isn't possible given the architecture of transformer models, and there's no indication it ever will be.

busymom0•1h ago
Could you ELI5 why this isn't possible? Google's search result AI summary shows the links for example.

cthalupa•59m ago
Those citations come from it searching the web and summarizing, not from its built-in training data. Processes outside of the inference track them.

If it were to give you a model-only response, it could not determine where the information in it was sourced from.
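
In outline, the search-then-summarize flow looks something like this (a minimal sketch; web_search and llm are hypothetical stand-ins, and real pipelines are more involved):

  # Minimal sketch of search-grounded answering with tracked citations.
  # `web_search` and `llm` are hypothetical stand-ins, not a real API.
  def answer_with_citations(question, web_search, llm):
      docs = web_search(question, top_k=3)   # ordinary search, outside the model
      context = "\n".join(f"[{i}] {d.text}" for i, d in enumerate(docs))
      answer = llm(f"Answer using only these sources:\n{context}\n\nQ: {question}")
      # The citations are the URLs we retrieved ourselves; the model never
      # reports which training data shaped its weights.
      return answer, [d.url for d in docs]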

Terr_•55m ago
OK, I'll try to err towards the "5" with this one.

1. We built a machine that takes a bunch of words on a piece of paper, and suggests what words fit next.

2. A lot of people are using it to make stories, where you fill in "User says 'X'", and then the machine adds something like "Bot says 'Y'". You aren't shown the whole thing, a program finds the Y part and sends it to your computer screen.

3. Suppose the story ends, unfinished, with "User says 'Why did the chicken cross the road?'". We can use the machine to fix up the end, and it suggests "Bot says: 'To get to the other side!'"

4. Funny! But when the User character asks where the answer came from, the machine doesn't have a brain to think "Oh wait, that means ME!". Instead, it keeps making things longer in the same way as before, so that you'll see "words that fit" instead of words that are true. The true answer is something unsatisfying, like "it fit the math best".

5. This means there's no difference between "Bot says 'From the April Newsletter of Jokes Monthly'" versus "Bot says 'I don't feel like answering.'" Both are made-up the same way.

> Google's search result AI summary shows the links for example.

That's not the LLM/mad-libs program answering what data flowed into it during training; that's the LLM generating document text like "Bot runs do_web_search(XYZ) and displays the results." A regular, normal program looks for "Bot runs", snips out that text, does a real web search right away, and then substitutes the results back inside.
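
That snip-and-substitute step is roughly this loop (a toy sketch; real tool-calling protocols use structured output rather than string matching):

  import re

  # Toy dispatch loop in the spirit described above; `llm` and
  # `do_web_search` are hypothetical stand-ins.
  def run_with_tools(prompt, llm, do_web_search):
      text = llm(prompt)
      match = re.search(r"Bot runs do_web_search\((.+?)\)", text)
      if match:
          # A regular program performs the search, then the model continues
          # with the results pasted back into the story.
          results = do_web_search(match.group(1))
          text = llm(prompt + "\nSearch results: " + results + "\n")
      return text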

furyofantares•43m ago
Any LLM output is a combination of its weights from its training, and its context. Every token is some combination of those two things. The part that is coming from the weights is the part that has no technical means to trace back to its sources.

But even the part that is coming from the context is only being produced by the weights. As I said, every token is some mathematical combination of the weights and the context.

So it can produce text that does not correctly summarize the content in its context, or incorrectly reproduce the link, or incorrectly map the link to the part of its context that came from that link, or more generally just make shit up.
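
A deliberately tiny fake "model" makes the point visible. This is nothing like a real transformer, but the shape of the computation is the same: one function of weights and context, with no provenance bit per token:

  # Toy sketch: every token comes out of one weights-times-context
  # computation; nothing tags a token with where it came from.
  import numpy as np

  rng = np.random.default_rng(0)
  VOCAB = 50
  W = rng.standard_normal((VOCAB, 8))   # "weights", fixed by training

  def next_token(context_ids):
      h = W[context_ids].mean(axis=0)   # pool the context
      logits = W @ h                    # score every vocabulary item
      probs = np.exp(logits - logits.max())
      probs /= probs.sum()
      # Whether this token "copies" the context or makes something up,
      # the arithmetic is identical.
      return int(rng.choice(VOCAB, p=probs))

  print(next_token([3, 14, 15]))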

logifail•1h ago
> They can try to solve that problem

Well, they could always try actually paying content creators. Unlike - for instance - StackOverflow.

shagie•1h ago
StackOverflow was built back in the days of Web 2.0, when the idea of user-generated content formed, in the (relatively) altruistic web.

There isn't any clean way to do "contributor gets paid" without adding in an entire mess of "ok, where is the money coming from? Paywalls? Advertising? Subscriptions?" and then also getting into the mess of international money transfers (how do you pay someone in Iran from the US?).

And then add in the "ok, now the company is holding payment information for everyone(?) ..." problem, and data breaches and account hacking become much more of an issue.

Once you add money to it, the financial incentives and gamification collide to make it simply awful.

jaredcwhite•1h ago
“We can’t put the genie back in the bottle.”

Actually we can. And we will.

semiquaver•1h ago
How?

happytoexplain•1h ago
Law or war. Not saying it would happen.

mapontosevenths•1h ago
It was presented without explanation and can be ignored without explanation.
jaredcwhite•1h ago
You need an explanation of how people make norms & laws regarding what is acceptable or unacceptable in society and industry?
mapontosevenths•1h ago
No such claim was made, therefore no such claim needs to be refuted. If people want to engage in conversation they will have to use their words to do it.
thatguy0900•1h ago
Only way I could see it is if there's enough pushback on them taking everyone's power and water (and computer parts) in a world where power and water are becoming increasingly unstable. But I feel like defeating AI because there is not enough consistent water and power to give it means there are more pressing issues at hand...
atomic128•1h ago
Poison Fountain: https://rnsaffn.com/poison2/

https://www.theregister.com/2026/01/11/industry_insiders_see...

alfalfasprout•1h ago
Agreed, it's funny how people have taken unrestrained use of AI as an axiom at this point. There very much is still time to significantly control it + regulate it. Is there enough appetite by those in power (across the political spectrum)? Right now I don't think so.
lbrito•1h ago
>There very much is still time to significantly control it + regulate it.

There's also huge financial momentum shoving AI down the world's throat. Even if AI were proven to be a failure today, it would still be pushed for many years because of that momentum.

I just don't see how that can be reversed.

locusofself•1h ago
I feel like the only solution to the problem is democratized RLHF, where whenever we get a bad answer from an LLM, we can immediately tell it what was wrong and it can learn from that.
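
Mechanically, such a correction could be captured as a preference pair, something like the sketch below (field names are illustrative, not any provider's schema; whether labs would trust and train on it is the hard part):

  # Sketch: turn a user correction into a DPO-style preference record.
  def correction_to_preference(prompt, bad_answer, user_correction):
      return {
          "prompt": prompt,
          "rejected": bad_answer,      # the answer the user flagged as wrong
          "chosen": user_correction,   # what the user says it should have been
      }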

schmichael•1h ago
If you're paying to use the model, that means instead of paying content creators, you're also now giving more content to the model for free.

Also just like SEO to game search engines, "democratized RLHF" has big trust issues.

aeon_ai•1h ago
GenAI changes the dynamics of information systems so fundamentally that our entire notion of intellectual property is being upended.

Copyright was predicated on the notion that ideas and styles cannot be protected, but that explicit expressive works can. For example, a recipe can't be protected, but the story you wrap around it that tells how your grandma used to make it would be.

LLMs are particularly challenging to wrangle with because they perform language alchemy. They can (and do) re-express the core ideas, styles, themes, etc. without violating copyright.

People deem this 'theft' and 'stealing' because they are trying to reconcile the myth of intellectual property with reality, and are also simultaneously sensing the economic ladder being pulled up by elites who are watching and gaming the geopolitical world disorder.

There will be a new system of value capture that content creators need to position for, which is to be seen as a more valuable source of high quality materials than an LLM, serving a specific market, and effectively acquiring attention to owned properties and products.

It will not be pay-per-crawl. Or pay-per-use. It will be an attention game, just like everything in the modern economy.

Attention is the only way you can monetize information.

bitwize•42m ago
No. The idea-expression dichotomy is a common myth about copyright law, right up there with "if I already own the physical cartridge, downloading this game ROM is OK".

The ONLY things that matter when determining whether copyright was infringed are "access" and "substantial similarity". The first refers to whether the alleged infringer did, or had a reasonable opportunity to, view the copyrighted work. The second is more vague and open-ended. But if these two, alone, can be established in court, then absent a fair use or other defense (for example, all of the ways in which your work is "substantially similar" to the infringed work are public domain), you are infringing. Period. End of story.

The Tetris Company, for example, owns the idea of falling-tetromino puzzle video games. If you develop and release such a game, they will sue you and they will win. They have won in the past and they can retain Boies-tier lawyers to litigate a small crater where you once stood if need be. In fact, the ruling in the Tetris vs. Xio case means that look-and-feel copyrights, thought dead after Apple v. Microsoft and Lotus v. Borland, are now back on the table.

It's not like this is even terribly new. Atari, license holders to Pac-Man on game consoles at the time, sued Philips over the release of K.C. Munchkin! on their rival console, the Magnavox Odyssey 2. Munchkin didn't look like Pac-Man. The monsters didn't look like the ghosts from Pac-Man. The mazes and some of the game mechanics were significantly different. Yet, the judge ruled that because it featured an "eater" who ate dots and avoided enemies in a maze, and sometimes had the opportunity to eat the enemies, K.C. Munchkin! infringed on the copyrights to Pac-Man. The ideas used in Pac-Man were novel enough to be eligible for copyright protection.

cthalupa•1h ago
This is an article that I agreed with more reading the headline than I did when I finished reading the article itself.

Stack Overflow peaked in 2014 before beginning its downward decline. How is that at all related to GenAI? GPT-4 is when we really started seeing these things get used to replace SO, etc., and that would be early 2023 - and indeed the drop gets worse there - but after the COVID-era spike, SO was already crashing hard.

Tailwind's business model was providing a component library built on top of their framework. It's a business model that relies on the framework being good enough for people to want to use it to begin with, but being bad enough that they'd rather pay for the component library than build it themselves. The more comfortable it is to use, the more productive it is, the worse the value proposition is for the premium upsell. Even other "open core" business models don't have this inherent dichotomy, much less open source on the whole, so it's really weird to try and extrapolate this out.

The thing is, people turn to LLMs to solve problems and answer questions. If they can't turn to the LLM to solve that problem or answer that question, they'll either turn elsewhere, in which case there is still a market for that book or blog post, or they'll drop the problem and question and move on. And if they were willing to drop the problem or question and move on without investigating post-LLM, were they ever invested enough to buy your book, or check more than the first couple of results on google?

worik•50m ago
I feel no nostalgia for Stackoverflow.

I always found it very frustrating that for a person at the start of the learning curve it was "read only"

Actually asking a naive question there got you horribly flamed. The site, and the people using it, were very keen to explain how stupid you were being

LLMs on the other hand are sweet and welcoming (to a fault) of the naive newbie

I have been learning to write shell scripts with the help of LLMs; I could not achieve that using SO

Good riddance

nomadygnt•9m ago
I see what you mean, but the problem is that the LLM provider is trying to provide all the value from the book to the user without the user needing to look at the book at all. I agree if the LLM fails to do so then there is a market for the book. But the LLM provider is trying to minimize that as much as possible. And if the LLM succeeds at providing all the value of the book to the user, without providing any value to the book creator, then in the future there is no incentive to create the book at all, at which point the LLM has no value to provide, etc etc etc.

birdiefm•1h ago
ouroboros can have a little ouroboros (as a treat)

jarjoura•43m ago
This is exactly the sentiment I have been trying to articulate myself.

The ONLY reason we are here today is because OpenAI, and Anthropic by extension, took it upon themselves to launch chatbots trained on whatever data sources they could get in a short amount of time, to quickly productize their investments. Their first versions didn't include any references to the source material, and just acted as if they knew everything.

When CoPilot was built as a better auto-complete engine, trained on open-source projects, it was an interesting idea, because it was doing what people already did: they searched GitHub for examples of the solution, or it nudged them in that direction. The biggest difference, however, is that using another project's code was stable, because it came with a LICENSE.md that you then agreed to and paid forward (i.e. "I used code from this project").

CoPilot initially would just inject snippets for you, without you knowing the source. It was only later that they walked that back; if you use CoPilot now, it shows you the most likely source of the code it used. This is exactly the direction all of the platforms seem headed.

It's not easy to walk back a free-for-all system (i.e. Napster), but I'm optimistic that over time it'll become a fairer, pay-to-access system.

worik•42m ago
The old model of selling eyeballs to advertisers was horrible

I do not know what will replace it, but I will not miss websites trying to monetise my attention

loudmax•18m ago
The GenAI providers will certainly explore advertising revenue. They're not doing much of it yet because they're trying to gain market share while they figure out what pain threshold of advertising their users will tolerate.

People today may have a better sense of the downsides of ad-based services than we did when the internet was becoming mainstream. Back then, the minor inconvenience of seeing a few ads seemed worth access to all the internet had to offer. And it probably was. But today the public has more experience with the downsides of relentless advertising optimization and audience capture, so there might be more business models based on something other than advertising. Either way, GenAI advertising is certainly coming.