frontpage.

A delightful Mac app to vibe code beautiful iOS apps

https://milq.ai/hacker-news
1•jdjuwadi•2m ago•1 comments

Show HN: Gemini Station – A local Chrome extension to organize AI chats

https://github.com/rajeshkumarblr/gemini_station
1•rajeshkumar_dev•2m ago•0 comments

Welfare states build financial markets through social policy design

https://theloop.ecpr.eu/its-not-finance-its-your-pensions/
2•kome•6m ago•0 comments

Market orientation and national homicide rates

https://onlinelibrary.wiley.com/doi/10.1111/1745-9125.70023
3•PaulHoule•6m ago•0 comments

California urges people to avoid wild mushrooms after 4 deaths, 3 liver transplants

https://www.cbsnews.com/news/california-death-cap-mushrooms-poisonings-liver-transplants/
1•rolph•7m ago•0 comments

Matthew Shulman, co-creator of IntelliSense, died March 22, 2019

https://www.capenews.net/falmouth/obituaries/matthew-a-shulman/article_33af6330-4f52-5f69-a9ff-58...
3•canucker2016•8m ago•1 comments

Show HN: SuperLocalMemory – AI memory that stays on your machine, forever free

https://github.com/varun369/SuperLocalMemoryV2
1•varunpratap369•9m ago•0 comments

Show HN: Pyrig – One command to set up a production-ready Python project

https://github.com/Winipedia/pyrig
1•Winipedia•11m ago•0 comments

Fast Response or Silence: Conversation Persistence in an AI-Agent Social Network [pdf]

https://github.com/AysajanE/moltbook-persistence/blob/main/paper/main.pdf
1•EagleEdge•11m ago•0 comments

C and C++ dependencies: don't dream it, be it

https://nibblestew.blogspot.com/2026/02/c-and-c-dependencies-dont-dream-it-be-it.html
1•ingve•12m ago•0 comments

Show HN: Vbuckets – Infinite virtual S3 buckets

https://github.com/danthegoodman1/vbuckets
1•dangoodmanUT•12m ago•0 comments

Open Molten Claw: Post-Eval as a Service

https://idiallo.com/blog/open-molten-claw
1•watchful_moose•12m ago•0 comments

New York Budget Bill Mandates File Scans for 3D Printers

https://reclaimthenet.org/new-york-3d-printer-law-mandates-firearm-file-blocking
2•bilsbie•13m ago•1 comments

The End of Software as a Business?

https://www.thatwastheweek.com/p/ai-is-growing-up-its-ceos-arent
1•kteare•14m ago•0 comments

Exploring 1,400 reusable skills for AI coding tools

https://ai-devkit.com/skills/
1•hoangnnguyen•15m ago•0 comments

Show HN: A unique twist on Tetris and block puzzle

https://playdropstack.com/
1•lastodyssey•18m ago•0 comments

The logs I never read

https://pydantic.dev/articles/the-logs-i-never-read
1•nojito•20m ago•0 comments

How to use AI with expressive writing without generating AI slop

https://idratherbewriting.com/blog/bakhtin-collapse-ai-expressive-writing
1•cnunciato•21m ago•0 comments

Show HN: LinkScope – Real-Time UART Analyzer Using ESP32-S3 and PC GUI

https://github.com/choihimchan/linkscope-bpu-uart-analyzer
1•octablock•21m ago•0 comments

Cppsp v1.4.5–custom pattern-driven, nested, namespace-scoped templates

https://github.com/user19870/cppsp
1•user19870•22m ago•1 comments

The next frontier in weight-loss drugs: one-time gene therapy

https://www.washingtonpost.com/health/2026/01/24/fractyl-glp1-gene-therapy/
2•bookofjoe•25m ago•1 comments

At Age 25, Wikipedia Refuses to Evolve

https://spectrum.ieee.org/wikipedia-at-25
2•asdefghyk•28m ago•4 comments

Show HN: ReviewReact – AI review responses inside Google Maps ($19/mo)

https://reviewreact.com
2•sara_builds•28m ago•1 comments

Why AlphaTensor Failed at 3x3 Matrix Multiplication: The Anchor Barrier

https://zenodo.org/records/18514533
1•DarenWatson•30m ago•0 comments

Ask HN: How much of your token use is fixing the bugs Claude Code causes?

1•laurex•33m ago•0 comments

Show HN: Agents – Sync MCP Configs Across Claude, Cursor, Codex Automatically

https://github.com/amtiYo/agents
1•amtiyo•34m ago•0 comments

Hello

2•otrebladih•35m ago•1 comments

FSD helped save my father's life during a heart attack

https://twitter.com/JJackBrandt/status/2019852423980875794
3•blacktulip•38m ago•0 comments

Show HN: Writtte – Draft and publish articles without reformatting, anywhere

https://writtte.xyz
1•lasgawe•40m ago•0 comments

Portuguese icon (FROM A CAN) makes a simple meal (Canned Fish Files) [video]

https://www.youtube.com/watch?v=e9FUdOfp8ME
1•zeristor•42m ago•0 comments

LLM architecture comparison

https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison
418•mdp2021•6mo ago

Comments

bravesoul2•6mo ago
This is a nice catch-up for someone who, like me, hasn't been keeping up.
dmezzetti•6mo ago
While all these architectures are innovative and have helped improve either accuracy or speed, the same fundamental problem remains: reliably generating factual information.

Retrieval-Augmented Generation (RAG), agents, and other similar methods help mitigate this. It will be interesting to see if future architectures eventually replace these techniques.

tormeh•6mo ago
To me, the issue seems to be that we're training transformers to predict text, which only forces the model to embed limited amounts of logic. We'd have to find something different to train models on in order for them to stop hallucinating.
lblume•6mo ago
Modern neuroscience suggests that everything the human brain might be doing is basically a kind of predictive processing, i.e. hallucination based on inductive biases. I do not think this is the main bottleneck.
bsenftner•6mo ago
I'm still puzzled by this: if RAG is conceptually simple and easy to implement, why have the foundation models not incorporated it into their base functionality? The lack of that strikes me as a negative point about RAG and its variants, because if any of them worked, it would be in the models directly and not need to be added afterwards.
bavell•6mo ago
RAG is a prompting technique; how could they possibly incorporate it into the pre-training?
maleldil•6mo ago
CoT is a prompting technique too, and it's been incorporated.
bavell•6mo ago
IIUC, CoT is "incorporated" into training by just providing better-quality training data that steers the model towards "thinking" more deeply in its responses. But at the end of the day, it's still just regular pre-training.

RAG (retrieval-augmented generation): how can the retrieval be done during training? RAG will always remain external to the model. The whole point is that you can augment the model by injecting relevant context into the prompt at inference time, bringing your own proprietary/domain-specific data, as in the sketch below.
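To make the "inject context at inference time" point concrete, here is a minimal sketch in plain Python. The document list, the keyword-overlap scoring, and the prompt wording are all made-up stand-ins; a real system would use an embedding index and an actual model call:

    # Minimal RAG sketch: pick the best-matching snippets for a query and
    # splice them into the prompt that is sent to the (unmodified) model.
    DOCS = [
        "Our refund policy allows returns within 30 days of purchase.",
        "Support hours are 9am-5pm CET, Monday through Friday.",
        "Enterprise plans include a dedicated account manager.",
    ]

    def overlap(query: str, doc: str) -> int:
        # Toy relevance score: shared lowercase words between query and doc.
        return len(set(query.lower().split()) & set(doc.lower().split()))

    def build_prompt(query: str, k: int = 2) -> str:
        top = sorted(DOCS, key=lambda d: overlap(query, d), reverse=True)[:k]
        context = "\n".join(f"- {d}" for d in top)
        return (
            "Answer using only the context below.\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer:"
        )

    print(build_prompt("What are your support hours?"))
    # The returned string is what reaches the model at inference time;
    # the model's weights are untouched, which is the point above.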

bsenftner•6mo ago
Who says "during training"? RAG could be built into the functionality of the LLM directly - give it the documents you want it to incorporate, and it ingests them as a temp mini-fine tune. That would work just fine.
impossiblefork•6mo ago
These things with <think> and </think> tokens are actually trained using RL, so it's not like GSM8k or something like that where you just train on some reasoning.

It's actually like QuietSTaR but with a focus on a big thought in the beginning and with more sophisticated RL than just REINFORCE (QuietSTaR uses REINFORCE).
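For anyone who hasn't seen these setups, a toy illustration of the reward side, where the hidden thought is stripped and only the final answer is graded. The tag names and the exact-match reward are assumptions for the sketch; real pipelines use verifiers and more sophisticated policy-gradient methods rather than raw REINFORCE:

    import re

    def reward(completion: str, reference: str) -> float:
        # Drop the <think>...</think> block; only the visible answer is scored.
        answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
        return 1.0 if answer == reference.strip() else 0.0

    sample = "<think>2 dozen eggs is 24, minus 5 eaten leaves 19.</think>19"
    print(reward(sample, "19"))   # 1.0

    # In training, this scalar scales the log-probability gradient of the
    # sampled tokens (the REINFORCE idea mentioned above).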

bsenftner•6mo ago
The same way developers incorporate it now. Why are you thinking "pre-training"? This is a feature of the deployed model: it ingests documents and generates a mini fine-tune right then.
mdp2021•6mo ago
Why would a proper documents-at-hand inquiry be «simple»?

Information is at paragraph #1234 of book B456; that paragraph acquires special meaning in light of its neighbours, its chapter, the book. Further information is in other paragraphs of other books. You can possibly encode information (data) with some "strong" compression, but not insight. The information that a query may point to can be a big cloud of fuzzy concepts. What do you input, and how? How big should that input be? "How much" of the past reflection does the Doctor use to build a judgement?

RAG seems simple because it has simpler cases ("What is the main export of Bolivia").

rybosome•6mo ago
Well, even if we assume for a moment that we aren’t talking about non-public data…

Then RAG which serves up knowledge already in the model's pretraining data is still useful, because it primes the model for the specific context with which you want to engage it. I can maybe see what you are saying: why can't the model just do a good job without being re-reminded? But even in that sense, any intelligence, artificial or otherwise, will do better given more context.

And that ignores the reality of data outside the model’s pretraining corpus, like every single business’ internal data.

Mars008•6mo ago
It still makes sense to use external data storage for smaller local models. They just can't hold that much.
Mars008•6mo ago
One problem is with datasets that use RAG. Training a foundation model requires a lot of samples, and there aren't many. The only option is to use other models to generate them, so-called distillation.

BTW, RAG is similar to web search. Models can do it. A web server for RAG can be implemented, along the lines of the sketch below.
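A minimal sketch of such a server, using only the Python standard library: one GET endpoint that ranks a toy corpus by keyword overlap and returns JSON, the way a search tool called by an agent might. The corpus, port, and scoring are placeholders:

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    CORPUS = [
        "DeepSeek-V3 uses multi-head latent attention to shrink the KV cache.",
        "Llama 3 uses grouped-query attention in place of full multi-head attention.",
    ]

    def search(query: str, k: int = 2):
        # Toy ranking by shared lowercase words; a real server would use embeddings.
        q = set(query.lower().split())
        return sorted(CORPUS, key=lambda d: len(q & set(d.lower().split())),
                      reverse=True)[:k]

    class SearchHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query).get("q", [""])[0]
            body = json.dumps({"results": search(query)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # e.g. curl 'http://127.0.0.1:8000/?q=latent+attention'
        HTTPServer(("127.0.0.1", 8000), SearchHandler).serve_forever()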

esafak•6mo ago
The models can't tell when they shouldn't extrapolate and simply need more information, which rules can be generalized and which ones can't. Why shouldn't a method `doWhizBang()` exist if there are methods for all sorts of other things?

When I was young, I once beamed that my mother was a good cooker. It made perfect sense based on other verbs, but I did not know that the word was already claimed by machines, and humans were assigned the word cook. Decades later, I had the pleasure of hearing my child call me a good cooker...

lblume•6mo ago
This made me think: the underlying rule (nouns for activities such as cooking can be formed from the corresponding verb via the suffix -er) breaking in this case is just a historical/cultural artifact of language, a completely unnecessary complication from the machines' standpoint. Maybe LLM hallucination is also partially caused by this exception-riddled social modelling being forced into every model architecture via our training data?
ethan_smith•6mo ago
Some newer models like DeepSeek-V2 and Llama 3.1 have actually shown significant factuality improvements through architectural and training changes, including improved attention mechanisms and objectives that specifically target hallucination reduction.
Chloebaker•6mo ago
Honestly, it's crazy to think how far we've come since GPT-2 (2019). Today, comparing LLMs to determine their performance is notoriously challenging, and it feels like every two weeks a model beats a new benchmark. I'm really glad DeepSeek was mentioned here, because the key architectural techniques it introduced in V3, which improved its computational efficiency and distinguish it from many other LLMs, were really transformational when it came out.
strangescript•6mo ago
The diagrams in this article are amazing if you are somewhere in between a novice and an expert. Seeing all of the new models laid out next to each other is fantastic.
webappguy•6mo ago
Would love to see a Pt. 2 with even what is rumored to be in the top closed-source frontier models, e.g. o5, o3 Pro, o4 or 4.5, Gemini 2.5 Pro, Grok 4, and Claude Opus 4.
DeveloperErrata•6mo ago
This was really educational for me; it felt at the perfect level of abstraction to learn a lot about the specifics of LLM architecture without the difficulty of parsing the original papers.
ajeet•6mo ago
Thank you for taking the time to detail the differences - very educational and easy to read.
krackers•6mo ago
Also related https://epoch.ai/gradient-updates/how-has-deepseek-improved-...

and some sections of https://semianalysis.com/2025/07/11/meta-superintelligence-l...