MiniMind: End-to-end GPT-style LLM training pipeline in pure PyTorch

https://github.com/jingyaogong/minimind
1•dmonterocrespo•2h ago

Comments

dmonterocrespo•2h ago
Hi HN,

I recently spent some time going through MiniMind, and it’s a remarkably clean resource for understanding the modern LLM stack under the hood. It’s a minimal, end-to-end implementation of a ~25M-parameter GPT-style model in pure PyTorch, designed to be trained from scratch on a single GPU.

Instead of heavy abstractions, it uses straightforward PyTorch while still implementing modern architectural choices like RMSNorm, SwiGLU, RoPE, and even MoE variants. What makes it valuable is that it doesn't stop at the forward pass; the repo covers the entire training lifecycle. You can trace the data flow from tokenizer training and pretraining through Supervised Fine-Tuning (SFT), LoRA, preference optimization (DPO/PPO), and distillation.
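
For anyone unfamiliar with two of the building blocks named above, here is a rough sketch of RMSNorm and the SwiGLU gating step in plain Python. This is not MiniMind's code: the function names (rms_norm, silu, swiglu_gate) and the eps default are my own assumptions, and the linear projections that surround the gate in a real feed-forward block are left out.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by a per-feature weight.

    Hypothetical helper: `eps` only guards against division by zero.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """Element-wise SwiGLU gating: silu(gate) * up.

    In a real block, `gate` and `up` would be two linear projections of the
    same input, and the result would go through a third (down) projection.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


normed = rms_norm([3.0, 4.0], [1.0, 1.0])
gated = swiglu_gate([0.0, 1.0], [5.0, 2.0])
```

After rms_norm with unit weights, the output vector has a root mean square of roughly 1, which is the whole point: unlike LayerNorm, there is no mean subtraction and no bias term.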

It’s small enough to actually read the source code end-to-end, but realistic enough to serve as a baseline for architectural experiments rather than just a toy example.

Curious whether anyone here has used this (or similar minimal codebases) to test custom architecture modifications or train highly specialized small-scale models.

I'm currently testing the pipeline locally on a PC with an RTX 4060, and it's a great fit for this kind of hardware.

Rye

https://ryelang.org/
1•tosh•52s ago•0 comments

Palantir turns poisonous on the campaign trail

https://www.ft.com/content/5d6f924d-2e7e-4a5e-ae20-d4f8e29a7d17
1•KnuthIsGod•56s ago•0 comments

SpaceStarCarz KoolWheelz Paper Models

https://davesdesigns.ca/dcc/html/spacestarcarz_.html
1•exvi•1m ago•0 comments

Misfits wanted: the VC firm looking to back 'unreasonable' founders

https://www.ft.com/content/4d29c556-bbd9-490e-a3c8-90f5b894af9e
1•petethomas•4m ago•0 comments

Never Miss a Downtime Again

https://www.notifly.live/
1•netaneo•4m ago•0 comments

Browser control and computer use as MCP tools – works with Claude, Codex, Cursor

https://github.com/gettalon/talon-plugins
1•gettalon•12m ago•0 comments

Ask HN: What would it take to provide free AI to the underprivileged?

1•herodoturtle•12m ago•0 comments

We can remove strncpy() from the Linux kernel finally

https://hachyderm.io/@kees/116282745861595200
1•riffraff•14m ago•1 comment

Amazon confirms: Public wish lists can reveal addresses

https://www.heise.de/en/news/Amazon-confirms-Public-wish-lists-can-reveal-addresses-11221681.html
1•doener•22m ago•0 comments

systemd has not implemented age verification

https://blog.bofh.it/debian/id_473
2•pabs3•26m ago•0 comments

Claude Code Now Supports CIMD for MCP OAuth

https://bsky.app/profile/den.dev/post/3mhrupyeus223
1•mooreds•26m ago•0 comments

The great Linux file pickers tragedy

https://erika.florist/wiki/linux/filepickertragedy/
1•pabs3•27m ago•0 comments

Ask HN: HSL 0.1 – The Human Source License. Please help refining

1•xdgrulez•27m ago•1 comment

The Voice Web with maplibre-voice – Mistral Hackathon 2026

https://www.youtube.com/watch?v=DNpdRVZ0j5A
1•tderflinger•28m ago•0 comments

Expert Personas Improve LLM Alignment but Damage Accuracy

https://arxiv.org/abs/2603.18507
1•Jacques2Marais•28m ago•0 comments

The 53-Year Evolution of AI Agents: A Comprehensive Reading List

https://fullhoffman.com/2026/03/12/agents-are-agents-reading-list/
1•adunk•31m ago•0 comments

Starlette 1.0.0

https://github.com/Kludex/starlette
3•tosh•38m ago•0 comments

Kentucky family rejects $26M offer to convert farm into data center

https://twitter.com/CollinRugg/status/2036237284601913674
1•gurjeet•38m ago•1 comment

From zero to a RAG system: successes and failures

https://en.andros.dev/blog/aa31d744/from-zero-to-a-rag-system-successes-and-failures/
1•andros•40m ago•0 comments

The Why and What of the CIDR Report

https://www.potaroo.net/ispcol/2026-03/cidr-report.html
1•caminanteblanco•41m ago•0 comments

Decker File Format

https://beyondloom.com/decker/format.html
1•tosh•42m ago•0 comments

Run Qwen 3.5 Locally with Claude Code

https://gist.github.com/kibotu/a009f00414b7c10fb1c74e603d7838c0
2•lastdong•46m ago•0 comments

Protector4J converts JVM bytecode to private ISA for protection

https://protector4j.com/articles/how-protector4j-works/
1•vlinx•47m ago•0 comments

The Shape of Data

https://www.scattered-thoughts.net/writing/the-shape-of-data/
1•tosh•48m ago•0 comments

Open Source Gave Me Everything Until I Had Nothing Left to Give

https://kennethreitz.org/essays/2026-03-18-open_source_gave_me_everything_until_i_had_nothing_lef...
6•pabs3•53m ago•0 comments

Sneaky Header Blocker Trick

https://www.joshwcomeau.com/css/header-blockers/
1•ibobev•55m ago•0 comments

Interactive LED system for crutches that react to walking motion

https://github.com/JackWetherell/Disco-Crutches
1•jw1294•55m ago•0 comments

Lion-K CCWD: Corrected Cautious Weight Decay and Hyperparameter Transfer

https://jiha-kim.github.io/posts/lion-k-ccwd/
1•ibobev•56m ago•0 comments

Pivoted Query Synthesis

https://buttondown.com/jaffray/archive/pivoted-query-synthesis/
1•ibobev•57m ago•0 comments

Israeli minister calls for annexation of southern Lebanon

https://www.reuters.com/world/middle-east/israeli-minister-calls-annexation-southern-lebanon-2026...
3•abdelhousni•57m ago•0 comments