Show HN: Hands-on course for building RL environments for LLMs

https://github.com/anakin87/llm-rl-environments-lil-course
1•anakin87•1h ago

Comments

anakin87•1h ago
Hi HN, I've been spending some time lately building Reinforcement Learning Environments and training small language models, and I wanted to share a little course I put together based on my experiments.

Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning (SFT) was the most important part: making models imitate curated Question-Answer pairs. Now, with RLVR (Reinforcement Learning with Verifiable Rewards) and GRPO (Group Relative Policy Optimization), we can make models learn through trial and error in dynamic environments, which are themselves software artifacts.
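As a rough illustration of the trial-and-error idea, here is a minimal sketch of the group-relative scoring commonly associated with GRPO: several completions are sampled for the same prompt, each gets a programmatically checkable reward, and each reward is normalized against its own group. The function names, the toy exact-match reward, and the `eps` constant are assumptions for illustration, not code from the course or from any library.

```python
from statistics import mean, pstdev


def verifiable_reward(completion: str, expected: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the last line equals the expected answer."""
    lines = completion.strip().splitlines()
    return 1.0 if lines and lines[-1].strip() == expected else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for the same prompt ("What is 2 + 2?").
completions = ["Let me think.\n4", "5", "2 + 2 is\n4", "I don't know"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest a negative one, so no separate learned value model is needed to decide which samples to reinforce.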

But how do you build RL environments effectively?

In the repo, I cover:

- Mapping core RL concepts (Agents, Environments) to the LLM domain.

- Using the Verifiers open-source library to construct single-turn, multi-turn, and tool-use environments.

- Hands-on: taking a small language model (LiquidAI's LFM2-2.6B) and turning it into a Tic-Tac-Toe master that beats GPT-5-mini. Build the game Environment, use it to generate synthetic data for SFT warm-up, then train with group-based Reinforcement Learning.
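To make the "environment as software artifact" idea concrete, here is a minimal sketch of a multi-turn Tic-Tac-Toe environment in plain Python. It does not use the Verifiers API; the `TicTacToeEnv` class, its `reset`/`step` methods, the random opponent, and the reward values are all assumptions chosen for illustration, not the course's implementation.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


class TicTacToeEnv:
    """Hypothetical environment: the model plays X, a random opponent plays O."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.board = [" "] * 9

    def reset(self):
        self.board = [" "] * 9
        return self.render()

    def render(self):
        """Text observation that would be placed in the model's prompt."""
        return "\n".join("|".join(self.board[i:i + 3]) for i in (0, 3, 6))

    def step(self, action: str):
        """Apply the model's move ("0".."8"); return (observation, reward, done)."""
        try:
            cell = int(action.strip())
        except ValueError:
            return self.render(), -1.0, True  # unparseable output ends the episode
        if not 0 <= cell <= 8 or self.board[cell] != " ":
            return self.render(), -1.0, True  # illegal move ends the episode
        self.board[cell] = "X"
        if winner(self.board) == "X":
            return self.render(), 1.0, True
        free = [i for i, v in enumerate(self.board) if v == " "]
        if not free:
            return self.render(), 0.0, True  # draw
        self.board[self.rng.choice(free)] = "O"
        if winner(self.board) == "O":
            return self.render(), -1.0, True
        return self.render(), 0.0, False


env = TicTacToeEnv(seed=0)
observation = env.reset()
observation, reward, done = env.step("4")  # the model's text output would go here
```

The same loop can serve both stages mentioned above: rolling out a strong scripted player to collect games for SFT warm-up, and scoring the model's own rollouts during RL. Penalizing malformed or illegal moves is one simple design choice; a real setup might instead re-prompt the model.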

---

Links

Course: https://github.com/anakin87/llm-rl-environments-lil-course

Video walkthrough: https://www.youtube.com/watch?v=71V3fTaUp2Q

Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictacto...

Datasets and Models on HF: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-...

---

I'm fascinated by the idea of building these "little worlds" where LLMs can learn, so I hope it's useful.

Feel free to share opinions...
