GRPO experiment - I trained a Language Model to schedule events

https://github.com/anakin87/qwen-scheduler-grpo
1•anakin87•9mo ago

Comments

anakin87•9mo ago
I have been experimenting with GRPO (Group Relative Policy Optimization) lately, because I am fascinated by models learning from prompts and rewards alone: unlike Supervised Fine-Tuning, no example answers are needed.
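
To make the "prompts and rewards" idea concrete, here is a minimal sketch of the group-relative advantage at the core of GRPO: several completions are sampled for the same prompt, each one gets a scalar reward, and the rewards are normalized within that group. The function name, the `eps` value, and the use of the population standard deviation are illustrative assumptions (implementations differ on these details); this is not code from the linked repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Illustrative sketch only: `eps` is an assumed small constant that avoids
    division by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored by some reward function.
advantages = group_relative_advantages([0.0, 1.0, 1.0, 2.0])
# Completions above the group mean get a positive advantage, those below
# the mean get a negative one, and those at the mean get zero.
```

No reference answer appears anywhere: a completion is only compared against the other completions sampled for the same prompt.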

After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game, but I wanted a different challenge.

So I opted for teaching a model to create a schedule from a list of events and priorities.

Choosing an original problem forced me to think about the problem setting, generate data, choose the base model, design reward functions, and run multiple rounds of training, hoping that my model would learn something.

A fun and rewarding experience :-)

I learned a lot that I want to share with you.

---

- Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo

- Code: https://github.com/anakin87/qwen-scheduler-grpo

- Hugging Face collection (dataset and model): https://huggingface.co/collections/anakin87/qwen-scheduler-g...

---

Some hot takes from my experiment

- GRPO is cool for verifiable tasks, but it is more about eliciting desired behaviors from the trained model than about teaching it completely new stuff. https://arxiv.org/abs/2504.13837

- Choosing the right base model (and size) matters.

- "Aha moment" might be over-hyped. https://oatllm.notion.site/oat-zero

- Reward function design is crucial. If your rewards are not robust, you might experience reward hacking (as happened to me).

- Unsloth is great for saving GPU memory, but beware of bugs.
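
To illustrate the reward-design point, here is a hypothetical sketch of a reward function for a scheduling task. The event format (name, start, end in minutes), the double weight for priority events, and the specific guards are all assumptions made for illustration; they are not taken from the linked repository. The idea it shows: a reward that only sums scheduled durations could be gamed by inventing events or by overlapping them, so this version returns zero in those cases.

```python
def schedule_reward(events, priorities, schedule):
    """Score a proposed schedule in [0, 1]. Hypothetical sketch.

    events:     dict name -> (start, end), times in minutes (assumed format)
    priorities: set of event names that count double (assumed weighting)
    schedule:   list of (name, start, end) tuples proposed by the model
    """
    if not schedule:
        return 0.0

    # Guard 1: every scheduled event must exist in the input, with its
    # original start and end times.
    for name, start, end in schedule:
        if events.get(name) != (start, end):
            return 0.0

    # Guard 2: no overlapping events.
    ordered = sorted(schedule, key=lambda e: e[1])
    for (_, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            return 0.0

    def weight(name):
        return 2 if name in priorities else 1

    achieved = sum(weight(n) * (e - s) for n, s, e in schedule)
    # Upper bound: as if every input event could be scheduled.
    upper = sum(weight(n) * (e - s) for n, (s, e) in events.items())
    return achieved / upper if upper else 0.0


events = {"standup": (540, 570), "review": (560, 620), "lunch": (720, 780)}
priorities = {"review"}

valid = schedule_reward(events, priorities,
                        [("review", 560, 620), ("lunch", 720, 780)])
invented = schedule_reward(events, priorities, [("party", 0, 1000)])
overlapping = schedule_reward(events, priorities,
                              [("standup", 540, 570), ("review", 560, 620)])
```

Here `valid` is positive, while `invented` and `overlapping` are both zero, so neither shortcut pays off.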
