
GRPO experiment - I trained a Language Model to schedule events

https://github.com/anakin87/qwen-scheduler-grpo
1•anakin87•4h ago

Comments

anakin87•4h ago
I have been experimenting with GRPO lately, since I am fascinated by models learning from prompts and rewards alone, with no example answers needed, unlike in Supervised Fine-Tuning.
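For readers new to GRPO, here is a minimal sketch of the "group-relative" idea behind learning from rewards alone: several completions are sampled for the same prompt, each one is scored, and each score is compared with the rest of its group. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration; real trainers differ in these details, and this is not the code from the repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of completions sampled for the same prompt.

    Hypothetical helper: completions scoring above the group mean get a
    positive advantage, those below get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored by some reward function
advantages = group_relative_advantages([0.2, 0.9, 0.4, 0.9])
```

No reference answer appears anywhere: the only training signal is how each completion scored relative to its siblings.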

After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game, but I wanted a different challenge.

So I opted for teaching a model to create a schedule from a list of events and priorities.

Choosing an original problem forced me to think about the problem setting, generate data, choose the base model, design reward functions, and run multiple rounds of training, hoping that my model would learn something.

A fun and rewarding experience :-)

I learned a lot that I want to share with you.

---

- Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo

- Code: https://github.com/anakin87/qwen-scheduler-grpo

- Hugging Face collection (dataset and model): https://huggingface.co/collections/anakin87/qwen-scheduler-g...

---

Some hot takes from my experiment

- GRPO is cool for verifiable tasks, but it is more about eliciting desired behaviors from the trained model than teaching it completely new things. https://arxiv.org/abs/2504.13837

- Choosing the right base model (and size) matters.

- "Aha moment" might be over-hyped. https://oatllm.notion.site/oat-zero

- Reward function design is crucial. If your rewards are not robust, you might experience reward hacking (as happened to me).

- Unsloth is great for saving GPU memory, but beware of bugs.
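To make the point about robust rewards concrete, here is a small sketch of a schedule-scoring function that closes a few obvious loopholes. Everything in it is a hypothetical illustration, not the reward functions from the repository: the data format (event name mapped to start and end minutes), the double weight for priority events, and the choice to return zero for any invalid schedule are all assumptions.

```python
def schedule_reward(events, priority, schedule):
    """Score a proposed schedule in [0, 1]. Hypothetical format:

    events:   dict mapping event name -> (start_minute, end_minute)
    priority: set of event names that count double (assumed weighting)
    schedule: list of event names chosen by the model
    """
    # Loophole 1: repeating an event, or inventing one, to inflate the score
    if not schedule or len(set(schedule)) != len(schedule):
        return 0.0
    if any(name not in events for name in schedule):
        return 0.0

    # Loophole 2: scheduling overlapping events
    slots = sorted(events[name] for name in schedule)
    for (_, prev_end), (next_start, _) in zip(slots, slots[1:]):
        if next_start < prev_end:
            return 0.0

    def weighted_minutes(names):
        return sum(
            (2 if n in priority else 1) * (events[n][1] - events[n][0])
            for n in names
        )

    # Normalize by the weighted duration of all events; when events
    # conflict, even the best valid schedule scores below 1.
    total = weighted_minutes(events)
    return weighted_minutes(schedule) / total if total > 0 else 0.0


events = {"standup": (0, 60), "review": (30, 90), "lunch": (90, 120)}
priority = {"review"}
good = schedule_reward(events, priority, ["review", "lunch"])
hacked = schedule_reward(events, priority, ["standup", "review"])  # overlap
```

A reward that only summed durations, without the validity checks, would pay the model for listing every event regardless of conflicts, which is the kind of shortcut reward hacking tends to find.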

Parse_searchable_rolls – Parse Searchable Electoral Rolls

https://github.com/in-rolls/parse_searchable_rolls
1•goji_berries•3s ago•0 comments

Modular's bet to break out of the Matrix

https://www.modular.com/blog/modulars-bet-to-break-out-of-the-matrix-democratizing-ai-compute-part-10
1•melodyogonna•3m ago•0 comments

US vs. Google Amicus Curiae Brief of Y Combinator in Support of Plaintiffs [pdf]

https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1300.1.pdf
1•dave1629•4m ago•0 comments

Pope Leo XIV's PhD Dissertation

https://kathleenmccook.substack.com/p/pope-leo-xiv-dissertation
1•lordleft•4m ago•0 comments

Bcachefs, Btrfs, EXT4, F2FS and XFS File-System Performance on Linux 6.15

https://www.phoronix.com/review/linux-615-filesystems
2•throwaway1482•6m ago•0 comments

Comprimir GIF – Free Online GIF Compressor

https://comprimirgif.com/
1•MxcAlex•7m ago•0 comments

War – An Unhinged Tale of Love, Revenge, and Death

https://www.writervivek.com/2025/05/war-unhinged-tale-of-love-revenge-and.html
4•totaldude87•10m ago•0 comments

Cognitive deficits in people who have recovered from Covid-19 (2021)

https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370%2821%2900324-2/fulltext
2•inverted_flag•11m ago•0 comments

Petrichor

https://en.wikipedia.org/wiki/Petrichor
1•bpierre•11m ago•0 comments

The Battle to Bottle Palm Wine

https://www.atlasobscura.com/articles/palm-wine-in-united-states
1•prmph•15m ago•0 comments

Interferometer Device Sees Text from a Mile Away

https://physics.aps.org/articles/v18/99
2•bookofjoe•15m ago•0 comments

mRNA vaccine makers are scrambling to navigate an 'existential threat'

https://www.nature.com/articles/d41586-025-01462-9
2•rntn•16m ago•0 comments

Pulse Rate Predicts Faster Cognitive Decline in Older Adults

https://www.massgeneralbrigham.org/en/about/newsroom/press-releases/pulse-rate-measure-predicts-cognitive-decline
1•geox•18m ago•0 comments

Show HN: Mycelium

https://github.com/mycweb/mycelium
1•brendoncarroll•23m ago•0 comments

Why GADTs matter for performance (2015)

https://blog.janestreet.com/why-gadts-matter-for-performance/
1•hyperbrainer•24m ago•0 comments

Tech oligarchs are gambling our future on a fantasy

https://www.theguardian.com/commentisfree/2025/may/03/tech-oligarchs-musk
2•NotInOurNames•25m ago•0 comments

High-Latitude Stratospheric Aerosol Injection Is Feasible with Existing Aircraft

https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2024EF005567
2•PaulHoule•26m ago•0 comments

Remembering to Prompt Yourself – A Reflection on Inner Questions in the Age

https://blog.namar0x0309.com/2025/05/remembering-to-prompt-yourself-a-reflection-on-inner-questions-in-the-age-of-ai/
1•rationalfaith•27m ago•0 comments

Failed Soviet Venus lander Kosmos 482 crashes to Earth after 53 years in orbit

https://www.space.com/space-exploration/launches-spacecraft/failed-soviet-venus-lander-kosmos-482-crashes-to-earth-after-53-years-in-orbit
3•taubek•28m ago•0 comments

How Many People Have Died in Gaza?

https://www.economist.com/interactive/middle-east-and-africa/2025/05/08/how-many-people-have-died-in-gaza
2•mandmandam•28m ago•1 comments

Gimp to Discuss Changing Name

https://floss.social/@GIMP/114481178111577485
4•todsacerdoti•30m ago•1 comments

Coffee for people who don't like coffee

https://ostwilkens.se/blog/coffee
1•ostwilkens•34m ago•0 comments

Huawei introduces its first laptop running HarmonyOS rather than Windows

https://liliputing.com/huawei-introduces-its-first-laptop-running-harmonyos-rather-than-windows/
1•xrayarx•34m ago•0 comments

Klarna plans to hire humans again

https://fortune.com/2025/05/09/klarna-ai-humans-return-on-investment/
3•mooreds•39m ago•0 comments

Can Whitney Wolfe Herd Make Us Love Dating Apps Again?

https://www.nytimes.com/2025/05/10/magazine/whitney-wolfe-herd-interview.html
1•mooreds•39m ago•0 comments

Interagency Grizzly Bear Committee

https://igbconline.org/
1•mooreds•40m ago•0 comments

Haxe 4.3.7

https://community.haxe.org/t/haxe-4-3-7-released/4611
3•phplovesong•42m ago•1 comments

Membrane, Media Framework for Elixir

https://membrane.stream/
1•lawik•47m ago•0 comments

Ash Framework – Model your domain, derive the rest

https://ash-hq.org/
2•lawik•48m ago•0 comments

People Who Hype Cursor Usually Lack Technical Skills

https://en.smallyu.net/2025/04/12/People%20Who%20Hype%20Cursor%20Usually%20Lack%20Technical%20Skills/
2•cratermoon•51m ago•0 comments