Linguistic RL: 3B Models Exceed 100B Performance (86% vs. 81%)

https://github.com/DRawson5570/linguistic-rl-scheduling
2•drawson5570•2mo ago

Comments

drawson5570•2mo ago
# Reddit r/MachineLearning Post

## Title (must start with tag): [R] Linguistic RL: 3B Models Exceed 100B Performance Through Self-Reflection (86% vs 81%)

## Post Body:

*TL;DR*: We taught tiny models (3B/1.5B) to beat Claude 3.5 Haiku (~100B) by having Claude "journal" about its mistakes, then training the small models on the strategy it learned. Cost: <$10. The students exceed the teacher.

---

## Results

| Model | Size | Baseline | After LRL+LoRA | Improvement |
|-------|------|----------|----------------|-------------|
| *Qwen2.5-3B* | 3B | 12% | *86.0%* | *+74pp* |
| *Qwen2.5-1.5B* | 1.5B | ~8% | *82.7%* | *+75pp* |
| Claude 3.5 Haiku | ~100B | 81.3% → 84.0% | baseline | +2.7pp (via LRL) |

Both students *outperformed the teacher's 81.3% baseline*, even though the teacher is roughly 33× (vs. 3B) to 67× (vs. 1.5B) larger.
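The arithmetic behind the table can be checked in a few lines. This is only a sketch using the figures quoted above; the ~100B teacher size is the post's own estimate, and the dictionary layout and the `summarize` helper are illustrative names, not part of the published code.

```python
# Figures copied from the results table; the ~100B size is the post's estimate.
TEACHER_PARAMS_B = 100.0
TEACHER_BASELINE = 81.3
TEACHER_AFTER_LRL = 84.0

students = {
    "Qwen2.5-3B": {"params_b": 3.0, "baseline": 12.0, "after": 86.0},
    "Qwen2.5-1.5B": {"params_b": 1.5, "baseline": 8.0, "after": 82.7},
}


def summarize(name, s):
    """Return percentage-point gains and the teacher/student size ratio."""
    return {
        "model": name,
        "gain_pp": round(s["after"] - s["baseline"], 1),
        "size_ratio": round(TEACHER_PARAMS_B / s["params_b"], 1),
        "vs_teacher_baseline_pp": round(s["after"] - TEACHER_BASELINE, 1),
        "vs_teacher_after_lrl_pp": round(s["after"] - TEACHER_AFTER_LRL, 1),
    }


summaries = [summarize(name, s) for name, s in students.items()]
for row in summaries:
    print(row)
```

Comparing against both teacher numbers makes it explicit which baseline a "student exceeds teacher" claim refers to.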

---

## How It Works

*Step 1: Teacher Self-Improvement ("Linguistic RL")*

Give Claude a problem → it solves → tell it if correct → ask it to reflect:

```
"What did I miss? How can I improve?"
```

Through pure self-reflection (no gradients!), Claude writes journal entries like:

```
"I was only checking adjacent meetings. I need to check ALL overlaps to find the maximum simultaneous conflicts."
```

Accuracy improves 81% → 84% just from thinking about mistakes.
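The solve, feedback, reflect cycle described above might look roughly like the sketch below. It is not the repository's implementation: `linguistic_rl_loop`, `ask_teacher`, the prompt wording, and the exact-match correctness check are all assumptions for illustration, and a stub stands in for the real model call so the example runs offline.

```python
from typing import Callable, List, Tuple


def linguistic_rl_loop(
    problems: List[Tuple[str, str]],
    ask_teacher: Callable[[str], str],
) -> List[str]:
    """Run solve -> feedback -> reflect over (problem, expected_answer) pairs.

    `ask_teacher` is a hypothetical callable that sends a prompt to the
    teacher model and returns its text reply. No gradients are involved:
    the only state carried between problems is the text journal.
    """
    journal: List[str] = []
    for problem, expected in problems:
        notes = "\n".join(journal) or "(no notes yet)"
        answer = ask_teacher(
            f"Notes from earlier attempts:\n{notes}\n\nProblem:\n{problem}\nAnswer:"
        ).strip()
        verdict = "correct" if answer == expected else "incorrect"
        reflection = ask_teacher(
            f"Problem:\n{problem}\nYour answer: {answer}\nThat was {verdict}.\n"
            "What did I miss? How can I improve?"
        ).strip()
        journal.append(reflection)
    return journal


def stub_teacher(prompt: str) -> str:
    # Stand-in for a real model call: reflects when asked, otherwise answers "2".
    if "How can I improve?" in prompt:
        return "Check ALL overlaps, not just adjacent meetings."
    return "2"


journal = linguistic_rl_loop(
    [("Meetings: 9:00-10:00, 9:30-11:00, 9:45-10:15. Max overlap?", "3")],
    stub_teacher,
)
print(journal)
```

In a real run, `ask_teacher` would wrap an API client, and the accumulated journal is the text that a later strategy-extraction step would draw on.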

*Step 2: Extract Strategy*

Pull out Claude's learned solving strategy as natural language curriculum.

*Step 3: Train Student with LoRA*

Fine-tune small model (3B/1.5B) on examples showing:
- Problem
- Claude's strategic thinking
- Answer
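One plausible shape for such a training example is sketched below. The field names (`prompt`, `completion`), the template wording, and the `build_training_record` helper are assumptions for illustration; the repository may format its examples differently.

```python
import json


def build_training_record(problem: str, strategy: str, answer: str) -> dict:
    """Pack problem, teacher strategy, and answer into one supervised example.

    The prompt/completion split is a common fine-tuning layout, assumed here.
    """
    prompt = f"Problem:\n{problem}\n\nSolve it step by step."
    completion = f"Strategy:\n{strategy}\n\nAnswer: {answer}"
    return {"prompt": prompt, "completion": completion}


record = build_training_record(
    problem="Meetings: 9:00-10:00, 9:30-11:00, 9:45-10:15. Max overlap?",
    strategy="Sort start and end events, sweep through them, track the running count.",
    answer="3",
)
print(json.dumps(record, indent=2))
```

Placing the strategy before the answer in the completion means the student is trained to produce the reasoning first and the answer second.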

*Result*: The 3B model learns an O(n log n) sweep-line algorithm and achieves 96% on easy problems.
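For readers unfamiliar with the technique, a generic sweep-line solution to "maximum simultaneous meetings" is sketched below. The post does not spell out the exact task format, so the half-open interval convention, the integer minute timestamps, and the function name are assumptions.

```python
from typing import List, Tuple


def max_simultaneous_meetings(meetings: List[Tuple[int, int]]) -> int:
    """Peak number of overlapping meetings, each a half-open [start, end)."""
    events = []
    for start, end in meetings:
        events.append((start, 1))   # a meeting begins
        events.append((end, -1))    # a meeting ends
    # Sorting dominates the cost: O(n log n). At equal times, (t, -1) sorts
    # before (t, 1), so back-to-back meetings do not count as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Times in minutes since midnight: 9:00-10:00, 9:30-11:00, 9:45-10:15.
print(max_simultaneous_meetings([(540, 600), (570, 660), (585, 615)]))  # 3
```

This is the "check ALL overlaps" idea from the journal entry: instead of comparing only adjacent meetings, every start and end is processed in time order while a running count is kept.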

---

## Why This Matters

*Economics*
- Training: <$10 in API calls
- Inference: Free forever (runs locally)
- 100-1000× cheaper than API deployment

*Science*
- 67× compression (100B → 1.5B) with performance gain
- Learned algorithmic reasoning, not pattern matching
- Students exceed teacher = knowledge is compressible

*Safety*
- Human-readable learning process
- Can audit what was learned
- No black-box distillation

*Democratization*
- Frontier capabilities on consumer hardware
- One-time extraction, infinite reuse
- Fully open source

---

## Code & Reproducibility

- Published to Zenodo: [DOI 10.5281/zenodo.17585532](https://zenodo.org/records/17585532)
- GitHub: https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
- Fixed seeds, full logs, complete configs
- Universal framework - adapt to any domain

*Quick start:*

```bash
git clone https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
cd validated_results_qwen3b_claude35haiku
pip install transformers torch peft anthropic
python run_validation.py
```

Requirements: 12GB GPU, Anthropic API key (~$5)

---

## Framework

We built a universal pipeline - works for any domain:

```python
from framework import run_knowledge_transfer

results = run_knowledge_transfer(
    domain=YourCustomDomain(),  # placeholder: substitute your own domain class
    teacher_model="claude-3-5-haiku-20241022",
    student_model="Qwen/Qwen2.5-3B-Instruct",
)
```

---

## Open Questions

1. *How small can we go?* Testing 1.5B → 0.5B compression
2. *What knowledge compresses well?* Algorithmic vs. factual vs. creative reasoning
3. *Recursive teaching?* Can students become teachers?
4. *Safety implications?* More auditable than weight distillation?

---

## Links

- Paper: https://zenodo.org/records/17585532
- Code: https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
- 3B Results: [validated_results_qwen3b_claude35haiku/](https://github.com/DRawson5570/linguistic-rl-scheduling-expe...)
- 1.5B Results: [validated_results_qwen1.5b_claude35haiku/](https://github.com/DRawson5570/linguistic-rl-scheduling-expe...)
