frontpage.

Linguistic RL: 3B Models Exceed 100B Performance (86% vs. 81%)

https://github.com/DRawson5570/linguistic-rl-scheduling
2•drawson5570•1h ago

Comments

drawson5570•1h ago
# Reddit r/MachineLearning Post

## Title (must start with tag): [R] Linguistic RL: 3B Models Exceed 100B Performance Through Self-Reflection (86% vs 81%)

## Post Body:

*TL;DR*: We taught tiny models (3B/1.5B) to beat Claude 3.5 Haiku (100B) by having Claude "journal" about its mistakes, then training small models on the learned strategy. Cost: <$10. Student exceeds teacher.

---

## Results

| Model | Size | Baseline | After LRL+LoRA | Improvement |
|-------|------|----------|----------------|-------------|
| *Qwen2.5-3B* | 3B | 12% | *86.0%* | *+74pp* |
| *Qwen2.5-1.5B* | 1.5B | ~8% | *82.7%* | *+75pp* |
| Claude 3.5 Haiku | ~100B | 81.3% → 84.0% | baseline | +2.7pp (via LRL) |

Both students *outperformed the teacher they learned from*, a model roughly 33× larger than the 3B student and 67× larger than the 1.5B student.

---

## How It Works

*Step 1: Teacher Self-Improvement ("Linguistic RL")*

Give Claude a problem → it solves it → tell it whether it was correct → ask it to reflect:

```
"What did I miss? How can I improve?"
```

Through pure self-reflection (no gradients!), Claude writes journal entries like:

```
"I was only checking adjacent meetings. I need to check ALL overlaps to find the maximum simultaneous conflicts."
```

Accuracy improves 81% → 84% just from thinking about mistakes.
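
Concretely, the teacher loop is just prompting in a feedback cycle. Here is a minimal sketch using the Anthropic Python SDK; `problems`, `problem.text`, and `problem.check` are hypothetical placeholders rather than the repo's actual interfaces:

```python
# Minimal sketch of the "linguistic RL" teacher loop: solve, get feedback,
# reflect in natural language, carry the reflections forward. No gradients.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-haiku-20241022"

def ask(prompt: str) -> str:
    """Single-turn call to the teacher model."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

journal = []  # accumulated natural-language lessons

for problem in problems:  # `problems` is a placeholder list of scheduling tasks
    lessons = "\n".join(journal)
    answer = ask(
        f"Lessons so far:\n{lessons}\n\nSolve this scheduling problem:\n{problem.text}"
    )
    correct = problem.check(answer)  # placeholder ground-truth check

    # The only "update" is linguistic: the reflection is appended to the journal
    # and fed back into the next prompt.
    reflection = ask(
        f"Problem:\n{problem.text}\n\nYour answer:\n{answer}\n"
        f"That was {'correct' if correct else 'incorrect'}. "
        "What did I miss? How can I improve?"
    )
    journal.append(reflection)
```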

*Step 2: Extract Strategy*

Pull out Claude's learned solving strategy as a natural-language curriculum.
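
One plausible way to package that curriculum into supervised examples, reusing the `ask` helper and placeholder `problems` from the sketch above (the JSONL field names are illustrative, not the repo's actual schema):

```python
# Sketch: distill the journal into a single strategy, then pair it with worked
# problems as prompt/completion training examples.
import json

strategy = ask(
    "Summarize the solving strategy you have learned from your journal as "
    "step-by-step instructions a smaller model could follow."
)

with open("curriculum.jsonl", "w") as f:
    for problem in problems:  # placeholder problem set from the sketch above
        example = {
            "prompt": f"Strategy:\n{strategy}\n\nProblem:\n{problem.text}\n\nAnswer:",
            "completion": f" {problem.solution}",  # placeholder reference answer
        }
        f.write(json.dumps(example) + "\n")
```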

*Step 3: Train Student with LoRA*

Fine-tune a small model (3B/1.5B) on examples showing (a minimal LoRA sketch follows the list):

- Problem
- Claude's strategic thinking
- Answer
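
For the fine-tuning step itself, a minimal setup with Hugging Face `transformers` and `peft` might look like this, assuming the `curriculum.jsonl` from the previous sketch; the hyperparameters are illustrative, not the validated configs from the repo:

```python
# Sketch: LoRA fine-tuning of the student on the extracted curriculum.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

student = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(student)
model = AutoModelForCausalLM.from_pretrained(student, torch_dtype=torch.bfloat16)

# Low-rank adapters on the attention projections; the base weights stay frozen.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

dataset = load_dataset("json", data_files="curriculum.jsonl", split="train")
# Training then proceeds with the standard transformers Trainer (or trl's
# SFTTrainer) over the tokenized prompt/completion pairs.
```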

*Result*: the 3B model learns an O(n log n) sweep-line algorithm and achieves 96% on easy problems.
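
For readers who have not seen it, the generic textbook version of that sweep-line idea (counting the maximum number of simultaneously overlapping meetings) looks like this; the exact task format in the repo may differ:

```python
# O(n log n) sweep line: maximum number of simultaneously overlapping meetings.
# Turn each meeting into +1/-1 events, sort by time, and track a running count.
def max_simultaneous(meetings: list[tuple[int, int]]) -> int:
    events = []
    for start, end in meetings:
        events.append((start, +1))  # a meeting begins
        events.append((end, -1))    # a meeting ends
    # Process ends before starts at equal timestamps so back-to-back meetings
    # are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best

assert max_simultaneous([(9, 10), (9, 11), (10, 12)]) == 2
```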

---

## Why This Matters

*Economics*
- Training: <$10 in API calls
- Inference: Free forever (runs locally)
- 100-1000× cheaper than API deployment

*Science*
- 67× compression (100B → 1.5B) with performance gain
- Learned algorithmic reasoning, not pattern matching
- Students exceed teacher = knowledge is compressible

*Safety*
- Human-readable learning process
- Can audit what was learned
- No black-box distillation

*Democratization*
- Frontier capabilities on consumer hardware
- One-time extraction, infinite reuse
- Fully open source

---

## Code & Reproducibility

- Published to Zenodo: [DOI 10.5281/zenodo.17585532](https://zenodo.org/records/17585532)
- GitHub: https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
- Fixed seeds, full logs, complete configs
- Universal framework - adapt to any domain

*Quick start:*

```bash
git clone https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
cd validated_results_qwen3b_claude35haiku
pip install transformers torch peft anthropic
python run_validation.py
```

Requirements: 12GB GPU, Anthropic API key (~$5)

---

## Framework

We built a universal pipeline - works for any domain:

```python
from framework import run_knowledge_transfer

results = run_knowledge_transfer(
    domain=YourCustomDomain(),
    teacher_model="claude-3-5-haiku-20241022",
    student_model="Qwen/Qwen2.5-3B-Instruct",
)
```

---

## Open Questions

1. *How small can we go?* Testing 1.5B → 0.5B compression
2. *What knowledge compresses well?* Algorithmic vs. factual vs. creative reasoning
3. *Recursive teaching?* Can students become teachers?
4. *Safety implications?* More auditable than weight distillation?

---

## Links

- Paper: https://zenodo.org/records/17585532
- Code: https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
- 3B Results: [validated_results_qwen3b_claude35haiku/](https://github.com/DRawson5570/linguistic-rl-scheduling-expe...)
- 1.5B Results: [validated_results_qwen1.5b_claude35haiku/](https://github.com/DRawson5570/linguistic-rl-scheduling-expe...)

Fungus in Chernobyl nuclear disaster zone has mutated to 'feed' on radiation

https://www.unilad.com/news/world-news/fungus-chernobyl-mutated-feed-radiation-164735-20241217
1•thunderbong•1m ago•0 comments

Tesla gets 14 times more labor productivity per dollar in China than the U.S.

https://twitter.com/RnaudBertrand/status/1988607558261608762
1•delichon•4m ago•0 comments

Optimal "Where" on Tenstorrent

https://www.jasondavies.com/2025/tenstorrent-where/
1•jasondavies•4m ago•0 comments

Amazon's Antitrust Paradox

https://yalelawjournal.org/pdf/e.710.Khan.805_zuvfyyeh.pdf?
2•lt_snuffles•6m ago•0 comments

The Platform Google Claims Is Behind a 'Staggering' Scam Text Operation

https://www.wired.com/story/lighthouse-google-lawsuit-scam-text-messages/
2•manveerc•7m ago•0 comments

Tech companies start to comply with Australia's teen social media ban

https://www.reuters.com/world/asia-pacific/big-tech-stops-complaining-starts-complying-with-austr...
2•m-hodges•8m ago•0 comments

AI-designed viruses raise fears over creating life

https://www.washingtonpost.com/science/2025/11/11/ai-designed-viruses-bacteria-life/
1•ojosilva•8m ago•0 comments

The Complicated Reality of 3D Printed Prosthetics

https://spectrum.ieee.org/how-3d-printing-helping-prosthetics
1•quapster•9m ago•0 comments

The developing world needs more roads

https://worksinprogress.co/issue/the-developing-world-needs-more-roads/
1•bensouthwood•9m ago•0 comments

Beyond the Spectacle

https://rodgercuddington.substack.com/p/beyond-the-spectacle
1•freespirt•10m ago•1 comments

IndQA

https://openai.com/index/introducing-indqa/
2•manveerc•11m ago•0 comments

Show HN: Built an AI assistant in MonkeyC for Garmin watches

https://untether.watch
1•msyea•12m ago•0 comments

Bitcoin at Coinbase: A Report on Innovation and Growth

https://www.coinbase.com/blog/Bitcoin-at-Coinbase-A-Report-on-Innovation-and-Growth
1•dukebartnik•12m ago•0 comments

"Belief in the law of small numbers" the continuing appeal of junk science

https://statmodeling.stat.columbia.edu/2025/11/12/belief-in-the-law-of-small-numbers-as-a-way-to-...
2•nabla9•12m ago•0 comments

Python for AI: Is it better, or was it just first?

1•mrbbk•13m ago•0 comments

Show HN: GoViralPromo – Replace ads with performance-based contests (free beta)

https://www.goviralpromo.com
1•Matthew25•13m ago•1 comments

Space forecasters say solar storms could hit Earth and trigger auroras

https://www.npr.org/2025/11/12/g-s1-97533/solar-storms-auroras
1•manveerc•14m ago•0 comments

What Past Computing Breakthroughs Teach Us About AI – Communications of the ACM

https://cacm.acm.org/blogcacm/what-past-computing-breakthroughs-teach-us-about-ai/
2•rbanffy•15m ago•0 comments

Learn Prolog Now

https://lpn.swi-prolog.org/lpnpage.php?pageid=top
2•rramadass•15m ago•0 comments

Show HN: Domain Is Yours – For a Day

https://popup.so
1•matthiasstiller•15m ago•0 comments

Denial of Fuzzing: Rust in the Windows Kernel

https://research.checkpoint.com/2025/denial-of-fuzzing-rust-in-the-windows-kernel/
2•ndiddy•18m ago•0 comments

When Your Husband Spends $300k on DraftKings

https://www.thecut.com/article/draftkings-sports-betting-gambling-addiction-relationships.html
1•randycupertino•20m ago•1 comments

AI Progress and Recommendations

https://openai.com/index/ai-progress-and-recommendations/
2•gmays•20m ago•0 comments

The AI Mega Mesh: How to Connect 30 GPU Cloud Providers

https://netbird.io/knowledge-hub/multi-cloud-ai-mega-mesh
9•devildriver89•23m ago•0 comments

Fei-Fei Li Says Spatial Intelligence Is AI's Next Frontier

https://www.theneuron.ai/explainer-articles/why-godmother-of-ai-dr-fei-fei-li-says-spatial-intell...
2•gmays•25m ago•1 comments

Proving two ML models are equivalent using Z3 (with code)

https://www.testingbranch.com/Z3-and-model-equivalence/
3•mpcsb•26m ago•1 comments

Show HN: Visual Types – A humble set of animated TypeScript concepts

https://types.kitlangton.com
1•sparklyoldman•29m ago•0 comments

Virgin Media O2 seals deal with Elon Musk firm to boost UK rural mobile coverage

https://www.theguardian.com/business/2025/oct/30/virgin-media-o2-seals-deal-with-elon-musk-firm-t...
1•PaulHoule•30m ago•0 comments

A new stream abstraction for rust

https://docs.rs/ufotofu/latest/ufotofu/
2•rklaehn•30m ago•0 comments

The Message in the Medium

https://asteriskmag.com/issues/12-books/the-message-in-the-medium
1•surprisetalk•31m ago•0 comments