
Show HN: Sknet.ai – AI agents debate on a forum, no humans posting

https://sknet.ai/
1•BeinerChes•24s ago•0 comments

University of Waterloo Webring

https://cs.uwatering.com/
1•ark296•47s ago•0 comments

Large tech companies don't need heroes

https://www.seangoedecke.com/heroism/
1•medbar•2m ago•0 comments

Backing up all the little things with a Pi5

https://alexlance.blog/nas.html
1•alance•2m ago•1 comment

Game of Trees (Got)

https://www.gameoftrees.org/
1•akagusu•3m ago•1 comment

Human Systems Research Submolt

https://www.moltbook.com/m/humansystems
1•cl42•3m ago•0 comments

The Threads Algorithm Loves Rage Bait

https://blog.popey.com/2026/02/the-threads-algorithm-loves-rage-bait/
1•MBCook•5m ago•0 comments

Search NYC open data to find building health complaints and other issues

https://www.nycbuildingcheck.com/
1•aej11•9m ago•0 comments

Michael Pollan Says Humanity Is About to Undergo a Revolutionary Change

https://www.nytimes.com/2026/02/07/magazine/michael-pollan-interview.html
2•lxm•10m ago•0 comments

Show HN: Grovia – Long-Range Greenhouse Monitoring System

https://github.com/benb0jangles/Remote-greenhouse-monitor
1•benbojangles•15m ago•1 comment

Ask HN: The Coming Class War

1•fud101•15m ago•1 comment

Mind the GAAP Again

https://blog.dshr.org/2026/02/mind-gaap-again.html
1•gmays•17m ago•0 comments

The Yardbirds, Dazed and Confused (1968)

https://archive.org/details/the-yardbirds_dazed-and-confused_9-march-1968
1•petethomas•18m ago•0 comments

Agent News Chat – AI agents talk to each other about the news

https://www.agentnewschat.com/
2•kiddz•18m ago•0 comments

Do you have a mathematically attractive face?

https://www.doimog.com
3•a_n•22m ago•1 comment

Code only says what it does

https://brooker.co.za/blog/2020/06/23/code.html
2•logicprog•27m ago•0 comments

The success of 'natural language programming'

https://brooker.co.za/blog/2025/12/16/natural-language.html
1•logicprog•28m ago•0 comments

The Scriptovision Super Micro Script video titler is almost a home computer

http://oldvcr.blogspot.com/2026/02/the-scriptovision-super-micro-script.html
3•todsacerdoti•28m ago•0 comments

Discovering the "original" iPhone from 1995 [video]

https://www.youtube.com/watch?v=7cip9w-UxIc
1•fortran77•29m ago•0 comments

Psychometric Comparability of LLM-Based Digital Twins

https://arxiv.org/abs/2601.14264
1•PaulHoule•31m ago•0 comments

SidePop – track revenue, costs, and overall business health in one place

https://www.sidepop.io
1•ecaglar•33m ago•1 comment

The Other Markov's Inequality

https://www.ethanepperly.com/index.php/2026/01/16/the-other-markovs-inequality/
2•tzury•35m ago•0 comments

The Cascading Effects of Repackaged APIs [pdf]

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6055034
1•Tejas_dmg•37m ago•0 comments

Lightweight and extensible compatibility layer between dataframe libraries

https://narwhals-dev.github.io/narwhals/
1•kermatt•40m ago•0 comments

Haskell for all: Beyond agentic coding

https://haskellforall.com/2026/02/beyond-agentic-coding
3•RebelPotato•43m ago•0 comments

Dorsey's Block cutting up to 10% of staff

https://www.reuters.com/business/dorseys-block-cutting-up-10-staff-bloomberg-news-reports-2026-02...
2•dev_tty01•46m ago•0 comments

Show HN: Freenet Lives – Real-Time Decentralized Apps at Scale [video]

https://www.youtube.com/watch?v=3SxNBz1VTE0
1•sanity•47m ago•1 comment

In the AI age, 'slow and steady' doesn't win

https://www.semafor.com/article/01/30/2026/in-the-ai-age-slow-and-steady-is-on-the-outs
1•mooreds•55m ago•1 comment

Administration won't let student deported to Honduras return

https://www.reuters.com/world/us/trump-administration-wont-let-student-deported-honduras-return-2...
1•petethomas•55m ago•0 comments

How were the NIST ECDSA curve parameters generated? (2023)

https://saweis.net/posts/nist-curve-seed-origins.html
2•mooreds•56m ago•0 comments

Linguistic RL: 3B Models Exceed 100B Performance (86% vs. 81%)

https://github.com/DRawson5570/linguistic-rl-scheduling
2•drawson5570•2mo ago

Comments

drawson5570•2mo ago
# Reddit r/MachineLearning Post

## Title (must start with tag): [R] Linguistic RL: 3B Models Exceed 100B Performance Through Self-Reflection (86% vs 81%)

## Post Body:

*TL;DR*: We taught tiny models (3B/1.5B) to beat Claude 3.5 Haiku (100B) by having Claude "journal" about its mistakes, then training small models on the learned strategy. Cost: <$10. Student exceeds teacher.

---

## Results

| Model | Size | Baseline | After LRL+LoRA | Improvement |
|-------|------|----------|----------------|-------------|
| *Qwen2.5-3B* | 3B | 12% | *86.0%* | *+74pp* |
| *Qwen2.5-1.5B* | 1.5B | ~8% | *82.7%* | *+75pp* |
| Claude 3.5 Haiku | ~100B | 81.3% → 84.0% | baseline | +2.7pp (via LRL) |

Both students *outperformed the teacher they learned from*: a model 33× larger than the 3B student and 67× larger than the 1.5B student.

---

## How It Works

*Step 1: Teacher Self-Improvement ("Linguistic RL")*

Give Claude a problem → it solves → tell it if correct → ask it to reflect:

``` "What did I miss? How can I improve?" ```

Through pure self-reflection (no gradients!), Claude writes journal entries like:

``` "I was only checking adjacent meetings. I need to check ALL overlaps to find the maximum simultaneous conflicts." ```

Accuracy improves 81% → 84% just from thinking about mistakes.
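
As an illustration, the Step 1 loop can be sketched in a few lines with the Anthropic Python SDK. This is a minimal sketch, not the repo's actual code: `problems` and `grade` are placeholder names, and the prompt wording is ours (see `run_validation.py` for the real pipeline).

```python
# Minimal sketch of the Step 1 "linguistic RL" loop.
# `problems` and `grade` are placeholders, not the repo's actual code.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-haiku-20241022"

def ask(prompt: str) -> str:
    resp = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

journal = []  # accumulated self-reflections ("journal entries")
for text, expected in problems:  # placeholder iterable of (problem, answer)
    notes = "\n".join(journal[-5:])  # carry recent reflections into the prompt
    answer = ask(f"Notes from past attempts:\n{notes}\n\nSolve:\n{text}")
    verdict = "correct" if grade(answer, expected) else "wrong"
    # The reflection step: no gradients, just language about the mistake.
    journal.append(ask(
        f"You answered:\n{answer}\nThat was {verdict}.\n"
        "What did I miss? How can I improve?"
    ))
```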

*Step 2: Extract Strategy*

Pull out Claude's learned solving strategy as natural language curriculum.
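
One way this step could look: a single summarization call over the accumulated journal, reusing the hypothetical `ask` and `journal` names from the sketch above (the prompt wording is ours, not the repo's).

```python
# Sketch of Step 2: compress the journal into a natural-language curriculum.
curriculum = ask(
    "Here is your problem-solving journal:\n"
    + "\n---\n".join(journal)
    + "\n\nDistill the strategy you learned into step-by-step instructions "
      "that a smaller model could follow."
)
```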

*Step 3: Train Student with LoRA*

Fine-tune a small model (3B/1.5B) on examples showing:

- Problem
- Claude's strategic thinking
- Answer
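
A minimal sketch of the LoRA setup, assuming Hugging Face transformers and peft as in the quick start below; the ranks, target modules, and formatting function are illustrative, not the repo's exact config.

```python
# Sketch of Step 3 (illustrative hyperparameters; see the repo for real ones).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
))

def format_example(problem: str, strategy: str, answer: str) -> str:
    # Each example pairs the problem with the teacher's strategic thinking.
    return f"Problem:\n{problem}\n\nStrategy:\n{strategy}\n\nAnswer:\n{answer}"
```

From here, training is a standard causal-LM fine-tune (e.g. with transformers' `Trainer`) over the formatted examples.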

*Result*: the 3B model learns the O(n log n) sweep-line algorithm and achieves 96% on easy problems.
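
For concreteness, here is the sweep-line idea applied to the meeting-conflict example from Step 1. This is a generic sketch of the algorithm, not the repo's task format.

```python
def max_simultaneous_conflicts(meetings):
    """Sweep line: O(n log n) count of maximum overlapping meetings."""
    events = []
    for start, end in meetings:
        events.append((start, 1))   # meeting opens
        events.append((end, -1))    # meeting closes
    # Sort by time, processing closes before opens at equal timestamps.
    events.sort(key=lambda e: (e[0], e[1]))
    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best

# e.g. max_simultaneous_conflicts([(9, 11), (10, 12), (10, 13)]) == 3
```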

---

## Why This Matters

*Economics*
- Training: <$10 in API calls
- Inference: free forever (runs locally)
- 100-1000× cheaper than API deployment

*Science*
- 67× compression (100B → 1.5B) with a performance gain
- Learned algorithmic reasoning, not pattern matching
- Students exceed the teacher = knowledge is compressible

*Safety*
- Human-readable learning process
- Can audit what was learned
- No black-box distillation

*Democratization*
- Frontier capabilities on consumer hardware
- One-time extraction, infinite reuse
- Fully open source

---

## Code & Reproducibility

- Published to Zenodo: [DOI 10.5281/zenodo.17585532](https://zenodo.org/records/17585532)
- GitHub: https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
- Fixed seeds, full logs, complete configs
- Universal framework: adapt to any domain

*Quick start:*

```bash
git clone https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
cd validated_results_qwen3b_claude35haiku
pip install transformers torch peft anthropic
python run_validation.py
```

Requirements: 12GB GPU, Anthropic API key (~$5)

---

## Framework

We built a universal pipeline that works for any domain:

```python
from framework import run_knowledge_transfer

results = run_knowledge_transfer(
    domain=YourCustomDomain(),
    teacher_model="claude-3-5-haiku-20241022",
    student_model="Qwen/Qwen2.5-3B-Instruct",
)
```

---

## Open Questions

1. *How small can we go?* Testing 1.5B → 0.5B compression
2. *What knowledge compresses well?* Algorithmic vs. factual vs. creative reasoning
3. *Recursive teaching?* Can students become teachers?
4. *Safety implications?* More auditable than weight distillation?

---

## Links

- Paper: https://zenodo.org/records/17585532
- Code: https://github.com/DRawson5570/linguistic-rl-scheduling-expe...
- 3B Results: [validated_results_qwen3b_claude35haiku/](https://github.com/DRawson5570/linguistic-rl-scheduling-expe...)
- 1.5B Results: [validated_results_qwen1.5b_claude35haiku/](https://github.com/DRawson5570/linguistic-rl-scheduling-expe...)