Is it just me, or are Mythos/Fabel only 1% of the way there?

1•punnerud•1h ago

I keep pushing the frontier models to their limits, and I have several projects they still can’t solve that I use to benchmark new models. Every new model makes it easier to “solve” even harder problems, but I still have the feeling that they rely 99% on my ideas. They just don’t come up with the ideas themselves, and I have to hold their hand and help them.

Don’t get me wrong: they excel at anything that is already close to done, and they can combine existing techniques. I’m talking about new ideas that models have never seen before.

For example, I have a hobby project that pushes what’s possible with route optimization. Yes, it’s close to SOTA and way more efficient than all (?) other solutions out there (punnerud.github.io/mpee/), but I have to lead the model by the hand and brainstorm ideas on how to compress a matrix.

And it’s not just a one-time thing; it happens something like 40-50 times in a few days.

The 1% is this “new ideas” part. Why can I come up with all of these, and not the model? It’s a really hard eval to create. This project is open now; later I am thinking about building a frontier project in the same way, keeping it away from the public and using it as a benchmark. Is that the best way to test for new ideas in models?

Comments

discordance•40m ago

"If we ever quit or retire we have to give back our augmented brains and cyborg bodies, and there wouldn’t be much left after that; there are countless ingredients that make up the human body and mind, like all the components that make up me as an individual with my own personality: sure, I have a face and voice to distinguish myself from others, but my thoughts and memories are unique only to me, and I carry a sense of my own destiny; each of those things is just a small part of it, and I collect information to use in my own way, and all of that blends to create a mixture that forms me and gives rise to my consciousness."

— Major Motoko Kusanagi, Section 9 field commander

punnerud•9m ago

Not just experience or prior knowledge; more like a way to generalise knowledge and connect the dots with little training data.
