Ask HN: Just me feeling that Mythos/Fabel just 1% there?

4•punnerud•14h ago

I keep pushing the frontier models to their limits and have several projects they still can’t solve, which I use to benchmark new models. Every new model makes it easier to “solve” even harder problems, but I still have the feeling that they rely 99% on my ideas. They just don’t come up with the ideas themselves, and I have to hold their hand and help them.

Don’t get me wrong, they excel at anything that is already close to done, and they can combine existing techniques. I’m talking about new ideas the models have never seen before.

For example, I have this hobby project that pushes what’s possible with route optimization. Yes, it’s close to SOTA and way more efficient than all (?) other solutions out there (punnerud.github.io/mpee/), but I have to hold the model by the hand and brainstorm ideas on how to compress a matrix.

And it’s not just a one-time thing; it happens like 40-50 times in a few days.

The 1% there is this “new ideas” part. Why can I come up with all these, and not the model? It’s a really hard eval to create. Right now this project is open; later I am thinking about making a frontier project in the same way, keeping it away from the public and using it as a benchmark. Is that the best way to test for new ideas in models?

Comments

discordance•13h ago

"If we ever quit or retire we have to give back our augmented brains and cyborg bodies, and there wouldn’t be much left after that; there are countless ingredients that make up the human body and mind, like all the components that make up me as an individual with my own personality: sure, I have a face and voice to distinguish myself from others, but my thoughts and memories are unique only to me, and I carry a sense of my own destiny; each of those things is just a small part of it, and I collect information to use in my own way, and all of that blends to create a mixture that forms me and gives rise to my consciousness."

— Major Motoko Kusanagi, Section 9 field commander

punnerud•13h ago

Not just experience or prior knowledge; more like a way to generalise knowledge and connect the dots with little training data.

leonidasrup•12h ago

The quote is from Ghost in the Shell (1995 film)

https://en.wikipedia.org/wiki/Ghost_in_the_Shell_(1995_film)

Lockal•10h ago

As we know, Pi constant contains all human knowledge encoded in some part of it, and modern computers calculated 1% of Pi!

(there are 2 mistakes in the sentence above)
