Questioning Representational Optimism in Deep Learning

1•publicdaniel•1d ago

Comments

publicdaniel•1d ago

From the author's tweet (https://x.com/kenneth0stanley/status/1924650124829196370)

Could a major opportunity to improve representation in deep learning be hiding in plain sight? Check out our new position paper: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis. The idea stems from a little-known observation about networks trained to output a single image: when they are discovered through an unconventional open-ended search process, their representations are incredibly elegant and exhibit astonishing modular decomposition. In contrast, when SGD (successfully) learns to output the same image its underlying representation is fractured, entangled - an absolute mess!

This stark difference in the underlying representation of the same "good" output behavior carries deep lessons for deep learning. It shows you cannot judge a book by its cover - an LLM with all the right responses could similarly be a mess under the hood. But also, surprisingly, it shows us that it doesn't have to be this way! Without the unique examples in this paper that were discovered through open-ended search, we might assume neural representation has to be a mess. These results show that is clearly untrue. We can now imagine something better because we can actually see it is possible.

We give several reasons why this matters: generalization, creativity, and learning are all potentially impacted. The paper shows examples to back up these concerns, but in brief, there is a key insight: Representation is not only important for what you're able to do now, but for where you can go from there. The ability to imagine something new (and where your next step in weight space can bring you) depends entirely upon how you represent the world. Generalization, creativity, and learning itself depend upon this critical relationship. Notice the difference in appearance between the nearby images to the skull in weight space shown in the top-left and top-right image strips of the attached graphic. The difference in semantics is stark.
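Here is a small sketch of what "nearby in weight space" means. It assumes a tiny fixed-topology CPPN-style network written in plain Python. A CPPN is the coordinate-to-pixel kind of network Picbreeder evolves, though Picbreeder grows its topologies with NEAT. The names `random_weights`, `cppn`, `render`, and `perturb` are hypothetical, as are the activation set, image size, and noise scales; none of this is the paper's setup. The code renders an image from one set of weights, nudges the weights with Gaussian noise, and measures how much the picture changes.

```python
import math
import random

# Hypothetical tiny CPPN: maps pixel coordinates (x, y) plus distance d from the
# center to an intensity in (0, 1). Picbreeder's real networks are evolved by NEAT
# with arbitrary topologies; this fixed one-hidden-layer net is only an illustration.
ACTIVATIONS = [math.sin, lambda z: math.exp(-z * z), math.tanh]

def random_weights(n_hidden=6, rng=None):
    rng = rng or random.Random(0)
    w_in = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rng.gauss(0, 1) for _ in range(n_hidden)]
    return w_in, w_out

def cppn(weights, x, y):
    w_in, w_out = weights
    d = math.sqrt(x * x + y * y)
    hidden = []
    for i, row in enumerate(w_in):
        z = row[0] * x + row[1] * y + row[2] * d
        # Cycle through periodic, Gaussian, and sigmoid-like activations.
        hidden.append(ACTIVATIONS[i % len(ACTIVATIONS)](z))
    s = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid squashes to an intensity

def render(weights, size=16):
    coords = [2 * i / (size - 1) - 1 for i in range(size)]  # evenly spaced in [-1, 1]
    return [[cppn(weights, x, y) for x in coords] for y in coords]

def perturb(weights, sigma, rng):
    # A "neighbor in weight space": every weight plus independent Gaussian noise.
    w_in, w_out = weights
    return ([[w + rng.gauss(0, sigma) for w in row] for row in w_in],
            [w + rng.gauss(0, sigma) for w in w_out])

def mean_abs_diff(a, b):
    flat = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(flat) / len(flat)

if __name__ == "__main__":
    rng = random.Random(42)
    base = random_weights(rng=rng)
    img = render(base)
    for sigma in (0.01, 0.1, 0.5):
        neighbor = render(perturb(base, sigma, rng))
        print(f"sigma={sigma}: mean pixel change {mean_abs_diff(img, neighbor):.3f}")
```

A pixel-difference score like this only measures how far the output moves. It says nothing about whether the change is semantically meaningful, which is the property the thread is actually about, so the toy can't judge that part.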

The insight that representation could be better opens up a lot of new paths and opportunities for investigation. It raises new urgency to understand the representation underlying foundation models and LLMs while exposing all kinds of novel avenues for potentially improving them, from making learning processes more open-ended to manipulating architectures and algorithms.

Don't mistake this paper as providing comfort for AI pessimists. By exposing a novel set of stark and explicit differences between conventional learning and something different, it can act as an accelerator of progress as opposed to a tool of pessimism. At the least, the discussion it provokes should be quite illuminating.

Fredkin•1d ago

What does it mean to train using an 'open ended' process? Is it like using a genetic algorithm to explore / generate _any_ image resembling something from the training set, instead of adjusting weights according to gradients on a case-by-case or batch-by-batch basis?

publicdaniel•1d ago

Here's my really amateur understanding of this:

- Conventional SGD: There's a fixed target (e.g. "make an exact replica of this butterfly image"), and it follows a greedy path to minimize the error.

- Open-ended search process: There's no predetermined goal; it explores based on what's "interesting" or novel. In Picbreeder, humans would see several generated images, pick the "interesting" ones, and the system would mutate/evolve from there. If you were evolving an image that looked like an egg and it mutated toward a teapot-like shape, you could pivot and pursue that direction instead.

This is kinda the catch -- there is a human element here where individuals are choosing what's "interesting" to explore, so it's not a purely algorithmic process. That said, yes, it does use a genetic algorithm (NEAT) under the hood, but I think what the authors are suggesting is that the key difference isn't whether it's genetic or gradient-based optimization... they're getting at the difference between objective-driven and open-ended search.
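Here is a minimal sketch that isolates the objective-driven vs. open-ended distinction. It keeps the mutation operator fixed and swaps only the selection rule. Assumptions: genomes are plain float vectors whose "behavior" is the vector itself. Novelty search (scoring each candidate by its mean distance to the k nearest archived behaviors) stands in for Picbreeder's human choices. This is not NEAT, not Picbreeder, and not the paper's experiment, and `mutate`, `objective_score`, `novelty_score`, and `evolve` are hypothetical names.

```python
import math
import random

# Hypothetical toy setup: a genome is a list of floats, and its "behavior" is the
# genome itself (in Picbreeder the behavior would be the rendered image).
def mutate(genome, rng, sigma=0.1):
    return [g + rng.gauss(0, sigma) for g in genome]

def distance(a, b):
    return math.dist(a, b)  # Euclidean distance, Python 3.8+

def objective_score(genome, target):
    # Objective-driven: closer to the fixed target is better.
    return -distance(genome, target)

def novelty_score(genome, archive, k=5):
    # Open-ended (novelty search): farther from previously seen behaviors is better.
    if not archive:
        return float("inf")
    nearest = sorted(distance(genome, other) for other in archive)[:k]
    return sum(nearest) / len(nearest)

def evolve(score_fn, generations=50, pop_size=20, dim=2, seed=0):
    # Same mutation operator for both runs; only score_fn(genome, archive) differs.
    rng = random.Random(seed)
    population = [[0.0] * dim for _ in range(pop_size)]
    archive = []
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        ranked = sorted(children, key=lambda g: score_fn(g, archive), reverse=True)
        population = ranked[: pop_size // 2] * 2  # keep the best half, duplicated
        archive.extend(ranked[:2])  # remember a few behaviors for novelty scoring
    return population, archive

if __name__ == "__main__":
    target = [1.0, 1.0]
    obj_pop, _ = evolve(lambda g, _archive: objective_score(g, target))
    nov_pop, nov_archive = evolve(novelty_score)
    print("objective: best distance to target", min(distance(g, target) for g in obj_pop))
    print("novelty: farthest archived behavior from start",
          max(distance(a, [0.0, 0.0]) for a in nov_archive))
```

Under the objective rule, the population converges on the target. Under the novelty rule, nothing pulls it toward any particular point, and the archive tends to keep spreading outward from where it started.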

I think the main position / takeaway from the paper is that something about conventional SGD training produces these "fractured entangled representations" that work but aren't well structured internally, so they're hard to build on top of. They look at things like the curriculum / order things are learned in, objective vs. open-ended search, etc.
