Hi HN, author here.
For years, it bothered me that convolution (the king of vision) and matrix multiplication / self-attention (the engine of Transformers) were treated as completely separate, specialized tools. It felt like we were missing a more fundamental principle.
This paper is my attempt to find that principle. I introduce a framework called GWO (Generalized Windowed Operation) that describes any neural operation using just three simple, orthogonal components:
Path: Where to look
Shape: What form to look for
Weight: What to value
Using this "grammar", you can express both a standard convolution and self-attention, and see them as just different points in the same design space.
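To make that concrete, here's a minimal NumPy sketch of the idea — it is not the paper's formal definitions, the names (gwo_op, conv_path, attn_weight, etc.) are just my illustrative choices, and the "convolution" and "attention" are stripped-down 1-D toys. The point is only that both fall out of the same gather-then-weight skeleton once you swap the Path/Shape and Weight pieces:

    import numpy as np

    def gwo_op(x, path_fn, weight_fn):
        # Toy 1-D "generalized windowed operation": for each position i,
        # Path/Shape pick which indices to look at, Weight says how to value them.
        n, d = x.shape
        out = np.zeros_like(x)
        for i in range(n):
            idx = path_fn(i, n)            # Path + Shape: where / what form to look at
            window = x[idx]                # gathered neighbourhood
            w = weight_fn(x[i], window)    # Weight: static kernel or content-dependent scores
            out[i] = w @ window            # weighted aggregation
        return out

    # Convolution-like instance: fixed local window, static learned weights.
    kernel = np.random.randn(3)
    conv_path   = lambda i, n: np.clip([i - 1, i, i + 1], 0, n - 1)
    conv_weight = lambda q, win: kernel                       # independent of content

    # Self-attention-like instance: global window, content-dependent weights.
    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    attn_path   = lambda i, n: np.arange(n)                   # look everywhere
    attn_weight = lambda q, win: softmax(win @ q / np.sqrt(len(q)))  # depends on input

    x = np.random.randn(8, 4)                                 # 8 tokens, 4 channels
    y_conv = gwo_op(x, conv_path, conv_weight)
    y_attn = gwo_op(x, attn_path, attn_weight)

Same skeleton, two different (P, S, W) choices — that's the whole "design space" claim in miniature.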
But the most surprising result came when I analyzed operational complexity. I ran an experiment where different models were forced to memorize a dataset (achieving ~100% training accuracy). The results were clear: complexity used for adaptive regularization (like in Deformable Convolutions, which dynamically change their receptive field) resulted in a dramatically smaller generalization gap than "brute-force" complexity (like in Self-Attention).
This suggests that how an operation uses its complexity is more important than how much it has.
I'm an independent researcher, so getting feedback from a community like this is invaluable. I'd love to hear your thoughts and critiques. Thanks for taking a look.
The paper is here: https://doi.org/10.5281/zenodo.17103133
That's a fantastic question, and you've hit on a perfect example of the GWO framework in action.
The key difference is the level of abstraction: GWO is a general grammar for describing and designing operations, while Mamba is a specific, highly engineered model that can be described by that grammar.
In fact, as I mention in the paper, we can analyze Mamba using the (P, S, W) components:
Path (P): A structured state-space recurrence. This is a very sophisticated path designed to efficiently handle extremely long-range dependencies, unlike a simple sliding window or a dense global matrix.
Shape (S): It's causal and 1D. It processes information sequentially, respecting the nature of time-series or language data.
Weight (W): This is Mamba's superpower. The weights are highly dynamic and input-dependent, controlled by its selective state parameters. This creates an incredibly efficient, content-aware information bottleneck, allowing the model to decide what to remember and what to forget based on the context.
So, Mamba isn't a competitor to the GWO theory; it's a stellar example of it. It's a brilliant instance of "Structural Alignment", where the (P, S, W) configuration is perfectly tailored to the structure of sequential data.
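If it helps to see that mapping in code, here's a deliberately over-simplified toy — not Mamba's actual selective scan (no discretized state matrices, no hardware-aware scan kernel); toy_selective_recurrence and the weight names are just my stand-ins for where each GWO component lives:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def toy_selective_recurrence(x, W_gate, W_in):
        # Toy causal recurrence in GWO terms (NOT Mamba's real selective scan):
        #   Path   = recurrent state carried left-to-right,
        #   Shape  = causal, 1-D (step t only sees t' <= t),
        #   Weight = input-dependent gate deciding what to keep vs. overwrite.
        n, d = x.shape
        h = np.zeros(d)
        out = np.zeros_like(x)
        for t in range(n):
            gate = sigmoid(x[t] @ W_gate)                  # Weight: content decides retention
            h = gate * h + (1.0 - gate) * (x[t] @ W_in)    # Path: state carried forward
            out[t] = h                                     # Shape: strictly causal readout
        return out

    x = np.random.randn(16, 8)          # 16 time steps, 8 channels
    W_gate = np.random.randn(8, 8) * 0.1
    W_in   = np.random.randn(8, 8) * 0.1
    y = toy_selective_recurrence(x, W_gate, W_in)

The real model replaces this crude gate with selective state-space parameters, but the division of labour between Path, Shape, and Weight is the same.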
Thanks for asking this, it's a great point for discussion.