I unified convolution and attention into a single framework

https://zenodo.org/records/17103133
17•umjunsik132•5h ago

Comments

umjunsik132•5h ago
Hi HN, author here.

For years, it bothered me that convolution (the king of vision) and matrix multiplication / self-attention (the engine of Transformers) were treated as completely separate, specialized tools. It felt like we were missing a more fundamental principle. This paper is my attempt to find that principle.

I introduce a framework called GWO (Generalized Windowed Operation) that describes any neural operation using just three simple, orthogonal components:

Path: Where to look
Shape: What form to look for
Weight: What to value

Using this "grammar", you can express both a standard convolution and self-attention, and see them as just different points in the same design space.

But the most surprising result came when I analyzed operational complexity. I ran an experiment where different models were forced to memorize a dataset (achieving ~100% training accuracy). The results were clear: complexity used for adaptive regularization (like in Deformable Convolutions, which dynamically change their receptive field) resulted in a dramatically smaller generalization gap than "brute-force" complexity (like in Self-Attention). This suggests that how an operation uses its complexity is more important than how much it has.

I'm an independent researcher, so getting feedback from a community like this is invaluable. I'd love to hear your thoughts and critiques. Thanks for taking a look.

The paper is here: https://doi.org/10.5281/zenodo.17103133
iFire•58m ago
How is it different than https://en.wikipedia.org/wiki/Mamba_(deep_learning_architect...
umjunsik132•44m ago
That's a fantastic question, and you've hit on a perfect example of the GWO framework in action.

The key difference is the level of abstraction: GWO is a general grammar to describe and design operations, while Mamba is a specific, highly-engineered model that can be described by that grammar. In fact, as I mention in the paper, we can analyze Mamba using the (P, S, W) components:

Path (P): A structured state-space recurrence. This is a very sophisticated path designed to efficiently handle extremely long-range dependencies, unlike a simple sliding window or a dense global matrix.
Shape (S): It's causal and 1D. It processes information sequentially, respecting the nature of time-series or language data.
Weight (W): This is Mamba's superpower. The weights are highly dynamic and input-dependent, controlled by its selective state parameters. This creates an incredibly efficient, content-aware information bottleneck, allowing the model to decide what to remember and what to forget based on the context.

So, Mamba isn't a competitor to the GWO theory; it's a stellar example of it. It's a brilliant instance of "Structural Alignment" where the (P, S, W) configuration is perfectly tailored for the structure of sequential data.

Thanks for asking this, it's a great point for discussion.
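
For readers who want something concrete, the (Path, Shape, Weight) idea from the thread can be loosely illustrated with a toy 1D example. This is only a sketch under my own assumptions, not the paper's formulation: the function `gwo_1d` and the helpers `local_path`, `global_path`, `any_shape`, `causal_shape`, `uniform_weight`, and `softmax_weight` are hypothetical names invented here, and the paper may define the three components quite differently.

```python
import math

def gwo_1d(x, path, shape, weight):
    """Toy windowed operation over a list of floats (illustrative only).

    path(i, n)        -> candidate indices that position i may look at
    shape(i, j)       -> True if index j is kept in the window of i
    weight(i, js, x)  -> one weight per kept index
    """
    n = len(x)
    out = []
    for i in range(n):
        js = [j for j in path(i, n) if shape(i, j)]
        ws = weight(i, js, x)
        out.append(sum(w * x[j] for w, j in zip(ws, js)))
    return out

# Convolution-like choice: local path, fixed input-independent weights.
def local_path(i, n, radius=1):
    return [j for j in range(i - radius, i + radius + 1) if 0 <= j < n]

def any_shape(i, j):
    return True

def uniform_weight(i, js, x):
    # Box filter, renormalized at the edges instead of zero-padded.
    return [1.0 / len(js)] * len(js)

# Attention-like choice: global path, weights computed from the content.
def global_path(i, n):
    return list(range(n))

def causal_shape(i, j):
    return j <= i

def softmax_weight(i, js, x):
    # Scalar "query times key" scores, then a numerically stable softmax.
    scores = [x[i] * x[j] for j in js]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 2.0, 3.0, 4.0]
conv_like = gwo_1d(x, local_path, any_shape, uniform_weight)
attn_like = gwo_1d(x, global_path, causal_shape, softmax_weight)
```

Both results come from the same loop; only the three plugged-in functions differ, which is the sense in which the two operations sit in one design space. A deformable or Mamba-style operation would presumably need a path or weight that also depends on the input, which this sketch does not attempt.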

SkiftOS: A hobby OS built from scratch using C/C++ for ARM, x86, and RISC-V

https://skiftos.org
211•ksec•7h ago•33 comments

UTF-8 is a brilliant design

https://iamvishnu.com/posts/utf8-is-brilliant-design
616•vishnuharidas•17h ago•249 comments

How 'overworked, underpaid' humans train Google's AI to seem smart

https://www.theguardian.com/technology/2025/sep/11/google-gemini-ai-training-humans
21•Brajeshwar•41m ago•4 comments

Java 25's new CPU-Time Profiler (1)

https://mostlynerdless.de/blog/2025/06/11/java-25s-new-cpu-time-profiler-1/
43•SerCe•4h ago•0 comments

How to Use Claude Code Subagents to Parallelize Development

https://zachwills.net/how-to-use-claude-code-subagents-to-parallelize-development/
109•zachwills•3d ago•56 comments

Weird CPU architectures, the MOV only CPU (2020)

https://justanotherelectronicsblog.com/?p=771
33•v9v•4d ago•2 comments

QGIS is a free, open-source, cross platform geographical information system

https://github.com/qgis/QGIS
448•rcarmo•19h ago•109 comments

Raspberry Pi Synthesizers – How the Pi is transforming synths

https://www.gearnews.com/raspberry-pi-synthesizers-how-the-pi-is-transforming-synths/
72•zdw•8h ago•37 comments

Many hard LeetCode problems are easy constraint problems

https://buttondown.com/hillelwayne/archive/many-hard-leetcode-problems-are-easy-constraint/
535•mpweiher•21h ago•450 comments

Does All Semiconductor Manufacturing Depend on Spruce Pine Quartz? (2024)

https://www.construction-physics.com/p/does-all-semiconductor-manufacturing
11•colinprince•3d ago•1 comment

FFglitch, FFmpeg fork for glitch art

https://ffglitch.org/gallery/
216•captain_bender•14h ago•31 comments

AI Coding

https://geohot.github.io//blog/jekyll/update/2025/09/12/ai-coding.html
141•abhaynayar•2h ago•88 comments

The treasury is expanding the Patriot Act to attack Bitcoin self custody

https://www.tftc.io/treasury-iexpanding-patriot-act/
698•bilsbie•1d ago•502 comments

Resizing images in Rust, now with EXIF orientation support

https://alexwlchan.net/2025/create-thumbnail-is-exif-aware/
38•ingve•4d ago•13 comments

Social media promised connection, but it has delivered exhaustion

https://www.noemamag.com/the-last-days-of-social-media/
175•pseudolus•5h ago•123 comments

Life, work, death and the peasant: Rent and extraction

https://acoup.blog/2025/09/12/collections-life-work-death-and-the-peasant-part-ivc-rent-and-extra...
215•baud147258•10h ago•96 comments

I used standard Emacs extension-points to extend org-mode

https://edoput.it/2025/04/16/emacs-paradigm-shift.html
165•Karrot_Kream•15h ago•18 comments

Tips for installing Windows 98 in QEMU/UTM

https://sporks.space/2025/08/28/tips-for-installing-windows-98-in-qemu-utm/
96•Bogdanp•13h ago•19 comments

EU court rules nuclear energy is clean energy

https://www.weplanet.org/post/eu-court-rules-nuclear-energy-is-clean-energy
848•mpweiher•17h ago•738 comments

Meow: Yet another modal editing on Emacs

https://github.com/meow-edit/meow
99•Bogdanp•11h ago•16 comments

OCI Registry Explorer

https://oci.dag.dev/
66•jcbhmr•9h ago•7 comments

3D modeling with paper

https://www.arvinpoddar.com/blog/3d-modeling-with-paper
291•joshuawootonn•21h ago•45 comments

I unified convolution and attention into a single framework

https://zenodo.org/records/17103133
17•umjunsik132•5h ago•3 comments

Behind Kamathipura's Closed Doors

https://failedarchitecture.com/behind-kamathipuras-closed-doors/
11•tsaifu•3d ago•1 comment

Close the loop: analytics that teach your chatbot to fix itself

https://www.hoverbot.ai/blog/close-the-loop-analytics-that-teach-your-chatbot-to-fix-itself
5•hoverbot•3d ago•1 comment

Legal win

https://ma.tt/2025/09/legal-win/
189•pentagrama•10h ago•155 comments

Reduce bandwidth costs with dm-cache: fast local SSD caching for network storage

https://devcenter.upsun.com/posts/cut-aws-bandwidth-costs-95-with-dm-cache/
60•tlar•3d ago•18 comments

Chatbox app is back on the US app store

https://github.com/chatboxai/chatbox/issues/2644
50•themez•9h ago•22 comments

How FOSS Projects Handle Legal Takedown Requests

https://f-droid.org/2025/09/10/how-foss-projects-handle-legal-takedown-requests.html
130•mkesper•18h ago•12 comments

Corporations are trying to hide job openings from US citizens

https://thehill.com/opinion/finance/5498346-corporate-america-has-been-trying-to-hide-job-opening...
570•b_mc2•19h ago•421 comments