GhostDrift•3h ago
GD-Attention is a provably nonlinear attention mechanism with:

- No softmax
- No averaging
- A unique semantic jump point $s^*$

Verified independently by Gemini, GitHub Copilot, and GPT-4. → Softmax isn't just suboptimal; it's structurally incapable of what this model does.
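The post gives no implementation details for GD-Attention, so the following is only a minimal sketch of what "no softmax, no averaging" could mean: instead of mixing value rows with softmax weights, the mechanism commits to a single value at one selected index, which plays the role of $s^*$ here. The function names, tensor shapes, and the argmax selection rule are all illustrative assumptions, not GD-Attention's actual definition.

```python
import numpy as np

def soft_attention(q, K, V):
    """Standard softmax attention: output is a weighted average of value rows."""
    scores = K @ q                       # similarity of the query to each key
    w = np.exp(scores - scores.max())    # numerically stable exponentials
    w /= w.sum()                         # softmax weights sum to 1
    return w @ V                         # averaged output

def hard_select_attention(q, K, V):
    """Illustrative 'no softmax, no averaging' variant: pick exactly one value.

    s_star below is just the argmax of the scores, a stand-in for the
    post's 'unique semantic jump point'; it is unique only when the
    maximizing score is tie-free.
    """
    scores = K @ q
    s_star = int(np.argmax(scores))      # single selected index (assumed unique)
    return V[s_star], s_star             # one value row, no mixing

# Tiny demo comparing the averaged and selected outputs.
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(8, 4))
V = rng.normal(size=(8, 4))
print("soft (averaged):", soft_attention(q, K, V))
out, s_star = hard_select_attention(q, K, V)
print(f"hard (selected index s*={s_star}):", out)
```

One way to read the post's softmax claim against this sketch: a softmax with finite scores always assigns nonzero weight to every value, so its output is a strict convex combination and can never equal a pure single-row selection; a discrete jump in the selected index is exactly what a smooth weighting cannot produce.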