Show HN: I built a 2nd-order PyTorch optimizer for LLMs that runs on 16GB GPUs

2•dnosoz•1h ago
Hi HN,

I'm Danilo. I've been struggling with the limitations of AdamW when fine-tuning LLMs locally. Second-order optimizers (like Shampoo or SOAP) offer significantly better per-step convergence by exploiting Kronecker-factored curvature. The problem? They require O(d^2) memory and O(d^3) compute per layer, which immediately OOMs small cards like a 16GB T4 and strains even a 24GB RTX 3090.
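To make the O(d^2) memory cost concrete, here is a rough back-of-the-envelope sketch. It assumes a single m x n weight matrix with two full fp32 Kronecker factors (one m x m, one n x n), compared against AdamW's two moment buffers. The function names and layer shapes are illustrative only, not taken from SCAO or any Shampoo implementation.

```python
def shampoo_factor_bytes(m, n, bytes_per_elem=4):
    """Bytes for two full Kronecker factors: L is m x m, R is n x n."""
    return (m * m + n * n) * bytes_per_elem


def adamw_state_bytes(m, n, bytes_per_elem=4):
    """Bytes for AdamW's two moment buffers, each shaped like the weight."""
    return 2 * m * n * bytes_per_elem


def gib(num_bytes):
    """Convert bytes to GiB for printing."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical shapes: a square projection and an embedding-like layer.
    for m, n in [(4096, 4096), (32000, 4096)]:
        print(
            f"{m}x{n}: factors {gib(shampoo_factor_bytes(m, n)):.3f} GiB, "
            f"AdamW {gib(adamw_state_bytes(m, n)):.3f} GiB"
        )
```

For a square layer the two costs are the same, but as soon as one dimension is large (embedding-like layers), the m x m factor dominates, which is where truncation and quantization of the factors pay off.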

I wanted Shampoo-quality preconditioning on my home setup, so I built SCAO (Sparse Curvature-Aware Optimizer).

It's a PyTorch optimizer that acts as a drop-in replacement for AdamW, but it implements a few strict architectural changes to survive on consumer cards:

1. Adaptive Rank Selection: Instead of full-rank Kronecker factors, it truncates the eigenspace to retain >=95% of spectral mass.
2. Int8 EMA Quantization: The curvature accumulators are stored in symmetric int8, which yields a 4x memory reduction with zero degradation in perplexity.
3. Quantization Stability: Standard Shampoo usually crashes at step 1 during 4-bit QLoRA fine-tuning due to SVD ill-conditioning in quantized spaces. SCAO exploits sparse approximations to bypass this.
4. Fused CUDA kernels: I wrote custom kernels to fix an O(k * m^2 * n) complexity bottleneck in the naive projection implementation.
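As a minimal sketch of the first two ideas (not SCAO's actual code), rank selection can be read as "keep the smallest number of leading eigenvalues whose sum reaches 95% of the total", and symmetric int8 storage as "one scale per tensor, values rounded into [-127, 127]". The helper names `select_rank`, `quantize_int8`, and `dequantize_int8` are hypothetical, and plain Python lists stand in for tensors.

```python
def select_rank(eigenvalues, mass=0.95):
    """Smallest k such that the top-k eigenvalues hold >= `mass` of the total."""
    vals = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)
    total = sum(vals)
    if total == 0.0:
        return 0
    running = 0.0
    for k, v in enumerate(vals, start=1):
        running += v
        if running >= mass * total:
            return k
    return len(vals)


def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (int codes, scale)."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / 127.0 if amax > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to floats."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    print(select_rank([60.0, 30.0, 8.0, 1.0, 1.0]))  # prints 3
    codes, scale = quantize_int8([1.0, -0.5, 0.25, 0.0])
    print(codes, dequantize_int8(codes, scale))
```

Storing one byte per entry instead of four fp32 bytes is where the 4x figure comes from. The round-trip error per entry is bounded by half the scale step, so whether that is harmless for a given model is an empirical question.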

The Benchmark: I recently ran a head-to-head benchmark on a single T4 (16GB VRAM) fine-tuning Qwen2.5-3B (4-bit QLoRA, rank 16):

- Shampoo: Failed at Step 1 (SVD mathematical collapse).
- SCAO: 100% stability, peaked at exactly 7.14 GB VRAM, with a smooth loss descent.

It is pip-installable (pip install scao).

I've written a technical report detailing the regret bounds, ablation studies, and scaling laws (published on Zenodo), but I really wanted to get this community's eyes on the CUDA kernels and the PyTorch implementation.

GitHub: https://github.com/whispering3/scao
Technical Report (DOI): https://doi.org/10.5281/zenodo.19870556

I'd love any feedback, code roasts, or questions about the math behind it!

Comments

satvikpendem•1h ago
Your account is shadow banned by the way, I guess you've just been self promoting too much.
dnosoz•1h ago
Author here. Happy to answer any deep-dive questions about the CUDA implementation or the Kronecker factorization math.
lostmsu•51m ago
Does it actually improve time to target loss?
