Training LLMs on 1080 Tis without shadow weights

https://github.com/batteryphil/Primal-Discrete-LLM-Training
1•batteryphil•1h ago

Comments

batteryphil•1h ago
[Technical Explainer] Project PRIMAL: Shadowless 4-Bit Training

I’ve spent the last few weeks trying to solve the "Shadow Weight Tax." Standard Quantization-Aware Training (QAT) is often a memory lie: it claims to be 4-bit, but it keeps a hidden FP32 master copy to accumulate gradients. This doubles VRAM requirements and kills the dream of training on consumer-grade legacy hardware like the GTX 1080 Ti.

Project PRIMAL is an attempt to delete shadow weights entirely and train a 0.1B parameter model directly on a discrete integer grid.

1. The Core Hack: The Prime-Harmonic Grid

Instead of linear INT4, I’ve implemented a 13-value grid derived from prime reciprocals ($\pm 1, \pm 1/2, \pm 1/3 \dots$).

The Logic: This concentrates precision around zero, creating a "natural" bell curve that mimics the weight distribution of dense models without the overhead of floating-point math.

Efficiency: Stored in 4-bit nibbles, this allows for high-precision "Fine" layers without the memory footprint of FP16.

2. The "Poltergeist" Optimizer

To solve Stochastic Thrashing (the oscillation caused by gradients being smaller than the discrete weight steps), I developed Decoupled Flipping:

Vote Buffering: Gradients cast a "Vote" into an int8 buffer rather than touching the weights.

Consensus: We only flip a discrete bit once the buffer shows a strong net signal (e.g., +4 net votes over multiple micro-batches).

Adaptive Probability: Weight magnitude determines "flip-resistance," stabilizing the model as it converges.

3. Telemetry on a GTX 1080 Ti (11GB)

Throughput: ~5,800 - 6,000 tokens/sec.

VRAM Health: 10.37 GB (94% saturation) with zero leakage over 18+ hours.

Semantic Emergence: The model has progressed from random noise to forming complex concepts like "evaluate video system damages" at Step 11,650.

4. Process & AI Disclosure

I am a solo researcher using AI as a force multiplier. I collaborated with Gemini 3 Flash for CUDA kernel refinement, documentation structuring, and logic stress-testing. The "Poltergeist" consensus logic and the Prime-Harmonic math are the core of the research, while AI assisted in accelerating the low-level implementation.
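
For readers who want something concrete, here is a rough sketch of what a grid like the one in section 1 could look like, with weights stored as 4-bit indices packed two per byte. The post elides the full list of values, so the choice of zero plus ±1 and the reciprocals of the first five primes is an assumption made only because it yields 13 entries. The names `GRID`, `quantize`, `pack_nibbles`, and `unpack_nibbles` are hypothetical and are not taken from the PRIMAL repository.

```python
from fractions import Fraction

# ASSUMPTION: the post only shows "+/-1, +/-1/2, +/-1/3 ...". Zero plus +/-1
# and the reciprocals of the first five primes gives 13 values in total.
PRIMES = [2, 3, 5, 7, 11]
POSITIVE = [Fraction(1, 1)] + [Fraction(1, p) for p in PRIMES]
GRID = sorted([-v for v in POSITIVE] + [Fraction(0)] + POSITIVE)
ZERO_INDEX = GRID.index(Fraction(0))


def quantize(x: float) -> int:
    """Return the index (0..12) of the grid value nearest to x."""
    return min(range(len(GRID)), key=lambda i: abs(float(GRID[i]) - x))


def pack_nibbles(indices):
    """Pack grid indices two per byte, low nibble first."""
    padded = list(indices)
    if len(padded) % 2:
        padded.append(ZERO_INDEX)  # pad odd-length input with the zero weight
    return bytes(
        padded[i] | (padded[i + 1] << 4) for i in range(0, len(padded), 2)
    )


def unpack_nibbles(data: bytes, count: int):
    """Inverse of pack_nibbles; count drops the padding nibble if present."""
    out = []
    for byte in data:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out[:count]


if __name__ == "__main__":
    weights = [0.3, -0.9, 0.0]
    idx = [quantize(w) for w in weights]
    packed = pack_nibbles(idx)
    print([str(GRID[i]) for i in idx], len(packed), "bytes")
```

Thirteen levels fit in a nibble with three codes to spare, and the spacing between neighbouring levels shrinks toward zero, which is the "precision around zero" property the post describes.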
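
The vote-buffer idea from section 2 can be sketched in a few lines as well. This is a minimal illustration under stated assumptions, not the repository's optimizer: each weight is a grid index in ascending value order, each micro-batch contributes only the sign of its gradient, and a weight moves one grid level once its net vote reaches the threshold. The "+4" threshold and the int8 buffer come from the post. The function names, the saturating clamp, and resetting the counter after a flip are assumptions, and the adaptive flip-resistance step is left out entirely.

```python
GRID_SIZE = 13        # number of grid levels, as stated in the post
VOTE_THRESHOLD = 4    # the post's "+4 net votes" example
INT8_MIN, INT8_MAX = -128, 127


def cast_votes(votes, grads):
    """Add the sign of each gradient to its vote counter, in place.

    Counters saturate at the int8 range (an assumption) instead of wrapping.
    """
    for i, g in enumerate(grads):
        step = (g > 0) - (g < 0)  # +1, 0, or -1
        votes[i] = max(INT8_MIN, min(INT8_MAX, votes[i] + step))


def apply_consensus(indices, votes, threshold=VOTE_THRESHOLD):
    """Move weights one grid level where the net vote reaches the threshold.

    A positive net gradient moves the weight down one level (the descent
    direction). Returns the number of weights whose votes triggered a flip.
    """
    flipped = 0
    for i, v in enumerate(votes):
        if abs(v) < threshold:
            continue
        direction = -1 if v > 0 else 1
        indices[i] = max(0, min(GRID_SIZE - 1, indices[i] + direction))
        votes[i] = 0  # ASSUMPTION: the counter resets after a flip
        flipped += 1
    return flipped


if __name__ == "__main__":
    indices = [6, 6, 6]   # three weights starting at the middle grid level
    votes = [0, 0, 0]
    micro_batches = [
        [0.01, 0.02, -0.03],
        [0.02, -0.01, -0.01],
        [0.01, 0.03, -0.02],
        [0.03, -0.02, -0.01],
    ]
    for grads in micro_batches:
        cast_votes(votes, grads)
    print(apply_consensus(indices, votes), indices, votes)
```

In the demo, the second weight receives alternating gradient signs, so its votes cancel and it never moves. That cancellation is the thrashing the consensus rule is meant to suppress.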

Google may be cracking down on self-promotional 'best of' listicles

https://searchengineland.com/google-cracking-down-self-promotional-best-of-listicles-468227
1•gnabgib•1m ago•0 comments

Show HN: Sovereign Suite – A Recursive Logic Framework for AI Governance

https://github.com/holland202/Sovereign-Suite-Manifest
1•badatchess•5m ago•0 comments

Show HN: New Open Source Agent with 62 Stars on GitHub

https://github.com/dakotalock/holygrailopensource
1•Moriarty2027•8m ago•0 comments

Mitchell Hashimoto Launches 'Vouch' to Fight AI Slop in Open Source Ecosystem

https://itsfoss.com/news/mitchell-hashimoto-vouch/
2•WaitWaitWha•8m ago•1 comments

Ethnic minorities are driving America's startup boom

https://www.economist.com/finance-and-economics/2026/02/12/ethnic-minorities-are-driving-americas...
1•andsoitis•10m ago•0 comments

Authoring, simulating, and testing dynamic human-AI group conversations

https://research.google/blog/beyond-one-on-one-authoring-simulating-and-testing-dynamic-human-ai-...
1•gmays•11m ago•0 comments

PostgreSQL v19: Password expiration warnings

https://hexacluster.ai/blog/postgresql-v19-password-expiration-warnings
1•avivallssa•14m ago•0 comments

Show HN: Khaos – Every AI agent I tested broke in under 30 seconds

1•exordex•16m ago•0 comments

How Are Amps Modeled? [video]

https://www.youtube.com/watch?v=9YL8pwF7Mnc
2•dsego•19m ago•0 comments

What 1.4M emails reveal about America's most notorious sex offender

https://www.economist.com/interactive/international/2026/02/12/inside-epsteins-network
1•doener•20m ago•0 comments

Simile: The Simulation Company

https://twitter.com/joon_s_pk/status/2022023097017421874
1•jaehong747•21m ago•0 comments

Elide is an all-in-one, AI-native, open source software runtime

https://elide.dev/
2•shirian•23m ago•0 comments

The March Cliff: Why the 2026 Economic Collapse Is Different

https://ramakanth-d.medium.com/the-march-cliff-why-the-2026-economic-collapse-is-different-e1c619...
1•playhard•25m ago•1 comments

Welcome to the Great Regression

https://www.bloomberg.com/opinion/newsletters/2026-02-12/the-us-risks-a-great-regression
1•petethomas•26m ago•0 comments

Judge rules that LLM provided legal advice is open to discovery [pdf]

https://storage.courtlistener.com/recap/gov.uscourts.nysd.652138/gov.uscourts.nysd.652138.22.0.pdf
2•stingrae•27m ago•0 comments

My hot take on vibe coding for PMs

https://www.ddmckinnon.com/2026/02/11/my-%f0%9f%8c%b6-take-on-vibe-coding-for-pms/
1•awaxman11•30m ago•0 comments

AI: Brainrot Inducer or Cognitive Multiplier?

https://www.cjroth.com/blog/2026-02-12-brainrot
1•thoughtfulchris•31m ago•0 comments

Deft – a class and interface system for Clojure [video]

https://www.youtube.com/watch?v=dlW6YzwUZ-M
1•sammy0910•31m ago•0 comments

AI and consciousness: from objective descriptions to 'level zero'

https://randomseed.io/txt/ai-and-consciousness/
1•siefca•32m ago•1 comments

Cloudflare adds real-time Markdown rendering for AI agents

https://blog.cloudflare.com/markdown-for-agents/
5•thestackfox•34m ago•2 comments

A Read-Only Philosophical Archive on Restraint and AI Ethics

https://coexilia.io/coexilian-documents/
1•aegissolis•34m ago•1 comments

RFK Jr. food pyramid site links to Grok, which says you shouldn't trust RFK Jr

https://arstechnica.com/health/2026/02/rfk-jr-food-pyramid-site-links-to-grok-which-says-you-shou...
3•doener•34m ago•2 comments

Skip the Tips: A game to select "No Tip" but dark patterns try to stop you

https://skipthe.tips/
4•randycupertino•34m ago•2 comments

Amazon's Ring cancels Flock partnership amid Super Bowl ad backlash

https://www.cnbc.com/2026/02/12/amazons-ring-cancels-flock-partnership-amid-super-bowl-ad-backlas...
2•zzzeek•38m ago•0 comments

Z-Image Implemented in NCNN Vulkan

https://github.com/nihui/zimage-ncnn-vulkan
2•luyu_wu•40m ago•0 comments

Show HN: I taught AI to remember. Then it warned me

https://github.com/Relic-Studios/ISSA-Repository
1•relicstudios•41m ago•0 comments

What happens when capability decouples from credentials?

2•falsework•41m ago•2 comments

Bryan Johnson's Immortals program costs $1M. How to DIY it <1% of the price

https://www.empirical.health/blog/bryan-johnson-immortals-program-diy/
1•brandonb•42m ago•0 comments

True, Relevant, and Wrong: The Applicability Problem in RAG

https://www.pinecone.io/learn/series/beyond-retrieval/rag-applicability-problem/
2•gk1•43m ago•0 comments

Coinbase Posts $667M Net Loss, Revenue Declines 20%

https://www.bloomberg.com/news/articles/2026-02-12/coinbase-posts-667-million-loss-sees-revenue-t...
5•petethomas•45m ago•0 comments