Show HN: Glq LLM quantization using E8 lattice

https://github.com/cnygaard/glq
1•acd•1h ago
With the help of AI, I have created glq, an open source library for E8 codebook quantization of LLMs. I am a PC gamer and work in DevOps, with an interest in both LLMs and AI, and that is what got me started on glq. The current high RAM prices and the resource usage of LLMs also inspired me to write it. The question is: can you squeeze more out of a gaming GPU with limited VRAM by using alternative LLM compression methods?

Glq is effective compared to other LLM quantization algorithms in the range of 2 to 4 bits per weight. Its effectiveness at low bits per weight comes from the properties of the E8 lattice compared to linear methods. Glq also supports mixed precision quantization, where different LLM layers use different bit widths depending on how sensitive each layer is to quantization. Think of mixed precision a bit like MP3 or MP4 variable bit rate encoding.
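
To make the mixed precision idea concrete, here is a minimal sketch of how a fractional average such as 3.5 bits per weight can come out of whole-number bit widths per layer. This is not glq's actual allocation logic: the layer names, sensitivity scores, layer sizes and the "top half gets 4 bits" rule are all assumptions made up for illustration.

```python
import math

def assign_bits(sensitivity, high_bits=4, low_bits=3, high_fraction=0.5):
    """Give the most sensitive layers `high_bits`, the rest `low_bits`.

    `sensitivity` maps a layer name to a hypothetical score, where a
    larger score means quantization hurts that layer more.
    """
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    n_high = math.ceil(len(ranked) * high_fraction)
    return {name: (high_bits if i < n_high else low_bits)
            for i, name in enumerate(ranked)}

def average_bpw(num_weights, bits):
    """Weighted average bits per weight over all layers."""
    total_bits = sum(num_weights[name] * bits[name] for name in bits)
    return total_bits / sum(num_weights[name] for name in bits)

# Hypothetical layers: names, scores and sizes are made up for illustration.
sensitivity = {"attn.0": 0.9, "mlp.0": 0.2, "attn.1": 0.6, "mlp.1": 0.1}
num_weights = {name: 1_000_000 for name in sensitivity}

bits = assign_bits(sensitivity)
print(bits)
print(average_bpw(num_weights, bits))
```

With equal-sized layers split evenly between 4 and 3 bits, the weighted average lands at 3.5 bits per weight, much like a variable bit rate encoder spends more bits on the harder parts of a track.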

I currently develop glq on AWS g7e spot instances to keep the cost reasonable.

Glq uses vLLM.

The 4-bit key-value (KV) cache using E8 was inspired by NexusQuant. I try to squeeze in about four times as much KV cache as would normally fit in VRAM with BF16, or about two times as much compared to INT8.
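
The four times and two times figures follow directly from the bit widths (16 vs 4 and 8 vs 4). A small sketch of the arithmetic, assuming a made-up model shape and VRAM budget and ignoring any overhead for scales or codebooks that a real 4-bit cache would add:

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bits_per_value):
    """Bytes of KV cache one token needs: keys plus values, every layer."""
    values_per_token = 2 * num_layers * num_kv_heads * head_dim
    return values_per_token * bits_per_value / 8

def tokens_that_fit(budget_bytes, **shape):
    """How many tokens of KV cache fit in `budget_bytes`."""
    return int(budget_bytes // kv_bytes_per_token(**shape))

# Hypothetical model shape and VRAM budget, chosen only for the arithmetic.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128)
budget = 4 * 1024**3  # 4 GiB reserved for the KV cache

for name, bits in [("BF16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(name, tokens_that_fit(budget, bits_per_value=bits, **shape))
```

For this hypothetical shape the same budget holds four times as many tokens at 4 bits as at BF16, and twice as many as at INT8.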

At the start I mistakenly picked an E8 codebook size of 65536 entries instead of 4096 entries, which fits better in GPU L1 cache. It turns out that having 65536 codebook entries leads to a higher LLM compression rate, but at the cost of decode speed. I am trying to compensate by using Nvidia CUDA graphs and by optimizing the decode path, which is currently work in progress.
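
A rough back-of-the-envelope sketch of why the codebook size matters for cache fit. It assumes each codebook entry is an 8-dimensional vector stored as FP16 and that one index covers one block of 8 weights; glq's real storage layout and the way it combines lookups may differ.

```python
import math

def codebook_stats(entries, dim=8, bytes_per_value=2):
    """Index cost and table size for a codebook of `entries` vectors.

    Assumes each entry is a `dim`-dimensional vector stored as FP16
    (2 bytes per value); glq's real storage layout may differ.
    """
    index_bits = math.log2(entries)           # bits to address one entry
    return {
        "index_bits": index_bits,
        "bits_per_weight": index_bits / dim,  # one index covers `dim` weights
        "table_kib": entries * dim * bytes_per_value / 1024,
    }

for entries in (4096, 65536):
    print(entries, codebook_stats(entries))
```

Under these assumptions the 4096-entry table is 64 KiB while the 65536-entry table is 1 MiB. As far as I know, L1/shared memory on recent Nvidia GPUs is on the order of a hundred or so KiB per SM, so only the smaller table has a chance of staying resident there during decode.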

To install glq in a Python virtual environment on Linux with an Nvidia GPU: pip install glq

Python PIP package https://pypi.org/project/glq/

Glq source code. https://github.com/cnygaard/glq

Current PC RAM Prices that inspired the library. https://pcpartpicker.com/trends/price/memory/

https://en.wikipedia.org/wiki/E8_lattice The E8 lattice is an eight-dimensional lattice that provides the optimal solution to the sphere packing problem in eight dimensions. Think of it a bit like stacking cannonballs or apples in an optimal way, only you swap the apples for LLM weights.
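
To show what "snapping weights to E8" can look like, here is a minimal pure-Python sketch of the classic nearest-point search that uses the decomposition E8 = D8 ∪ (D8 + ½), where D8 is the set of integer vectors with an even coordinate sum. This is not glq's kernel: a real quantizer also scales the weights and restricts the search to a finite codebook, and the sample block of weights below is made up.

```python
import math

def nearest_d8(x):
    """Nearest point of D8 (integer vectors with an even coordinate sum)."""
    f = [math.floor(v + 0.5) for v in x]       # round each coordinate
    if sum(f) % 2 != 0:
        # Parity is wrong: re-round the coordinate with the largest
        # rounding error in the other direction.
        k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
        f[k] += 1 if x[k] >= f[k] else -1
    return [float(v) for v in f]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_e8(x):
    """Nearest point of E8 = D8 union (D8 + 1/2)."""
    a = nearest_d8(x)
    b = [v + 0.5 for v in nearest_d8([xi - 0.5 for xi in x])]
    return a if dist2(x, a) <= dist2(x, b) else b

def in_e8(p):
    """True if p is all-integer or all-half-integer with an even sum."""
    all_int = all(v == math.floor(v) for v in p)
    all_half = all(v - 0.5 == math.floor(v - 0.5) for v in p)
    return (all_int or all_half) and sum(p) % 2 == 0

weights = [1.1, 0.9, 0.05, -0.02, 0.0, 0.1, 0.0, 0.0]  # made-up block of 8
q = nearest_e8(weights)
print(q, in_e8(q))
```

The search tries both cosets and keeps the closer candidate, so each block of eight weights is replaced by a single lattice point instead of eight independently rounded values.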

Picture of an E8 lattice https://en.wikipedia.org/wiki/E8_polytope#/media/File:E8_gra...

Credits: glq was inspired by the E8 codebooks of QuIP#, and the E8 key-value cache compression was inspired by NexusQuant.

Math: The sphere packing problem in dimension 8, Maryna Viazovska https://arxiv.org/abs/1603.04246

4bpw glq quantization of Gemma 4 E4B (instruction-tuned) https://huggingface.co/xv0y5ncu/Gemma-4-E4B-it-GLQ-4bpw

3.5bpw mixed precision quantization of SmolLM3 https://huggingface.co/xv0y5ncu/SmolLM3-3B-GLQ-3.5bpw

Docker image of glq for Nvidia GPUs (requires the Nvidia Container Toolkit):

docker run --rm --gpus all \
  -v "$HOME/.cache/huggingface:/cache/hf" \
  ghcr.io/cnygaard/glq-env:0.5.0 \
  python -c '
import glq.hf_integration, torch  # registers GLQ with HF
from transformers import AutoModelForCausalLM, AutoTokenizer
mid = "xv0y5ncu/SmolLM3-3B-GLQ-3.5bpw"
tok = AutoTokenizer.from_pretrained(mid)
model = AutoModelForCausalLM.from_pretrained(
    mid, device_map="cuda", torch_dtype=torch.float16)
# Tokenize the prompt and move the tensors to the GPU
ids = tok("The capital of France is", return_tensors="pt").to("cuda")
# **ids passes input_ids and attention_mask as keyword arguments
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0], skip_special_tokens=True))
'

Current work in progress on glq: getting the decode speed up and supporting more LLM architectures.

Open question: does glq work on the Nvidia DGX Spark and on Nvidia gaming hardware such as the RTX 4070 through 5090?

Tesla May registrations jump in several European markets as recovery continues

https://www.reuters.com/business/tesla-may-registrations-jump-several-european-markets-recovery-c...
1•JumpCrisscross•52s ago•0 comments

Show HN: I reduced LLM inference GPU calls by 94% using semantic routing

https://icomnewtechnologies.com/proof/proof_install.sh
1•kanacki•1m ago•0 comments

Macroscale Connectivity in the Octopus Brain

https://www.biorxiv.org/content/10.1101/2025.05.28.656524v1.full
1•jonnonz•2m ago•0 comments

Implicit.js, a way for agents to do 3D design with math

https://github.com/earthtojake/implicit.js
1•softservo•2m ago•1 comments

Show HN: Going from 1+1=2 to Quantum Mechanics

https://quantum.schols.io/intro
1•chaidhat•5m ago•0 comments

Open source project contains hidden instruction for "AI" agents: delete my code

https://www.osnews.com/story/145130/open-source-project-contains-hidden-instruction-for-ai-agents...
2•mbreese•6m ago•0 comments

75 years of the Fender Telecaster: 12 guitarists who defined the Tele

https://newatlas.com/music/75-years-fender-telecaster-12-guitarists/
2•breve•9m ago•0 comments

BYD plans to bring all-solid-state batteries to EVs by 2027, but it's not alone

https://electrek.co/2026/06/01/byd-all-solid-state-batteries-evs-by-2027/
3•breve•11m ago•0 comments

Show HN: QR Boarding Pass Generator++

https://bcbpgenerator.com/
1•micktor•11m ago•0 comments

Florida sues OpenAI and Sam Altman over marketing ChatGPT despite serious risks

https://fortune.com/2026/06/01/florida-sues-openai-ceo-sam-altman-chatgpt-safety-warnings/
3•1vuio0pswjnm7•17m ago•0 comments

A tale of two weekend projects

https://csmeyer.substack.com/p/a-tale-of-two-weekend-projects
3•csmeyer•22m ago•0 comments

Jenesis – A modern Java build tool

https://github.com/raphw/jenesis/tree/main/demo
1•raphw•25m ago•1 comments

Every Byte Matters

https://fzakaria.com/2026/06/01/every-byte-matters
3•setheron•25m ago•0 comments

The architect who became the king of bank robberies

https://thehustle.co/the-architect-who-became-the-king-of-bank-robberies
3•rmason•25m ago•0 comments

For Goldman's Top Bankers, It's All AI Data Centers All the Time

https://www.bloomberg.com/news/articles/2026-06-01/for-goldman-s-top-bankers-it-s-all-ai-data-cen...
1•1vuio0pswjnm7•26m ago•0 comments

Natural tissue immortality: Indefinite survival of sea cucumber explants

https://www.science.org/doi/10.1126/sciadv.aeb1394
1•bookofjoe•27m ago•0 comments

The MCP Context Tax – Notes from Running 605 Tool Packs

https://pipeworx.io/blog/mcp-context-tax-tool-routing/
1•pipeworx•27m ago•0 comments

Speech Studio – I open-sourced a local voice cloning Mac app (free, no API keys)

https://old.reddit.com/r/SideProject/comments/1tu78nn/speech_studio_i_opensourced_a_local_voice_c...
1•ipotapov•29m ago•1 comments

Hatch: Write agent rules/skills once, generate for all

https://github.com/grafana/hatch
1•matryer•30m ago•0 comments

The SpaceX Squeeze

https://www.bloomberg.com/opinion/newsletters/2026-06-01/the-spacex-squeeze
2•davidw•30m ago•0 comments

We built a 12-step verification pipeline. It caught zero real errors

https://www.inc.com/heather-wilde/before-you-let-ai-run-your-business-read-this/91327205
2•bobrenze•30m ago•0 comments

I tracked 68 automation metrics. Only 3 changed my behavior

https://www.inc.com/heather-wilde/4-tech-strategies-revolutionizing-business-operations.html
1•bobrenze•31m ago•0 comments

NeuROK: Generative 4D Neural Object Kinematics

https://chen-geng.com/neurok
1•ychidken•32m ago•0 comments

Repo explainer – For coders that can't read good

https://repo-explainer.com/
3•seahorsecastle•33m ago•3 comments

Why Are Human Teeth So Messed Up? (2017)

https://www.sapiens.org/biology/human-teeth-evolution/
2•downbad_•39m ago•0 comments

Famous Photo of Chernobyl's Dangerous Radioactive Material Was a Selfie (2016)

https://www.atlasobscura.com/articles/elephants-foot-chernobyl
3•downbad_•40m ago•0 comments

OpenAI frontier models and Codex are now available on AWS

https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws/
7•typpo•42m ago•0 comments

AI's reality check has arrived

https://www.fastcompany.com/91551700/ais-reality-check-has-finally-arrived
3•1vuio0pswjnm7•48m ago•0 comments

Terrascan: Explore public deep earth scan datasets

https://terrascan.bowd.io
3•bowd•49m ago•1 comments

AI Grifters Are Making Anti-Data Center Slop with AI

https://www.404media.co/ai-grifters-are-making-anti-data-center-slop-with-ai/
5•cdrnsf•50m ago•0 comments