
A all CLIs tokens and context reducer by 97%

https://www.squeezr.es/
1•sergioramosv•55s ago•1 comments

How we feel about AI (2025)

https://goauthentik.io/blog/2025-12-10-how-we-really-feel-about-ai/
1•walterbell•4m ago•0 comments

Show HN: Gecit – DPI bypass using eBPF sock_ops, no proxy or VPN

https://github.com/boratanrikulu/gecit
1•boratanrikulu•5m ago•0 comments

How to Get Better at Guitar

https://www.jakeworth.com/posts/how-to-get-better-at-guitar/
1•jwworth•6m ago•0 comments

Iran internet blackout now longest nation-scale shutdown on record

https://mastodon.social/@netblocks/116350984373909468
1•ukblewis•9m ago•0 comments

Show HN: Stablemount, a response to EmDash, a prototype for a future CMS

https://github.com/jhyolm/stablemount
1•jhyolm•9m ago•1 comments

Watch 'S4 – The Bob Lazar Story' online: Here's where to watch the UFO doc

https://www.tomsguide.com/entertainment/streaming/watch-s4-the-bob-lazar-story-online
1•evo_9•11m ago•0 comments

Show HN: YardSard – Inventory Management

https://apps.apple.com/us/app/yardsard/id6759114903
2•prithsr•16m ago•0 comments

Show HN: Imladri – Cryptographic enforcement and semantic monitoring for your AI

https://imladri.com/
2•osama872•18m ago•0 comments

AST vs. Bytecode: Interpreters in the Age of Meta-Compilation [pdf]

https://stefan-marr.de/downloads/oopsla23-larose-et-al-ast-vs-bytecode-interpreters-in-the-age-of...
3•tosh•19m ago•0 comments

Codex is switching to API pricing based usage for all users

https://help.openai.com/en/articles/20001106-codex-rate-card
5•ccmcarey•22m ago•1 comments

Francis Li

https://furclick.top/
2•menshowlee•28m ago•0 comments

Show HN: Regression-dog – A 20-line skill that reviews your code for regressions

https://github.com/imaman/skills/tree/main/skills/regression-dog
2•itay-maman•31m ago•0 comments

OpenRockets Archive New Submission(Autoscript)

https://archive.openrockets.com/Litha2024-main/
2•openrockets•32m ago•1 comments

Open source voice cloning TTS models worth trying

https://firethering.com/open-source-tts-voice-cloning/
2•steveharing1•32m ago•0 comments

Claude AI powered trading bot turns $1 into $3.3M on Polymarket

https://finbold.com/claude-ai-powered-trading-bot-turns-1-into-3-3-million-on-polymarket/
3•madaxe_again•34m ago•0 comments

Microsoft terms say Copilot is for entertainment purposes only, not serious use

https://www.tomshardware.com/tech-industry/artificial-intelligence/microsoft-says-copilot-is-for-...
37•jatins•35m ago•3 comments

We are facing the most significant days and weeks in world history since 1945

https://www.taxresearch.org.uk/Blog/2026/04/05/we-are-facing-most-significant-days-and-weeks-in-w...
2•only_in_america•35m ago•1 comments

iCloud appears to be down for some users

https://www.reddit.com/r/iCloud/s/GAahHRBNPX
2•FinnKuhn•40m ago•3 comments

Computational Physics (2nd Edition)

https://websites.umich.edu/~mejn/cp2/
3•teleforce•42m ago•0 comments

Open Source Elixir Personel Health Management

https://github.com/joestein/health-pilot
3•buoewe•43m ago•0 comments

Agile Development Is Dead Reckoning

https://paolog.net/posts/dead-reckoning-agile/
1•paologi•49m ago•0 comments

Inference Arena – new benchmark of local inference and training

http://kvark.github.io/ai/performance/2026/04/04/inference-arena.html
3•kvark•50m ago•1 comments

Show HN: Identa – CLI to calibrate prompts across local LLMs

3•srodriguezp•50m ago•0 comments

The Disposable Tools Manifesto

https://blog.vtemian.com/post/disposable-tools-manifesto/
3•vtemian•51m ago•0 comments

Dh

1•tegjm•51m ago•0 comments

StackOverflow: Retiring the Beta Site

https://meta.stackoverflow.com/questions/438628/retiring-the-beta-site
17•stefankuehnel•53m ago•6 comments

Background Jobs in Go with Asynq and Valkey

https://josephgoksu.com/blog/background-jobs-in-go-asynq-valkey/
2•ssfak•53m ago•0 comments

Developers using LLM APIs, what are your biggest frustrations?

https://form.jotform.com/260943627372058
1•Algo-bro•54m ago•0 comments

Program analysis using random interpretation (2005) [pdf]

https://sigplan.org/Awards/Dissertation/2005_gulwani.pdf
1•azhenley•55m ago•0 comments

Show HN: 1B Embeddings

3•INVARIAN•4h ago
We built a vector search engine based on Quantized Tensor Train (QTT) decomposition. Instead of approximate nearest neighbor (ANN) indices like HNSW or IVF, we factorize the entire dataset into a compressed tensor format and serve exact cosine similarity queries directly from the compressed representation. The headline: 1 billion vectors on a single H100, 38ms query, 100% recall, 66 GB serving.

Recall improves with scale at fp16: 96% at 400M → 98% at 500M → 99% at 600M → 100% at 1B. This is the opposite of ANN indices, where recall degrades with scale. More data helps the decomposition converge.

Every number below is measured, not projected. Full benchmark suite across 4 GPUs at 3 precision tiers. H100 80GB, 384-dim embeddings, rank=32.

fp16 (Scale tier) — H100 80GB:

  100M:  5.87ms p50,  6.6 GB serving, 100% recall, 46.5x compression
  500M: 20.54ms p50, 33.0 GB serving,  98% recall, 46.5x compression
    1B: 38.51ms p50, 66.0 GB serving, 100% recall, 46.5x compression
fp32 (Production tier) — H100 80GB:

  100M: 18.96ms p50, 13.2 GB serving, 100% recall, 23.3x compression
  300M: 46.29ms p50, 39.6 GB serving, 100% recall, 23.3x compression
  500M: 76.53ms p50, 66.0 GB serving, 100% recall, 23.3x compression

fp64 (Exact tier) — H100 80GB:

   10M:  2.68ms p50,  2.6 GB serving, 100% recall, 11.6x compression
  100M: 20.40ms p50, 26.4 GB serving, 100% recall, 11.6x compression
  200M: 40.40ms p50, 51.6 GB serving, 100% recall, 11.6x compression

Hardware portability — same codebase, different GPUs:

  P4000   8 GB: fp16  50M, 26.0ms p50, 100% recall, $0.07/hr
  A100   40 GB: fp16 200M,  3.1ms p50,  98% recall, $0.70/hr
  H100   80 GB: fp16 500M,  3.1ms p50,  98% recall, $2.09/hr
  B200  192 GB: fp16 500M,  1.4ms p50,  99% recall, est. ~$5/hr

Recall is hardware-invariant: same math, same results, P4000 through H100. The 2B run on B200 is in progress.

How it works: the dataset X (N×D) is factored as X ≈ Z · V_T, where Z is (N×r) and V_T is (r×D), with r=32. A query is served by two small GEMVs: scores = Z · (V_T · q). Bytes per entry: 2r bytes at fp16 = 64 bytes, regardless of embedding dimension. A 1536-dim OpenAI ada-002 embedding compresses 23.6x at fp32 with zero recall loss.
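The query path above can be sketched in a few lines of NumPy. This is an illustrative toy, not the author's implementation: `build_factors` and `query_topk` are hypothetical names, plain truncated SVD stands in for the streaming build, and the synthetic data is deliberately near-low-rank so the rank-32 approximation is accurate, mirroring the post's observation that real embedding sets compress well.

```python
import numpy as np

def build_factors(X, r=32):
    """Truncated SVD: X ~= Z @ V_T, with Z = U_r * S_r (N x r) and V_T (r x D)."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :r] * S[:r]          # per-vector rank-r coefficients
    V_T = Vt[:r, :]               # shared basis, stored once
    return Z, V_T

def query_topk(Z, V_T, q, k=10):
    """scores = Z @ (V_T @ q): one (r,D) GEMV, then one (N,r) GEMV."""
    scores = Z @ (V_T @ q)        # approximate dot products against all N vectors
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]

rng = np.random.default_rng(0)
# Synthetic near-low-rank "embeddings": rank-32 structure plus small noise.
X = rng.standard_normal((10_000, 32)) @ rng.standard_normal((32, 384))
X += 0.01 * rng.standard_normal(X.shape)
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit norm, so dot = cosine

Z, V_T = build_factors(X, r=32)
hits = query_topk(Z, V_T, X[123], k=10)          # query with a known vector
print(hits[0])
```

Note the shapes: per query, `V_T @ q` costs r×D and `Z @ (...)` costs N×r, so the full N×D scan is never materialized, and only Z (N×r) plus the small V_T need to live in memory at serve time.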

Compression is dimension-independent:

  384-dim  MiniLM:     11.6x, 100% recall
  768-dim  E5-large:   11.8x, 100% recall
  1024-dim Cohere v3:  15.8x, 100% recall
  1536-dim ada-002:    23.6x, 100% recall
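The ratios above can be sanity-checked from the stated r=32. Assuming the baseline is raw fp64 vectors and ignoring the shared V_T basis and metadata (which is presumably why the measured ratios land slightly below these ideals), per-entry storage is just the rank-32 coefficient row:

```python
def ideal_compression(dim, coeff_bytes, rank=32, raw_bytes=8):
    """Raw fp64 vector bytes divided by rank-r coefficient bytes per entry."""
    return (dim * raw_bytes) / (rank * coeff_bytes)

print(ideal_compression(384, 2))   # fp16 coefficients: 48.0x (measured: 46.5x)
print(ideal_compression(384, 4))   # fp32 coefficients: 24.0x (measured: 23.3x)
print(ideal_compression(384, 8))   # fp64 coefficients: 12.0x (measured: 11.6x)
```

Because the per-entry cost is fixed at rank × coeff_bytes, the ratio grows linearly with embedding dimension, which is why the 1536-dim ada-002 numbers come out roughly 4x better than the 384-dim ones at the same precision.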
Operational details (H100, 10M vectors):

  QPS: 317 single client, 183 at 100 concurrent
  Cold start: 8.88s from snapshot to first query
  24h soak: 2.9M queries, 8.6M inserts, zero data corruption
  Insert-under-query: 885 inserts/s concurrent with 101 QPS
  All artifacts (JSON + logs) available

Build uses streaming randomized SVD — peak VRAM equals serving size, not dataset size. The 2B run on B200 uses streaming coefficient regeneration, so the 512 GB coefficient matrix is never fully allocated in RAM.

Contact: brad@holonomx.com

Comments

INVARIAN•3h ago
B200 2B fp16 — complete.

  Metric           Value
  N                2,000,000,000
  GPU              NVIDIA B200 (191.5 GB)
  Build            1794.2s (Pass 1: 637s, Pass 2: 1157s)
  Serving          132.00 GB (Z=[2B, 32] fp16)
  Compression      46.5× fp64, 23.3× fp32
  Query p50        60.89 ms
  Query p99        62.58 ms
  R@10 mean        98.0%
  R@10 min         90.0%
  VRAM serving     131 GB
  VRAM query peak  142 GB
  CPU RAM post     9 GB
  Total wall       2693.6s (~45 min)

INVARIAN•3h ago
2.5B: Incoming