Mental bugs due to lack of imagination

https://nahurst.substack.com/p/mental-bugs-due-to-lack-of-imagination
1•nathanh•1m ago•0 comments

Show HN: Formal Verification with Lean

https://www.daniellowengrub.com/blog/2026/04/30/lean
1•lowdanie•5m ago•0 comments

Digital Twin – An AI Clone of Yourself (Claude and ElevenLabs and Cloudflare)

https://aimirrortwin.com
1•sumhead•5m ago•0 comments

Zig vs. Rust in 2026

https://zackoverflow.dev/writing/zig-vs-rust-in-2026/
1•ibobev•7m ago•0 comments

Microsoft and Apple bet on new mascots in bid to seem more cuddly

https://www.bbc.com/news/articles/c99l1zzp8xzo
1•reconnecting•8m ago•0 comments

Kicking the Tyres on Harbor for Agent Evals

https://rmoff.net/2026/04/09/kicking-the-tyres-on-harbor-for-agent-evals/
1•eigenBasis•9m ago•0 comments

There's a $50B company hiding inside Salesforce

1•emmanol•9m ago•0 comments

Recursant, the open source AI control plane, now supports OpenClaw

https://clawhub.ai/plugins/openclaw-recursant
1•hestefisk•10m ago•0 comments

From latency to instant: Modernizing GitHub Issues navigation performance

https://github.blog/engineering/architecture-optimization/from-latency-to-instant-modernizing-git...
1•Brajeshwar•10m ago•0 comments

Ask HN: What AI tools are you using every day?

1•tomchui157•11m ago•3 comments

Introducing Spend Caps (Google Cloud)

https://cloud.google.com/blog/topics/cost-management/introducing-spend-caps-ai-cost-visibility-ne...
1•markerbrod•14m ago•0 comments

Check Your Fucking Sources, People

https://brodzinski.com/2026/05/check-fcking-sources.html
2•flail•15m ago•0 comments

Our response to the TanStack NPM supply chain attack

https://openai.com/index/our-response-to-the-tanstack-npm-supply-chain-attack/
1•taubek•15m ago•0 comments

The SGI Buyer's Guide

https://hardware.majix.org/computers/sgi/buyers-guide.shtml
1•uticus•17m ago•0 comments

Crypto-Agility Is a Runtime Property, Not a Compliance Checkbox

https://mayckongiovani.substack.com/p/pqc-engineering-series-deep-dive-8f2
1•doomhammerhell•18m ago•0 comments

C++26: Standard Library Hardening

https://www.sandordargo.com/blog/2026/05/13/cpp26-library-hardening
1•ibobev•19m ago•0 comments

ASCII by Jason Scott

https://ascii.textfiles.com/
1•bookofjoe•20m ago•0 comments

Zerodep (2023)

https://philipbohun.com/blog/0003.html
1•vinhnx•22m ago•0 comments

Mkjwk: Simple JSON Web Key Generator

https://mkjwk.org/
2•mooreds•22m ago•0 comments

C++26 Shipped a SIMD Library Nobody Asked For

https://lucisqr.substack.com/p/c26-shipped-a-simd-library-nobody
2•ibobev•22m ago•0 comments

The HTML Review 05

https://thehtml.review/05/
1•surprisetalk•22m ago•0 comments

So-tell-us.com – Family and Friends Newsletter

https://so-tell-us.com/
1•richardvc251•24m ago•1 comment

Show HN: Claurst – Rust-Based OSS Terminal Coding Agent Now in Beta

https://github.com/kuberwastaken/claurst
1•kuberwastaken•25m ago•0 comments

X for You Feed Algorithm (Updated May 15th)

https://github.com/xai-org/x-algorithm/blob/main/README.md
1•M4v3R•27m ago•0 comments

Anyone accepted crypto payments from customers?

1•Davida_Ginter•27m ago•1 comment

The U.S. has 1,200 AI bills and no good test for any of them

https://fortune.com/2026/05/15/ai-policy-patchwork-state-federal-regulation-framework-sonnenfeld-...
3•Brajeshwar•28m ago•0 comments

Code Review Is Not About Catching Bugs

https://www.davidpoll.com/2026/02/code-review-is-not-about-catching-bugs/
1•mooreds•30m ago•0 comments

Lessons Learned Building High-Performance Rust Profiler

https://pawelurbanek.com/rust-performance-profiling
1•vinhnx•31m ago•0 comments

Sorry seems to be the most overused word

https://amyhupe.co.uk/articles/sorry-seems-to-be-the-most-overused-word/
2•mooreds•31m ago•0 comments

Show HN: Vibe Coding a $20k /Year Enterprise Logistics Platform

https://trmnl.com/blog/vibe-coding-shiphero
5•ryanckulp•31m ago•0 comments

#1 on the leading AI memory benchmark using a smaller, cheaper model

https://exabase.io/research/exabase-achieves-state-of-the-art-on-longmemeval-benchmark
4•johnnymakes•1h ago

Comments

johnnymakes•1h ago
Hey HN. I'm Johnny, founder of Exabase.

M-1 is our first-generation memory engine. We evaluated it against LongMemEval, the most comprehensive public benchmark for conversational memory retrieval: 500 questions, ~115k tokens of history, relevant information scattered across sessions and buried in noise.

M-1 scored 96.4% at top-50 retrieval, the highest reported score, with consistent performance across all top-k settings. The most interesting part is that we did it with Gemini 3 Flash, while every other system on the leaderboard used Gemini 3 Pro.
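For readers less familiar with top-k evaluation, here is a minimal sketch of how accuracy at a given k could be measured. It is not the M-1 harness: it assumes each question already comes with retriever-scored candidate chunks, and `top_k`, `accuracy_at_k`, `answer_fn`, and `judge_fn` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Sequence


def top_k(scored_chunks: Sequence[tuple[str, float]], k: int) -> list[str]:
    """Return the k highest-scoring chunks, best first."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def accuracy_at_k(
    questions: Sequence[dict],
    k: int,
    answer_fn: Callable[[str, list[str]], str],
    judge_fn: Callable[[str, str], bool],
) -> float:
    """Fraction of questions answered correctly when the answerer sees only the top-k chunks.

    Each question dict is assumed (hypothetically) to hold:
      "question": str, "expected": str, "scored_chunks": [(chunk, score), ...]
    """
    if not questions:
        raise ValueError("need at least one question")
    correct = 0
    for q in questions:
        context = top_k(q["scored_chunks"], k)
        predicted = answer_fn(q["question"], context)
        if judge_fn(predicted, q["expected"]):
            correct += 1
    return correct / len(questions)
```

Running `accuracy_at_k` over several values of k with the same answerer is one way to check whether performance stays consistent as the retrieved context grows.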

A bigger model can compensate for weaker retrieval by absorbing a larger, noisier context, at the cost of more expensive inference. We deliberately chose a smaller model to isolate retrieval quality from model capability and to target real production use. This result is Pareto optimal: lower cost and better performance, which is what we're solving for.
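To make the Pareto claim concrete, here is a small sketch of a dominance check over (cost, accuracy) pairs. The `System` type and both functions are hypothetical, and any numbers fed to them would be placeholders rather than leaderboard figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    cost: float      # e.g. inference cost per question; lower is better
    accuracy: float  # fraction of questions answered correctly; higher is better


def dominates(a: System, b: System) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.accuracy >= b.accuracy
    strictly_better = a.cost < b.cost or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(systems: list[System]) -> list[System]:
    """Keep only the systems that no other system dominates."""
    return [s for s in systems if not any(dominates(o, s) for o in systems)]
```

A system that is both cheaper and more accurate than every competitor dominates all of them, so it ends up as the only entry on the front.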

To keep our results in the spirit of real production use, we used a single generic prompt for our answerer and stripped out the question-specific prompt language we observed in other benchmark attempts and runners. The methodology, prompt, and full results JSON are all linked in the research post.
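As a rough illustration of what "a single generic prompt" means in practice, the sketch below uses one template for every question, with no per-question-type instructions. The wording of `GENERIC_ANSWER_PROMPT` and the `build_prompt` helper are assumptions made up for this example; the prompt actually used is the one linked in the research post.

```python
# Hypothetical template: the same text is used for every question type.
GENERIC_ANSWER_PROMPT = (
    "Answer the question using only the conversation excerpts below.\n"
    "If the excerpts do not contain the answer, say you don't know.\n\n"
    "Excerpts:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Fill the single generic template with numbered retrieved chunks."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return GENERIC_ANSWER_PROMPT.format(context=context, question=question)
```

Because the template never branches on question type, any accuracy difference between runs comes from what was retrieved rather than from prompt tuning.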

The research post also discusses the evaluation ceiling we hit at this accuracy level: errors in the benchmark itself create a noise floor, and we reported a few of them upstream to the benchmark creator.

Happy to discuss the architecture, methodology, or how we think about memory retrieval differently!