Show HN: Spark-LLM-eval – Distributed LLM evaluation for Spark

https://github.com/bassrehab/spark-llm-eval
1•subhadipmitra•2h ago

Comments

subhadipmitra•2h ago
Hey HN, I built this because most LLM eval tools assume single-machine execution. When you need to evaluate against millions of examples (customer tickets, documents, etc.), they don't scale without significant duct-taping.

  spark-llm-eval runs natively on Spark - not "Spark as an afterthought" but distributed evaluation as the primary design goal.

  Key features:
  - Distributed inference via Pandas UDFs, scales linearly with executors
  - Statistical rigor by default: bootstrap CIs, paired t-tests, effect sizes
  - Multi-provider: OpenAI, Anthropic, Gemini, vLLM
  - Delta Lake integration for versioned results with lineage

  pip install spark-llm-eval

  The main gap I'm filling: "I have 2M labeled examples and need to know if Model A is statistically significantly better than Model B." Most frameworks give you point estimates; this gives you confidence intervals and significance tests.
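To make the "Model A vs. Model B" question concrete, here is a minimal single-machine sketch of the kind of statistics mentioned above: a paired percentile-bootstrap confidence interval and a paired t-statistic on per-example scores. This is not spark-llm-eval's API; the function names, parameters, and toy data are assumptions chosen for illustration, using only the Python standard library.

```python
import math
import random
import statistics


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-example difference (A - B).

    Hypothetical helper: both score lists must cover the same examples
    in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        sample = rng.choices(diffs, k=n)
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return statistics.fmean(diffs), means[lo_idx], means[hi_idx]


def paired_t_statistic(scores_a, scores_b):
    """t-statistic of the paired differences (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)
    return statistics.fmean(diffs) / (sd / math.sqrt(len(diffs)))


# Toy data (made up): 1 = correct, 0 = incorrect, on the same 100 examples.
scores_a = [1] * 80 + [0] * 20
scores_b = [1] * 70 + [0] * 30
mean_diff, ci_low, ci_high = paired_bootstrap_ci(scores_a, scores_b)
t_stat = paired_t_statistic(scores_a, scores_b)
print(f"mean diff={mean_diff:.3f}, 95% CI=[{ci_low:.3f}, {ci_high:.3f}], t={t_stat:.2f}")
```

If the interval excludes zero, the point estimate is backed by evidence that the gap is not resampling noise. At the 2M-example scale described in the post, the same per-example differences would presumably be aggregated across partitions rather than held in one list.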

  Blog post with architecture details: https://subhadipmitra.com/blog/2025/building-spark-llm-eval/

  Happy to answer questions about the implementation - rate limiting in distributed contexts was surprisingly tricky.
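The post does not say how spark-llm-eval handles rate limiting, so the following is only a sketch of one common approach, under the assumption that a provider-wide quota is split statically across concurrent workers and each worker enforces its share with a token bucket. `TokenBucket`, `per_worker_rate`, and the quota numbers are hypothetical names and values, not taken from the library.

```python
import time


def per_worker_rate(global_requests_per_min, num_workers):
    """Each worker's share of a global quota, in requests per second."""
    return global_requests_per_min / 60.0 / num_workers


class TokenBucket:
    """Simple token bucket; the clock is injectable so it can be tested."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self, cost=1.0):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Assumed numbers: a 600 requests/min quota shared by 10 concurrent workers.
bucket = TokenBucket(rate_per_sec=per_worker_rate(600, 10), capacity=2)
if bucket.try_acquire():
    pass  # the worker would send one API request here
```

A static split is simple but wastes quota when workers finish at different times, which may be part of why the problem is tricky in a distributed setting.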

Beaver: An Efficient Deterministic LLM Verifier

https://arxiv.org/abs/2512.05439
1•tshanmu•27s ago•1 comments

Agent-99 – safe (no, really) eval() in the cloud

https://www.npmjs.com/package/agent-99
1•podperson•35s ago•1 comments

What I do to outrank everyone in my niche on Google and ChatGPT results (free)

https://easyfaq.io
1•branoco•1m ago•1 comments

Colorado's modular builders stand at a crossroads

https://coloradosun.com/2025/12/16/modular-home-building-colorado-container-homes/
1•mooreds•1m ago•0 comments

What Is PKCE and Why Your OAuth Implementation Needs It

https://oneuptime.com/blog/post/2025-12-16-what-is-pkce-and-why-you-need-it/view
1•ndhandala•2m ago•0 comments

'Heat' at 30: Michael Mann's Meticulous Masterpiece of Both Style and Substance

https://cinephiliabeyond.org/heat/
1•canarymark•2m ago•0 comments

Why Postgres and ClickHouse are becoming the Open Source Data Stack for AI

https://thenewstack.io/postgres-clickhouse-the-oss-stack-to-handle-agentic-ai-scale/
1•saisrirampur•2m ago•0 comments

Show HN: Veriduct Prime – Format destruction framework for binary evasion

https://github.com/Bombadil-Systems/veriduct-prime
1•float_val•4m ago•0 comments

Free Unix and Linux Shell Servers

https://aruljohn.com/freeshell/
1•mliezun•4m ago•0 comments

Covid Jawboning Lawsuit Dismissed (For Now)–Dressen vs. Flaherty

https://blog.ericgoldman.org/archives/2025/12/covid-jawboning-lawsuit-dismissed-for-now-dressen-v...
1•hn_acker•6m ago•0 comments

Building a Transformer from Scratch Taught Me Where Knowledge Lives

https://medium.com/@kishore-jalleda/transformers-dont-think-in-attention-d911dc447ca3
1•kishore-jalleda•7m ago•0 comments

Sent from my iPad – Steve Jobs (2010)

https://putsomethingback.stevejobsarchive.com/
1•vismit2000•8m ago•0 comments

Local-first analysis of a million genetic traits with LLM-assisted exploration

https://blog.monadicdna.com/monadic_dna_explorer_premium/
1•vishakh82•9m ago•1 comments

Guideless: Create Software Video Guides in Minutes

https://guideless.ai/
1•bellamoon544•9m ago•1 comments

The four creative trends that will define marketing in 2026

https://blog.adobe.com/en/publish/2025/12/09/four-creative-trends-define-marketing-2026
1•eustoria•9m ago•0 comments

Re-run failed translations 10x faster with the latest Gato AI Translations (WP)

https://gatoplugins.com/blog/re-run-failed-translations-10x-faster-with-v15-3-of-gato-ai-translat...
1•leoloso•10m ago•0 comments

Show HN: See the carbon impact of your cloud as you code

6•hkh•10m ago•0 comments

Four Million U.S. Children Had No Health Insurance in 2024

https://www.scientificamerican.com/article/how-rising-rates-of-uninsured-children-will-increase-p...
2•Brajeshwar•10m ago•0 comments

Better Than the Cheap Alternative

https://seths.blog/2025/12/better-than-the-cheap-alternative/
1•herbertl•10m ago•0 comments

Fork

https://mough.xyz/2025/12/fork/
1•eustoria•11m ago•0 comments

CC Signals: What We've Been Working On

https://creativecommons.org/2025/12/15/cc-signals-what-weve-been-working-on/
1•Tomte•11m ago•0 comments

iOS-Backup-Machine

https://github.com/giovi321/ios-backup-machine
1•dylan604•11m ago•0 comments

The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs

https://arxiv.org/abs/2506.18403
2•chengchang316•12m ago•2 comments

Ruby Red Women's Clothing: How to Wear This Power Color Without Looking Overdone

https://fashionablyfifty.substack.com/p/ruby-red-womens-clothing-how-to-wear
1•MaxwellJ•12m ago•0 comments

Living cells may generate electricity from motion

https://www.sciencedaily.com/releases/2025/12/251216081930.htm
1•rkrzr•13m ago•1 comments

Hardware Powers of 10

https://buttondown.com/justincormack/archive/ignore-previous-directions-9-hardware-power/
1•walterbell•13m ago•0 comments

Recommended Posts, 2017-2025

https://www.raphkoster.com/2025/11/26/recommended-posts-2017-2025/
1•surprisetalk•15m ago•0 comments

Couple raises family in self-sufficient homestead inside a greenhouse [video]

https://www.youtube.com/watch?v=p5ILdwn0_Fk
1•surprisetalk•15m ago•0 comments

John von Neumann Shot Lightning from His Arse

https://www.theintrinsicperspective.com/p/john-von-neumann-shot-lightning-from/comment/176614979
1•surprisetalk•16m ago•0 comments

Homemade Juggling Beanbag Guide

https://www.joshuaclifton.com/juggle/
1•surprisetalk•16m ago•0 comments