
Show HN: Local RAG Eval Harness – reproducible benchmarks for retrieval pipelines

1•myroslavmokhamm•2h ago
Demo: REPLACE_WITH_DEMO_URL (no login)
Repo: REPLACE_WITH_GITHUB_REPO_URL (profile: https://github.com/myroslav-abdeljawwad)

What it is
A small toolkit to run reproducible, local evaluations for retrieval-augmented generation (RAG). It ships with a CLI + notebooks, fixed seeds, and a baseline config so results are easy to compare across machines.
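To make the "fixed seeds + baseline config" idea concrete, here is a minimal sketch of one way a run could be pinned down. It is not taken from the repo: the config fields (`seed`, `sample_size`, `retriever`, `top_k`) and both helper functions are hypothetical names used only for illustration.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Stable short hash of a run config (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def sample_queries(queries: list, config: dict) -> list:
    """Pick a reproducible subset of queries using the seed from the config."""
    rng = random.Random(config["seed"])  # local RNG, no global state touched
    return rng.sample(queries, k=config["sample_size"])


# Hypothetical baseline config; field names are assumptions, not the tool's schema.
baseline = {"seed": 42, "sample_size": 3, "retriever": "bm25", "top_k": 5}
queries = [f"q{i}" for i in range(10)]

run_a = sample_queries(queries, baseline)
run_b = sample_queries(queries, baseline)
fingerprint = config_fingerprint(baseline)
```

Storing a fingerprint like this next to each result file would let two runs be compared only when their configs match, which is presumably the point of config-locked runs.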

Why
Most RAG repos don’t show repeatable benchmarks. This tries to make “evals-first” the default.

Features
- Metrics: Hit@K, MRR, Exact Match, grounded accuracy, latency, token-cost
- Seeds + config-locked runs (YAML)
- Plug-in chunkers (by structure/semantics), retrievers, and rerankers
- One-command local runs (Docker optional)
- Minimal HTML report + CSV/Parquet exports
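For readers less familiar with the retrieval metrics listed above, here is a small sketch of Hit@K and MRR using their usual textbook definitions. The post does not say how the toolkit computes them, so the function names and the toy data are assumptions for illustration only.

```python
def hit_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k):
    """Average both metrics over (ranked_ids, relevant_ids) pairs, one per query."""
    n = len(runs)
    return {
        "hit@k": sum(hit_at_k(ranked, rel, k) for ranked, rel in runs) / n,
        "mrr": sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n,
    }


# Toy data: three queries, each with a ranked result list and a relevant set.
runs = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant hit at rank 2
    (["d2", "d9", "d4"], {"d4"}),  # first relevant hit at rank 3
    (["d5", "d6", "d8"], {"d0"}),  # relevant document never retrieved
]
result = evaluate(runs, k=2)
```

With `k=2` only the first query counts as a hit, while MRR still gives partial credit to the second query for finding the relevant document at rank 3.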

Try it with public data
- Harvard Dataverse (DVN/6TI8KI): https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi...

Stack
Python, Typer CLI, pytest, SQLite/Postgres, Docker (optional)

What I’d love feedback on
1) Which metrics you actually trust in day-to-day work
2) Chunking heuristics that generalize across domains
3) Reranker swaps that improve grounded accuracy without killing latency

Roadmap
- Dataset adapter registry
- Built-in failure-mode explorer
- Tiny web UI for run diffs

License
MIT

About me
Myroslav Mokhammad Abdeljawwad (GitHub: https://github.com/myroslav-abdeljawwad)
