Thoughts on Evals

https://www.raindrop.ai/blog/thoughts-on-evals/
30•Nischalj10•2mo ago

Comments

CharlieDigital•2mo ago
Both of these are kind of silly and vendors trying to sell you tooling you probably don't need.

In a gold rush, each is trying to sell you a different kind of shovel, claiming theirs to be the best, when you really should go find a geologist and figure out where the vein is.

anonymoushn•2mo ago
The framing in this post is really weird. Automated evals can be much more informative than unit tests because the results can be much more fine grained. A/B testing in production is not suitable for determining whether all of one's internal experiments are successful or not.

I don't doubt that Raindrop's product is worthwhile to model vendors, but the post seems like its audience is C suite folks who have no clue how anything works. Do their most important customers even have any of these?

CharlieDigital•2mo ago
I think in most cases, outside of pure AI providers or thin AI wrappers, almost every team will realize more gains from focusing on their user domains and solving business problems versus fine-tuning their prompts to eke out a 5% improvement here and there.
basket_horse•2mo ago
I don’t think you can use this as a blanket statement. For many use cases the last 5-10% is the difference between demoware and production.
CharlieDigital•2mo ago
If that were true, just switching to TOON would make your startup take off.

That is obviously not true because a 5% gain in LLM performance isn't going to make up for a bad product.

gk1•2mo ago
Founder of a/b testing company accuses founder of evals company of misrepresenting how a/b tests are used in practice, then concludes by misrepresenting how evals are used in practice: "Or you can write 10,000,000 evals."

Could've easily been framed as "you need both evals and a/b testing," but instead they chose this route which comes across as defensive, disingenuous, and desperate.

BTW, if a competitor ever writes a whole post to refute something you barely alluded to without even mentioning their name... congratulations, you've won.

basket_horse•2mo ago
Agree. This whole post comes across as a sales pitch rather than acknowledging the truth: both are useful for different things.
eitland•2mo ago
This scared me until I realized it is about raindrop.ai, not raindrop.io.

(Raindrop.io is a bookmark service that AFAIK has "take money from people and store their bookmarks" as its complete business model.)

koakuma-chan•2mo ago
> Intentionally or not, the word "eval" has become increasingly vague. I've seen at least 6 distinct definitions of evals

This. I am so tired of people saying evals without defining what they mean. And now even management is asking me for evals and why we are not fine tuning.

esafak•2mo ago
No, thanks. Just use evals with error bars. If you can't get error bars, use an A/B test to detect spuriousness and evals.
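
A minimal sketch of what "evals with error bars" could look like, assuming each eval case is scored pass/fail. It uses a Wilson score interval, which is one common choice for a proportion and not something the comment above specifies; the function names and the 82/100 vs 86/100 figures are made up for illustration.

```python
import math


def wilson_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass rate (z=1.96)."""
    if total <= 0:
        raise ValueError("need at least one eval case")
    p = passed / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical results: prompt A passes 82 of 100 cases, prompt B passes 86 of 100.
prompt_a = wilson_interval(82, 100)
prompt_b = wilson_interval(86, 100)

# Overlapping intervals suggest the 4-point gap may be noise at this sample size.
gap_may_be_noise = intervals_overlap(prompt_a, prompt_b)
```

With only 100 cases the two intervals overlap, so a 4-point difference on its own is weak evidence; more cases (or an A/B test) would be needed to tell the prompts apart.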
gregsadetsky•2mo ago
I'm new/uninformed in this world, but I have an idea for an eval that I think has not been tried yet.

Can anyone direct me towards how to ... make one? At the most fundamental level, is it about having test questions with known, golden (verified, valid) answers, and asking different LLM models to find the answer, and comparing scores (how many were found to be correct)?

What are "obvious" things that are important to get right - temperature set to 0? At least ~10 or 20 attempts at the same problem for each llm? What are non-obvious gotchas?

Finally, any known/commonly used frameworks to do this, or any tooling that can call different LLMs would be enough?

Thanks!
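
A rough sketch of the setup described above (questions with golden answers, several attempts per question, a score per model), assuming exact match after light normalization is a good enough grader. `EvalCase`, `run_eval`, and `toy_model` are hypothetical names; `toy_model` is a stand-in you would replace with a real LLM call.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    question: str
    golden: str  # verified reference answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count as errors."""
    return " ".join(text.lower().split())


def run_eval(model: Callable[[str], str], cases: list[EvalCase], attempts: int = 1) -> float:
    """Return the fraction of (case, attempt) pairs whose answer matches the golden answer."""
    if not cases or attempts < 1:
        raise ValueError("need at least one case and one attempt")
    correct = 0
    for case in cases:
        for _ in range(attempts):
            if normalize(model(case.question)) == normalize(case.golden):
                correct += 1
    return correct / (len(cases) * attempts)


def toy_model(question: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in a real client here."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "paris"}
    return answers.get(question, "I don't know")


cases = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("Capital of France?", "Paris"),
    EvalCase("Largest planet?", "Jupiter"),
]

# Three attempts per case; the toy model answers two of three questions correctly.
score = run_eval(toy_model, cases, attempts=3)
```

Exact match is the simplest grader and breaks down for free-form answers; the same loop works with any `model` callable, so comparing LLMs means calling `run_eval` once per model on the same cases.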

koakuma-chan•2mo ago
> Can anyone direct me towards how to ... make one?

https://hamel.dev/blog/posts/evals/

> What are "obvious" things that are important to get right - temperature set to 0? At least ~10 or 20 attempts at the same problem for each llm?

LLMs are actually pretty deterministic, so there is no need to do more than one attempt with the exact same data.

> Finally, any known/commonly used frameworks to do this, or any tooling that can call different LLMs would be enough?

https://github.com/vercel/ai

https://github.com/mattpocock/evalite

gregsadetsky•2mo ago
I'm very grateful! Thanks a lot
ncgl•2mo ago
"LLMs are actually pretty deterministic, so there is no need to do more than one attempt with the exact same data."

Is this true? I remember there being a randomization factor in weighting tokens to make the output more something, don't recall what

Obviously I'm not an AI dev
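
The randomization factor is usually called temperature. A toy sketch of how temperature-based sampling is commonly described, assuming a three-token vocabulary with made-up scores; `sample_token` is a hypothetical helper, not any provider's API, and hosted models may still vary slightly between runs even at temperature 0.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick the next token: greedy at temperature 0, softmax sampling otherwise."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, so no randomness.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(value - top) for value in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Made-up scores for three candidate next tokens.
logits = {"yes": 2.0, "no": 1.5, "maybe": 0.5}
rng = random.Random(0)

greedy = {sample_token(logits, 0.0, rng) for _ in range(200)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(200)}
```

In this toy, `greedy` only ever contains the top-scoring token, while `sampled` contains several different tokens, which is why repeated attempts matter more when temperature is above 0.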

koakuma-chan•2mo ago
In my experience, the response may not be exactly the same, but the difference is negligible.
moltar•2mo ago
Take a look at promptfoo