
You can't unit test for taste

https://dev.karltryggvason.com/you-cant-unit-test-for-taste/
128•kalli•1d ago

Comments

throw93949444•2h ago
> For example, my native Iceland had a nice mix of nature, historical sites and populated places.

You absolutely can unit test for taste: just put an agent in a loop and write what you like into the prompt. Then do scoring...

Iceland is a really bad example; it basically has one populated site (the capital) and a circular road that goes around the island.

voidUpdate•1h ago
I'm pretty sure there are more points of interest in the entirety of Iceland than just Reykjavík and Route Number One
chantepierre•2h ago
It makes me smile when runners use "X is a marathon, not a sprint" to hint at an effort that accumulates over time and an optimal use of energy.

I do it too because it's a common expression, and a marathon is of course longer than a sprint, but both have in common that properly raced, they are absolutely brutal efforts that leave you without a single additional drop at the end. The effort length and instantaneous power output changes, of course. Maybe "it's a marathon build, not the race" would be more precise at the loss of nearly all its expressive power (but with a lot more pedanticism points) :-p .

Nice project !

another-dave•2h ago
"The effort length and instantaneous power output changes, of course."

but that's what the phrase is meant to convey, right?

Don't run through consumable X (energy/money/etc) like there's no tomorrow - even though there's <some big important milestone> now, we've got dozens more of those that we need to meet, so you're better off getting this one done at 75% than committing 100% to it and failing on all the others.

boredumb•1h ago
Don't work 12 hour days to get milestone X out, because there are dozens more milestones so don't get burnt on trying to get this one out yesterday. It would probably be more like, don't use 200% to get this out and then quit or burn yourself to 0% or a few % in a year when we want you to extend and maintain this stuff.
chantepierre•1h ago
Yeah you're right, I hear it more like "this is a week long hike, not a sprint" as if a marathon included rest. In any length of racing there's no tomorrow. But I'm doing tongue-in-cheek pedanticness here and will stop that right now !
dasil003•1h ago
I'd wager that if a manager says that, they want you to take it more like a real marathon and less like a long hike.
jayd16•55m ago
In a marathon, not sprinting is the rest.
trjordan•2h ago
You can't unit test for taste if you haven't written down what you mean by taste. If you can externalize it, then you can.

Follow this line of thinking, and the AI-friendly answer is easy: we just have to externalize everything we know, so Claude can implement what I want.

Except that I can't fully externalize myself. Debugging a system takes more resources than running the system. If I could write down everything I know and hand it to a machine, I'd do that, but it's impossible.

People aren't books or hashmaps. If you want to build something, you need to use the tools, not teach the tools to use you.

[edit: I'm trying to figure out if there's something to be done about this. Email me if you want to chat -- tr at tern dot sh]

bonzini•2h ago
It can't be written down as code, that's the point.

I am more familiar with taste in coding and it can at best be described—that the resulting code is too subtly different from something else in the codebase, that you're masking a different bug, that you're not following what the code tells you. The good part is that while this cannot be unit tested, you can write documentation and code comments about it that tell people what they need to know.

But for taste of the kind described in the article there's not even a definition. The logic ended up being "trust a bunch of opaque weights the most"

Chris2048•1h ago
Technically, AI is code, just very complex code.

I'd say there are "simple" things you can do though, like taking automated screenshots and detecting colours to catch jarring colour schemes.
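
A rough sketch of what such a colour check might look like, using the WCAG contrast ratio as one possible proxy for "jarring". It assumes the foreground and background colours have already been pulled out of a screenshot as RGB tuples; that extraction step, the function names, and the 4.5 threshold (the WCAG AA figure for normal text) are choices made for this illustration, and contrast is only one of many things that can make a scheme jarring.

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an 8-bit sRGB colour given as (r, g, b)."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio between two colours: 1.0 (identical) up to 21.0 (black on white)."""
    lum_a = relative_luminance(fg)
    lum_b = relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def is_readable(fg, bg, minimum=4.5):
    """True when the pair meets the chosen minimum contrast (an assumed threshold)."""
    return contrast_ratio(fg, bg) >= minimum


# Hypothetical colours, as if sampled from a screenshot.
BLACK, WHITE, YELLOW = (0, 0, 0), (255, 255, 255), (255, 255, 0)
```

A test could then assert `is_readable(text_colour, background_colour)` for each sampled pair, which catches things like yellow text on white but says nothing about whether the palette is pleasant.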

delichon•1h ago
You may be able to effectively externalize taste by "hot or not" style pair testing. Enough comparisons and I'd expect ML to be able to mimic human taste by latching on to features we're not well aware of influencing us.
a_c•1h ago
I like to think of testing as making sure things are not wrong, rather than making them right.

Working, useful, delightful, in that order. Testing can make things more likely to work, that's it.

TimXare•1h ago
Taste is mostly the part of the spec you forgot to write down, plus the part you couldn't write down even if you tried.
esafak•1h ago
We can encode taste -- generative AI depends on it. Ask people to compare two examples and pick the one with better taste. You can even ask them to rate multiple subjective criteria at once. Use that to learn a scoring function based on the rating labels, and raw features. Now you can write tests.
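
One way to sketch the "compare two examples, learn a scoring function" idea is a Bradley-Terry fit, where each item gets a latent score and the chance that one item is preferred over another depends on the score difference. The votes, learning rate, and regularization below are invented for illustration, and this version scores item IDs directly rather than the raw features mentioned above.

```python
import math


def fit_bradley_terry(n_items, comparisons, lr=0.1, epochs=500, l2=0.01):
    """Fit one latent score per item from (winner, loser) index pairs.

    Plain gradient ascent on the Bradley-Terry log-likelihood, with a small
    L2 penalty so the scores stay bounded.
    """
    scores = [0.0] * n_items
    for _ in range(epochs):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            # Probability that the winner is preferred under the current scores.
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        scores = [s + lr * (g - l2 * s) for s, g in zip(scores, grads)]
    return scores


# Hypothetical votes: item 0 usually beats item 1, which usually beats item 2.
votes = (
    [(0, 1)] * 8 + [(1, 0)] * 2
    + [(1, 2)] * 8 + [(2, 1)] * 2
    + [(0, 2)] * 9 + [(2, 0)] * 1
)
scores = fit_bradley_terry(3, votes)
```

With scores in hand, a test can assert that a new candidate ranks above some reference item, though the result is only as good as the votes that went in.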
Gosper•1h ago
Language count is a decent notoriety signal though pretty coarse. The OP/author should take a look at QRank: https://qrank.toolforge.org/

> QRank is a ranking signal for Wikidata entities. It gets computed by aggregating page view statistics for Wikipedia, Wikitravel, Wikibooks, Wikispecies and other Wikimedia projects

from https://github.com/brawer/wikidata-qrank/blob/main/doc/desig...

hei-lima•9m ago
Cool! Thanks for sharing.
carra•1h ago
So now we need a framework for unit tastes
timroman•1h ago
https://pureinference.com/insights/taste-is-the-new-skill

I wrote about this a few months back. Rick Rubin is famous for this. I do think it is something that can be trained though, it just needs a lot more context. Taste builds over time through lots of unit tests, through lots of content writing, through an accumulation of product decisions. It’s hard to put it in the individual spec, but it can be teased out of 100 project specs. And when you get to that scale the AI starts to do it pretty well.

sesm•1h ago
> Rick Rubin told Anderson Cooper he has no technical ability. Doesn't play instruments. Can't work a mixing board.

If you watch his interview on Rick Beato's channel, this myth will fall apart. He plays guitar, had his own punk rock band and his guitar playing is featured on some high-profile records he produced. Also, he has a lot of practical experience with all kinds of studio equipment.

timroman•1h ago
That’s exactly it. His taste isn’t in any one thing. It’s the esoteric and accumulated from a variety of things. You can’t package it up. That’s the point on the project specs. I can never get it right in one, but the arc over 100 becomes visible. Especially to an LLM that has the capacity to intake and understand that.
themgt•56m ago
This is exactly it - the ultimate skill now is to be Rick Rubin with an LLM. Not a comfortable transition as a coder.
pjmlp•1h ago
Exactly one of the reasons I never went down with all the TDD dogma of only writing code to fix broken tests.

There is a reason conference talks are always about plain algorithms and data structures.

bob1029•48m ago
The biggest flaw I've seen with TDD is the fact that correctness does not compose upward. Every time two units come into contact, you've got an entirely new kind of unit. The tests from constituents do not cover emergent properties of the new things. You will repeat this same exercise the entire way up to the top, and the moment you come into contact with the customer (they want to change everything), the house of cards comes crumbling down and you have to start your agonizingly-slow process all over from the bottom again.

The only thing that the business seems to care about is top-down UI testing. This is also convenient because you can leave it until the very end after the customer has already seen several prototypes.

I do think TDD makes sense in isolated scopes (prove this specific custom parser works at the edges), but as the general policy for the entire product it's definitely not a viable practice. Much of the time it comes off as an ego trip to see just how cleverly we can mock something so that we can say we technically tested it.

pjmlp•41m ago
Exactly, whole-system thinking and large-scale architecture also fall apart when you write everything from little working tests.
jpadkins•55m ago
I think another important question is can you distill taste? (another comment uses the phrase "externalize", which might mean something similar).

I think people have been trying for the written word, with some degree of success (anti-slop skills). I have been trying for visuals, and it's pretty meh. It's easy to get a multimodal LLM to follow a style guide, but a style guide doesn't capture everything that accounts for taste. And anything that is dynamic (not a screenshot test) seems really hard or really expensive.

ChrisMarshallNY•52m ago
> but it ended up merely in a supporting role

This has been my experience, as well, but it’s a really big support. It just needs adult supervision. I can’t understand how vibe-coded apps actually work.

As far as “taste” goes, I test my stuff constantly, checking for even minor “friction points,” sometimes refactoring back to design in order to resolve issues that many folks would ship. I’m pretty anal, and want my work to be the best experience possible.

I can’t see any LLM coming close to being able to evaluate the user experience, like I can.

paytonjjones•48m ago
Tools like Playwright and Maestro can already give you a small taste of what that would look like.

But overall I agree, LLMs are currently awful at being beta testers. They miss the most basic stuff that any human would immediately catch as being poor UX, and for all their visual prowess they are terrible at auditing UI.

tuo-lei•35m ago
the taste part for me is cutting what the agent generated. 200 lines come back, i keep 80, no test for which 80.
thomasfl•34m ago
That's what linters are for. Linters can prevent SQL code from spilling out to code outside the model layer. Even more important when vibecoding.
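
A minimal sketch of that kind of lint rule, assuming the model layer lives in a directory called `models` and that SQL can be spotted with a crude regex. Both are assumptions for the example; a real linter would inspect string literals in the syntax tree, and this heuristic will also flag ordinary sentences that happen to look like SQL.

```python
import re
from pathlib import PurePosixPath

# Crude heuristic for "this line contains SQL".
SQL_PATTERN = re.compile(
    r"\b(SELECT\s.+\sFROM|INSERT\s+INTO|UPDATE\s.+\sSET|DELETE\s+FROM)\b",
    re.IGNORECASE,
)


def find_sql_leaks(files, allowed_dir="models"):
    """Return (path, line_number) for SQL-looking lines outside allowed_dir.

    `files` maps a relative path to its source text.
    """
    leaks = []
    for path, source in files.items():
        if allowed_dir in PurePosixPath(path).parts:
            continue  # SQL is allowed inside the model layer
        for line_no, line in enumerate(source.splitlines(), start=1):
            if SQL_PATTERN.search(line):
                leaks.append((path, line_no))
    return leaks


# Hypothetical project contents.
project = {
    "app/models/user.py": 'QUERY = "SELECT * FROM users"',
    "app/views/home.py": 'x = 1\nrows = db.execute("SELECT id FROM users")',
}
leaks = find_sql_leaks(project)
```

Run in CI, a non-empty result fails the build, which covers the layering rule but not the parts of taste that have no pattern to match.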
fotoblur•25m ago
No but you can add selection as part of your workflow. Governance is something AI agents have allowed me to focus on more and more and this IMHO is where taste lands for me: https://github.com/lramoth/infoPipeline/blob/main/governance...
trjordan•1h ago
This is RL, right? Like, this is exactly why models have mostly converged around obvious style, because we train them literally on thumbs-up/thumbs-down data of what good behavior and good code looks like.

And that's why it's so hard to get a model to reproduce the specific taste of a person or an organization. My taste is different than yours, so if we dump our aggregate preferences into RL, it averages out to nothing interesting.

For the code-writing case, this means you end up reviewing every line of code, looking for places where you'd thumbs-down the code. Not every line of code contains a real decision, though, so it feels like a waste of time.

paytonjjones•1h ago
This is, in short, the big current problem with AI.

LLMs are built for scale so they've given up on the kind of online learning / "long term memory" processes that would individualize them.

The LLM is permanently locked to being a really cracked engineer on their first day at your company, looking at your codebase for the first time.

You can scaffold a bit with .md files, but at the moment they lack the ability to do what humans do: go to sleep, encode things from short to long term memory, and wake up the next day with more specific knowledge baked in.

trjordan•1h ago
100%. The problem with them isn't making sure they're doing the right thing, it's making sure they're not making bad assumptions.

IMHO this is where code review goes until we fix the individualized model thing: you need to review the decisions the agent made, where you didn't steer. Most will be right. A few will be disastrously wrong. But decision-by-decision is a lot less to review than line-by-line of code.

plastic-enjoyer•1h ago
> LLMs are built for scale so they've given up on the kind of online learning / "long term memory" processes that would individualize them.

I wonder if this is even desirable from a product perspective. You probably don't want online learning in a product that you are selling because you can't guarantee a consistent quality of the product.

paytonjjones•44m ago
You could say the same thing about employees!

And to be fair, the ability to fire employees and hire new ones is pretty important for that reason. In cases where you can't easily fire employees (e.g. unions), you encounter the very problem you're describing, and it often leads to companies preferring more consistent automations.

eithed•10m ago
Yes and no.

If I were to ask you - what convention you want to follow for your database columns - camelcase or snakecase? There's no correct global answer. There's no overarching truth that should apply to all databases in existence (even if you'll focus on a certain type of database). Hence the no.

But yes, because in the context of existing system there is a convention. If it's snakecase, you create new tables with snakecase column names.

LLMs will generally follow conventions, but sometimes they will not, because indeed - global truths sometimes win over (I assume)

al_borland•25m ago
Wouldn't this style of training suffer from the AI learning things the user didn't intend? I may thumbs down something for a specific detail I don't like, while other things in it are great. Certain traits that tend to occur together go along for the ride. We see similar things happen in natural selection, where mates may be chosen for 1 specific feature, and other less desirable things come along for the ride.

Outside of AI, I run into this issue when taking basic personality tests. A question may be written for a specific reason, which influences the results, but the reason for my answer may be completely unrelated to the reason intended by the person who made the test.

paytonjjones•6m ago
This can usually be solved by scale alone (in all three contexts: RL, evolution, and IRT / psychometric testing)

The co-occurrence thing is often not a bug of the algorithm but a genuine part of the stochastic landscape that must be solved. Evolution isn't "failing" when sickle cell vulnerability is ported along with malaria resistance; it's just a real tradeoff being made in the current biological landscape.

sigbottle•1h ago
Exactly. Every single philosophical statement in history runs up against the issue where you can just say, "yeah, it's pretty much this. You just need to do <arbitrarily hard unspecified thing that is basically unfalsifiability>". (Including this one)

And maybe that's just our limits with philosophy, modeling, assumptions, whatever. The danger is not realizing when we're in that zone.

(Fwiw I think unfalsifiability is a limit with any system - "you didn't compile in my syntax/semantics" is a gotcha that's actually valid and useful, but nobody can really determine the hard line)

giancarlostoro•1h ago
What's kind of funny is this is how I implemented "gates" for the ticketing system I built for Claude, because Beads would just close tickets without validation. I have tickets that are literally "Human validation" tier, so it will work on the next available thing until I personally tell the model to close it. So, in that spirit, yeah, you can unit test for taste, if you implement external validation.

Unit test runs, waits for human input before passing or failing, which might seem out of the norm, but we already have QA do manual testing.
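
A gate like the one described might be sketched as follows. The `Ticket` fields and function names are hypothetical and are not the API of Beads or any other real ticketing tool; the point is only that the agent cannot close a gated ticket and moves on to other work instead.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    title: str
    needs_human: bool = False      # "Human validation" tier
    human_approved: bool = False   # set only by a person, never by the agent
    closed: bool = False


def is_blocked(ticket: Ticket) -> bool:
    """A ticket is blocked while it waits for human sign-off."""
    return ticket.needs_human and not ticket.human_approved


def try_close(ticket: Ticket) -> bool:
    """Close the ticket unless it is still waiting on a human."""
    if is_blocked(ticket):
        return False
    ticket.closed = True
    return True


def next_workable(tickets):
    """Return the first open ticket that is not waiting on a human, or None."""
    for ticket in tickets:
        if not ticket.closed and not is_blocked(ticket):
            return ticket
    return None
```

The same shape works for a test that reports "pending" until someone records a verdict, at the cost of no longer being fully automated.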

tmoertel•53m ago
> You can't unit test for taste if you haven't written down what you mean by taste. If you can externalize it, then you can.

I'm not so sure. For instance, you can write down what it means for a program to be free of XSS and other injection vulnerabilities. Now, how would you unit test for that property?

pydry•51m ago
I remember reading an interview with a fireman who described a time when his buddy evacuated a team because he "felt" that a floor would collapse imminently.

He couldn't articulate why but they trusted his gut and it did collapse.

A lot of software engineering relies on that kind of intuition and on a good team you can integrate it and benefit from it and avoid all manner of floor collapses.

dyarosla•26m ago
To play devil’s advocate, intuition is still a physical response to stimuli mixed with knowledge of past experience. Hypothetically it could be modeled; the problem here comes down to how to encode it.
sigbottle•9m ago
"Encoding" implies some GOFAI symbolic formal rule machinery.

I'd argue that transformers are a pretty good indication that intelligence isn't "encodable" in the way we think it means. Usually, most "model" vocabulary means that we can explain and constrain the "data" from the "rules". Except the mere "data" is trillions of interacting weights.

That may be encoding in a physical sense, but that still doesn't explain the intuition in any legible way to humans.

Cynically, we've been able to encode everything already by just saying everything's a transition in a huge lookup table. Not very informative though.

punnerud•42m ago
If you have enough examples you can train an AI on your preferences, then use that distilled AI as a unit test. Don’t combine multiple into one AI. If they don’t agree you want it to fail so you can decide and retrain the tests.
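
The "keep the judges separate and fail on disagreement" part could be sketched like this. The two judges are trivial stand-in functions invented for the example; in the setup described above each would be a separately trained preference model.

```python
def check_taste(sample, judges):
    """Run every judge separately and return their shared verdict.

    Raises AssertionError when the judges disagree, so a person can decide
    and the judges can be retrained.
    """
    verdicts = {name: judge(sample) for name, judge in judges.items()}
    if len(set(verdicts.values())) > 1:
        raise AssertionError(f"judges disagree, needs a human decision: {verdicts}")
    return next(iter(verdicts.values()))


# Stand-in judges (hypothetical): each encodes one narrow preference.
def concise(text):
    return len(text.split()) <= 12


def calm(text):
    return "!" not in text


judges = {"concise": concise, "calm": calm}
```

Here `check_taste("Ship it.", judges)` passes, while `check_taste("Ship it now!!!", judges)` raises because one judge approves and the other does not.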
Dumblydorr•39m ago
Randomized trial. Half of them pledge to use AI freely and liberally, half of them to never use it, compare via surveys and off-AI tests after X months. Could even flip it so then the non-users used it for X months and vice versa, see if losses/gains are stable.
eithed•19m ago
I agree and indeed externalize everything you know *that matters*.

Want to follow a certain pattern or convention? Define it (e.g. active record vs repository pattern) and stick it in as an ADR! You don't know what you want? Look at what Claude produces and then acquire taste, mark this as a convention that future sessions will follow, but stick to *one* convention!

deadbabe•15m ago
You cannot externalize taste. You could perhaps mimic someone’s taste, but that’s not the taste. Knowing the taste requires actually tasting it. You can’t capture the taste, it’s already gone.