frontpage.

We built another object storage

https://fractalbits.com/blog/why-we-built-another-object-storage/
60•fractalbits•2h ago•9 comments

Java FFM zero-copy transport using io_uring

https://www.mvp.express/
25•mands•5d ago•6 comments

How exchanges turn order books into distributed logs

https://quant.engineering/exchange-order-book-distributed-logs.html
48•rundef•5d ago•17 comments

macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt

https://developer.apple.com/documentation/macos-release-notes/macos-26_2-release-notes#RDMA-over-...
467•guiand•18h ago•237 comments

AI is bringing old nuclear plants out of retirement

https://www.wbur.org/hereandnow/2025/12/09/nuclear-power-ai
32•geox•1h ago•24 comments

Sick of smart TVs? Here are your best options

https://arstechnica.com/gadgets/2025/12/the-ars-technica-guide-to-dumb-tvs/
433•fleahunter•1d ago•362 comments

Photographer built a medium-format rangefinder, and so can you

https://petapixel.com/2025/12/06/this-photographer-built-an-awesome-medium-format-rangefinder-and...
78•shinryuu•6d ago•9 comments

Apple has locked my Apple ID, and I have no recourse. A plea for help

https://hey.paris/posts/appleid/
865•parisidau•10h ago•445 comments

GNU Unifont

https://unifoundry.com/unifont/index.html
287•remywang•18h ago•68 comments

A 'toaster with a lens': The story behind the first handheld digital camera

https://www.bbc.com/future/article/20251205-how-the-handheld-digital-camera-was-born
42•selvan•5d ago•18 comments

Beautiful Abelian Sandpiles

https://eavan.blog/posts/beautiful-sandpiles.html
83•eavan0•3d ago•16 comments

Rats Play DOOM

https://ratsplaydoom.com/
332•ano-ther•18h ago•123 comments

Show HN: Tiny VM sandbox in C with apps in Rust, C and Zig

https://github.com/ringtailsoftware/uvm32
167•trj•17h ago•11 comments

OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI

https://simonwillison.net/2025/Dec/12/openai-skills/
481•simonw•15h ago•271 comments

Computer Animator and Amiga fanatic Dick Van Dyke turns 100

109•ggm•6h ago•23 comments

Formula One Handovers and Handovers From Surgery to Intensive Care (2008) [pdf]

https://gwern.net/doc/technology/2008-sower.pdf
82•bookofjoe•6d ago•33 comments

Show HN: I made a spreadsheet where formulas also update backwards

https://victorpoughon.github.io/bidicalc/
179•fouronnes3•1d ago•85 comments

Will West Coast Jazz Get Some Respect?

https://www.honest-broker.com/p/will-west-coast-jazz-finally-get
9•paulpauper•6d ago•2 comments

Freeing a Xiaomi humidifier from the cloud

https://0l.de/blog/2025/11/xiaomi-humidifier/
126•stv0g•1d ago•51 comments

Obscuring P2P Nodes with Dandelion

https://www.johndcook.com/blog/2025/12/08/dandelion/
57•ColinWright•4d ago•1 comment

Go is portable, until it isn't

https://simpleobservability.com/blog/go-portable-until-isnt
119•khazit•6d ago•101 comments

Ensuring a National Policy Framework for Artificial Intelligence

https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-nati...
169•andsoitis•1d ago•217 comments

Poor Johnny still won't encrypt

https://bfswa.substack.com/p/poor-johnny-still-wont-encrypt
52•zdw•10h ago•64 comments

YouTube's CEO limits his kids' social media use – other tech bosses do the same

https://www.cnbc.com/2025/12/13/youtubes-ceo-is-latest-tech-boss-limiting-his-kids-social-media-u...
83•pseudolus•3h ago•66 comments

Slax: Live Pocket Linux

https://www.slax.org/
41•Ulf950•5d ago•5 comments

50 years of proof assistants

https://lawrencecpaulson.github.io//2025/12/05/History_of_Proof_Assistants.html
107•baruchel•15h ago•16 comments

Gild Just One Lily

https://www.smashingmagazine.com/2025/04/gild-just-one-lily/
29•serialx•5d ago•5 comments

Capsudo: Rethinking sudo with object capabilities

https://ariadne.space/2025/12/12/rethinking-sudo-with-object-capabilities.html
74•fanf2•17h ago•44 comments

Google removes Sci-Hub domains from U.S. search results due to dated court order

https://torrentfreak.com/google-removes-sci-hub-domains-from-u-s-search-results-due-to-dated-cour...
193•t-3•11h ago•34 comments

String theory inspires a brilliant, baffling new math proof

https://www.quantamagazine.org/string-theory-inspires-a-brilliant-baffling-new-math-proof-20251212/
167•ArmageddonIt•22h ago•153 comments

Learnings from building AI agents

https://www.cubic.dev/blog/learnings-from-building-ai-agents
172•pomarie•5mo ago

Comments

bumbledraven•5mo ago
What model were they using?
jangletown•5mo ago
"51% fewer false positives", how were you measuring? is this an internal or benchmarking dataset?
N_Lens•5mo ago
Very vague post, light on details; as usual, it feels more like a marketing pitch for the website.
flippyhead•5mo ago
I found it useful.
weego•5mo ago
It's recreating the monolith vs micro-service argument by proxy for a new generation to plan conference talks around.
vinnymac•5mo ago
I’ve been testing this for the last few months, and it is now much quieter than before, and even more useful.
kurtis_reed•5mo ago
There was a blog post from another AI code review tool: "How to Make LLMs Shut Up"

https://news.ycombinator.com/item?id=42451968

h1fra•5mo ago
what I saw using 5-6 tools like this:

- PR descriptions are never useful; they barely summarize the file changes

- 90% of comments are wrong or irrelevant, whether because they're missing context, missing tribal knowledge, missing code quality rules, or wrongly interpret the code change

- 5-10% of the time it actually spots something

Not entirely sure it's worth the noise

bwfan123•5mo ago
Code reviews are not a good use case for LLMs. Here's why: LLMs shine in use cases where their output is not evaluated on accuracy - for example, recommendations, semantic search, sample snippets, images of people riding horses, etc. Code reviews require accuracy.

What would be useful in the context of code reviews in a large codebase is a semantic-search agent that adds a comment containing related issues or PRs from the past, giving human reviewers more context. That is a recommendation, and it is not rated on accuracy.
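
A minimal sketch of what that kind of recommendation-only helper could look like, assuming you already have an embedding for the diff and a pre-embedded PR history (everything below is hypothetical, not from the article):

    // Hypothetical sketch: surface related past PRs as a non-blocking review comment.
    // The diff embedding and `PastPR.embedding` are assumed to come from whatever
    // embedding model you already use; nothing here is a real API.
    type PastPR = { id: number; title: string; embedding: number[] };

    function cosine(a: number[], b: number[]): number {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Rank history by similarity to the current diff and return the top-k
    // candidates to mention as "possibly related" context for reviewers.
    function relatedPRs(diffEmbedding: number[], history: PastPR[], k = 3): PastPR[] {
      return [...history]
        .sort((a, b) => cosine(diffEmbedding, b.embedding) - cosine(diffEmbedding, a.embedding))
        .slice(0, k);
    }

A bad match here only costs the reviewer a glance, which is why this kind of output isn't judged on accuracy the way a review finding is.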

asdev•5mo ago
the code reviews can't be effective because the LLM does not have the tribal knowledge and product context of the change. it's just reading the code at face value
theonething•5mo ago
Isn't it possible to feed that knowledge and context to it? Have it scan your product website and docs, code documentation, git history, etc?
mosura•5mo ago
Lessons.
chanux•5mo ago
https://nolearnings.com/
flippyhead•5mo ago
This is LITERALLY mind blowing.
criddell•5mo ago
I don't like the word learnings either, but you write for your audience and this article was probably written with the hope that it would be shared on LinkedIn.

Learnings might be the right choice here.

I wouldn't complain if the HN headline mutator were to replace "Learnings" with "lessons".

curiousgal•5mo ago
> Encouraged structured thinking by forcing the AI to justify its findings first, significantly reducing arbitrary conclusions.

Ah yes, because we know very well that the current generation of AI models reasons and draws conclusions based on logic and understanding... This is the true face palm.

nico•5mo ago
Humans work pretty much the same way

Several studies have shown that we first make the decision and then we reason about it to justify it

In that sense, we are not much more rational than an LLM

disgruntledphd2•5mo ago
Humans have a lot more introspection capabilities than any current LLM.
alganet•5mo ago
> Several studies

Please, cite those studies. I want to read them.

elzbardico•5mo ago
The "confidence" field in the structured output was what really baffled me.
nzach•5mo ago
I agree with the sentiment of this post. In my personal experience, the usefulness of an LLM is positively correlated with your ability to constrain the problem it should solve.

Prompts like 'Update this regex to match this new pattern' generally give better results than 'Fix this routing error in my server'.

Although this pattern seems true empirically, I've never seen any hard data confirming it. This post is interesting, but it seems like a missed opportunity to back the idea with some numbers.

exitb•5mo ago
This seems like really bad news for the "AI will soon replace all software developers" crowd.
singron•5mo ago
I think they skipped over a non-obvious motivating example too fast. On first glance, commenting out your CI test suite would be very bad to sneak into a random PR, and that review note might be justified.

I could imagine the situation might actually be more nuanced (e.g. adding new tests and some of them are commented out), but there isn't enough context to really determine that, and even in that case, it can be worth asking about commented out code in case the author left it that way by accident.

Aren't there plenty of more obvious nitpicks to highlight? A great nitpick example would be one where the model will also ask to reverse the resolution. E.g.

    final var items = List.copyOf(...);
    <-- Consider using an explicit type for the variable.

    final List items = List.copyOf(...);
    <-- Consider using var to avoid redundant type name.
This is clearly aggravating since it will always make review comments.
willsmith72•5mo ago
yep completely agreed, how can that be the best example they chose to use?

If I reviewed that PR, absolutely I'd question why you're commenting that out. There better be a very good reason, or even a link to a ticket with a clear deadline of when it can be cleaned up/reverted

mattas•5mo ago
"After extensive trial-and-error..."

IMO, this is the difference between building deterministic software and non-deterministic software (like an AI agent). It often boils down to randomly making tweaks and evaluating the outcome of those tweaks.

s1mplicissimus•5mo ago
Afaik alchemists had a more reliable method than ... whatever this state of affairs is ^^
snapcaster•5mo ago
You're saying alchemy is better than the scientific method?
AndrewKemendo•5mo ago
Otherwise known as science

1:Observation 2:Hypothesis 3:test 4:GOTO:1

This is everything ever built, ever.

What is the problem exactly?

wrs•5mo ago
For one thing, what you learned can stop working when you switch to a new model, or just a newer version of the “same” model.
AndrewKemendo•5mo ago
All that means is that you verified the null hypothesis, which should be that it doesn't work.

If you create hypothesis tests that aren't written precisely or specifically enough, then you're right, you're not going to be able to do science.

Incidentally, 99.9% of people I know have no instinct for how to actually do science, or the rigor and focus to do it in a way that is usable.

neuronic•5mo ago
That's because there is no intelligence or understanding involved. They are just trying to brute force a tool for a different purpose into their use case because marketing can't stop overselling AI.
nico•5mo ago
> 2.3 Specialized Micro-Agents Over Generalized Rules
> Initially, our instinct was to continuously add more rules into a single large prompt to handle edge cases

This has been my experience as well. However, it seems like platforms like Cursor/Lovable/v0/et al. are doing things differently.

For example, this is Lovable’s leaked system prompt, 1550 lines: https://github.com/x1xhlol/system-prompts-and-models-of-ai-t...

Is there a trick to making gigantic system prompts work well?
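
For reference, the micro-agent split the quoted section describes could be sketched roughly like this; the agent names and the `callLLM` signature are stand-ins, not anything from the post:

    // Sketch: several small prompts, each with one narrow responsibility,
    // instead of one giant rule list. `callLLM` stands in for whatever
    // completion API you use; the agent names are made up.
    type Finding = { agent: string; comment: string };

    const microAgents = [
      { name: "nil-safety", prompt: "Only flag possible nil/undefined dereferences in this diff. Reply NONE if there are none." },
      { name: "secrets", prompt: "Only flag hard-coded credentials or API keys in this diff. Reply NONE if there are none." },
      { name: "dead-code", prompt: "Only flag commented-out or unreachable code in this diff. Reply NONE if there are none." },
    ];

    async function reviewDiff(
      diff: string,
      callLLM: (system: string, user: string) => Promise<string>
    ): Promise<Finding[]> {
      const results = await Promise.all(
        microAgents.map(async (a) => ({
          agent: a.name,
          comment: (await callLLM(a.prompt, diff)).trim(),
        }))
      );
      // Agents that found nothing opt out with "NONE" and are dropped.
      return results.filter((r) => r.comment !== "NONE");
    }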

shenberg•5mo ago
When I read "51% fewer false positives" followed immediately by "Median comments per pull request cut by half" it makes me wonder how many true positives they find. That's maybe unfair as my reference is automated tooling in the security world, where the true-positive/false-positive ratio is so bad that a 50% reduction in false positives is a drop in the bucket
Oras•5mo ago
The problem is that, regardless of how you try to use "micro-agents" as a marketing term, LLMs are instructed to return a result.

They will always try to come up with something.

The example provided was a poor one. The comment from the LLM was solid. Why would you comment out a step in the pipeline instead of just deleting it? I would leave the same comment on a PR.

SparkyMcUnicorn•5mo ago
I've found that giving agents an "opt out" works pretty well.

For structured outputs, making fields optional isn't usually enough. Providing an additional field for it to dump some output, along with a description for how/when it should be used, covers several issues around this problem.

I'm not claiming this would solve the specific issues discussed in the post. Just a potentially helpful tip for others out there.

bjorgen•5mo ago
Do you have an example of this in practice? I'm having a hard time understanding this, and I have a very similar problem with the agent wanting to give a response for optional fields.
SparkyMcUnicorn•5mo ago
A (hopefully) clear and probably oversimplified example:

Query -> Person Lookup -> Result-> Structured Output `{ firstName: "", lastName: "" }`

When the result doesn't have relevant information, the structured output will basically always contain a name, whether it found the correct person or not, because the model wants to output something, even if the fields are optional. With this example, prompting can help turn the names into "Unknown", but the prompt usually ends up being excessive and/or time-consuming to get right and to fix edge cases for. Weaker models might struggle more on details or relevance with this prompt-only approach.

`{ found: boolean, missingFields: [], missingReason: "", firstName, lastName }`

Including one or more of these additional text output properties has an almost magical effect sometimes, reducing the required prompting and hallucinated/incorrect outputs.
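
A small sketch of the shape described above, with hypothetical field names; the point is that the extra fields give the model a legitimate place to say "I didn't find it":

    // Sketch of an "opt out" structured output. Field names are illustrative.
    interface PersonLookupResult {
      found: boolean;          // explicit opt-out flag
      missingFields: string[]; // e.g. ["lastName"]
      missingReason: string;   // free-text dump for why data is absent
      firstName?: string;
      lastName?: string;
    }

    // A "no match" answer can now be well-formed instead of hallucinated:
    const noMatch: PersonLookupResult = {
      found: false,
      missingFields: ["firstName", "lastName"],
      missingReason: "No person matching the query appeared in the lookup result.",
    };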

ffsm8•5mo ago
Likely because it's temporary?

It takes less effort to re-enable if it's just commented out, and it's more visible that there is something funky going on that someone should fix.

But yeah, even if it's temporary, it really should have the rationale for commenting it out added... It takes like 5s and provides important context for reviewers and people looking through the file history in the future.

pancsta•5mo ago
By splitting prompts into smaller chunks you effectively get “bias free” opinions, especially when cross-checked. You can then turn them into local reasoning, which is different from “sending an email to the LLM” which seems to be the case here. Remember, LLM is Rainman.
elzbardico•5mo ago
The funny thing is the structured output in the last example.

``` { "reasoning": "`cfg` can be nil on line 42; dereferenced without check on line 47", "finding": "Possible nil‑pointer dereference", "confidence": 0.81 } ```

You know the confidence value is completely bogus, don't you?

sharkjacobs•5mo ago
Do you mean that there is no correlation between confidence and false positives or other errors?
ramity•5mo ago
elzbardico is pointing out that the author has the confidence value generated as part of the model's response, rather than it being an actual measure of confidence in the output.
bckr•5mo ago
Is there research or solid knowledge on this?
baby•5mo ago
This trick is being used by many apps (including GitHub Copilot reviews). The way I see it, if the agent has an eager-to-please problem, then you give it a way out.
bckr•5mo ago
Thanks. I was talking about the confidence measure.
ramity•5mo ago
I too once fell into the trap of having an LLM generate a confidence value in a response. This is a very genuine concern to raise.
munificent•5mo ago
Easy fix, just have the LLM generate:

    {
      "reasoning": "`cfg` can be nil on line 42; dereferenced without check on line 47",
      "finding": "Possible nil‑pointer dereference",
      "confidence": 0.81,
      "confidence_in_confidence_rating": 0.54,
      "confidence_in_confidence_rating_in_confidence_rating": 0.12,
      "confidence_in_confidence_rating_in_confidence_rating_in_confidence_rating": 0.98,
      // Etc...
    }
zengid•5mo ago
confidence all the way down
GardenLetter27•5mo ago
Confidence is all you need.
lgas•5mo ago
True in many situations in life.
paisawalla•5mo ago
Wasteful. `confidence`'s type should be Array<number>, wherein confidence[N] gives the Nth derivative confidence rating.
volkk•5mo ago
I immediately noticed the same thing, but to be fair, we don't know if it's enriched by a separate service that checks the response and uses some heuristics to compute that value. If not, yeah, that is an entirely made-up and useless value.
MattSayar•5mo ago
Could you have a higher-order reasoning LLM generate a better confidence rating? That's how eval frameworks generally work today
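
One way that second pass is sometimes sketched (purely illustrative, and note the Greptile post linked further down reports that self-judging was nearly random for them): derive the confidence from repeated independent verification calls instead of asking the model to print a number.

    // Sketch: re-check a claimed finding with independent calls and use the
    // agreement rate as confidence. `callLLM` is a stand-in, not a real API.
    async function judgedConfidence(
      diff: string,
      finding: string,
      callLLM: (prompt: string) => Promise<string>,
      samples = 5
    ): Promise<number> {
      let agree = 0;
      for (let i = 0; i < samples; i++) {
        const verdict = await callLLM(
          `Diff:\n${diff}\n\nClaimed issue: ${finding}\n` +
          `Answer strictly YES or NO: is this issue actually present?`
        );
        if (verdict.trim().toUpperCase().startsWith("YES")) agree++;
      }
      return agree / samples; // 0.0-1.0, grounded in repeated checks
    }
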
skipants•5mo ago
When I was younger and more into music, I would often judge whether a drummer at a concert was "good" based on whether they were better than me. I knew enough about drumming to tell how good someone was at the different parts of that skill, but I also knew enough to know that I was not even close to having what it took to be a professional drummer.

This is how I feel about this blog post. I've barely scratched the surface of the innards of LLMs, but even I know it should be completely obvious to anyone with a product built around them that these confidence levels are completely made up.

I'd never heard of or used cubic before today, but that part of the blog post, along with its obvious LLM-generated quality, gives a terrible first impression.

baby•5mo ago
You know everything is made up, right? And yet it just works. I too use a confidence score in a bug-finder app, GitHub seems to use them in Copilot reviews, and people will use them until they're shown not to work anymore.

On the other hand, this post https://www.greptile.com/blog/make-llms-shut-up says that it didn't work in their case:

> Sadly, this also failed. The LLMs judgment of its own output was nearly random. This also made the bot extremely slow because there was now a whole new inference call in the workflow.

jstummbillig•5mo ago
The multi-agent thing with different roles is so obviously not a great concept that I am very hesitant to build towards it, even though it seems to win out right now. We want an AI that internally does what it needs to do to solve a problem, given a good enough problem description, tools, and context. I really do not want to have to worry about breaking tasks into chunks smaller than what I could handle myself, and I really hope that in the near future this will go away.
brabel•5mo ago
People creating products need to do what gives results right now. And I can attest that breaking up jobs into small steps seems to work better for most scenarios. When that becomes unnecessary, creating products that are useful will become much easier for sure, but I wouldn’t hold my breath.
bckr•5mo ago
I’m not being sarcastic when I say that I think supervisor agents and agent swarms in general are the way forward here
EnPissant•5mo ago
> Explicit reasoning improves clarity. Require your AI to clearly explain its rationale first—this boosts accuracy and simplifies debugging.

I wonder what models they are using because reasoning models do this by default, even if they don't give you that output.

This post reads more like a marketing blog post than any real world advice.

iandanforth•5mo ago
I learned from a recent post (https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...) that finding security issues can take 100+ calls to an LLM to get good signal. So I wonder about agent implementers who are trying to get good signal out of single calls, even if they are specialized ones.
bckr•5mo ago
I think that article is talking about finding a previously unknown exploit. A known and well documented vulnerability should be much easier to identify
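
A rough sketch of the "many calls, then aggregate" idea from the linked write-up, under the assumption that each call returns a list of finding identifiers (`runCheck` is a stand-in; nothing here comes from the post's actual code):

    // Sketch: sample the same check N times and keep only findings reported
    // by more than `threshold` of the runs, trading cost for signal.
    async function majorityFindings(
      runCheck: () => Promise<string[]>,
      samples = 10,
      threshold = 0.5
    ): Promise<string[]> {
      const counts = new Map<string, number>();
      for (let i = 0; i < samples; i++) {
        for (const finding of new Set(await runCheck())) {
          counts.set(finding, (counts.get(finding) ?? 0) + 1);
        }
      }
      return [...counts.entries()]
        .filter(([, n]) => n / samples > threshold)
        .map(([finding]) => finding);
    }
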
OnionBlender•5mo ago
What's funny about the bullet points in section 3 is that they only compare to the previous noisy agent, rather than to having no agent at all. 51% fewer false positives, median comments per pull request cut by half, spending less time managing irrelevant comments? Turn it off and you'd get a 100% reduction in false positives and spend zero time on irrelevant AI-generated comments.
hbogert•5mo ago
Ah, the joy of non-determinism. Have fun tweaking till you die. Also, I wish you a lot of fun giving your customers buttons to enable/disable options.