Learnings from building AI agents

https://www.cubic.dev/blog/learnings-from-building-ai-agents
86•pomarie•2h ago

Comments

bumbledraven•1h ago
What model were they using?
jangletown•1h ago
"51% fewer false positives", how were you measuring? is this an internal or benchmarking dataset?
N_Lens•1h ago
Very vague post light on details, and as usual, feels more like a marketing pitch for the website.
flippyhead•1h ago
I found it useful.
weego•1h ago
It's recreating the monolith vs micro-service argument by proxy for a new generation to plan conference talks around.
vinnymac•1h ago
I’ve been testing this for the last few months, and it is now much quieter than before, and even more useful.
kurtis_reed•1h ago
There was a blog post from another AI code review tool: "How to Make LLMs Shut Up"

https://news.ycombinator.com/item?id=42451968

h1fra•1h ago
what I saw using 5-6 tools like this:

- PR descriptions are never useful; they barely summarize the file changes

- 90% of comments are wrong or irrelevant, whether because the tool is missing context, tribal knowledge, or code-quality rules, or because it misinterprets the code change

- 5-10% of the time it actually spots something

Not entirely sure it's worth the noise

bwfan123•1h ago
code-reviews are not a good use-case for LLMs. here's why: LLMs shine in use cases where their output is not evaluated on accuracy - for example, recommendations, semantic search, sample snippets, images of people riding horses, etc. code-reviews require accuracy.

A useful agent for code reviews in a large codebase is a semantic-search agent that adds a comment listing related past issues or PRs, giving human reviewers more context. This is a recommendation and is not rated on accuracy.

asdev•10m ago
the code reviews can't be effective because the LLM does not have the tribal knowledge and product context of the change. it's just reading the code at face value
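The semantic-search agent suggested above could be sketched roughly as follows. This is a minimal illustration, not any tool's implementation: it ranks past PR titles by bag-of-words cosine similarity instead of the learned embeddings a real system would more likely use, and the class and method names (`RelatedPrFinder`, `topK`) are hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Toy "related past PRs" lookup (hypothetical names): ranks earlier PR titles by
 * bag-of-words cosine similarity to a new PR description. Word counts stand in
 * for embeddings so the sketch stays self-contained.
 */
final class RelatedPrFinder {

    /** Lower-cases the text, splits on anything that is not a letter or digit, and counts words. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity of two term-count vectors; 0 when either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v.doubleValue() * v).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v.doubleValue() * v).sum());
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    /** Up to k past PR titles sharing at least one word with the new PR, most similar first. */
    static List<String> topK(String newPrDescription, List<String> pastPrTitles, int k) {
        Map<String, Integer> query = termCounts(newPrDescription);
        return pastPrTitles.stream()
                .filter(title -> cosine(query, termCounts(title)) > 0)
                .sorted(Comparator.comparingDouble((String title) -> cosine(query, termCounts(title))).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }
}
```

For example, given the past titles "Fix timeout in payment webhook retry", "Bump lodash version" and "Add retry backoff to webhook sender", the description "webhook retry times out under load" returns the two webhook PRs and filters out the lodash bump. Because the output is a list of pointers for a human reviewer rather than a verdict, a weak match costs little, which is the point of the suggestion.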
mosura•1h ago
Lessons.
chanux•1h ago
https://nolearnings.com/
flippyhead•1h ago
This is LITERALLY mind blowing.
criddell•44m ago
I don't like the word learnings either, but you write for your audience and this article was probably written with the hope that it would be shared on LinkedIn.

Learnings might be the right choice here.

I wouldn't complain if the HN headline mutator were to replace "Learnings" with "lessons".

curiousgal•1h ago
> Encouraged structured thinking by forcing the AI to justify its findings first, significantly reducing arbitrary conclusions.

Ah yes, because we know very well that the current generation of AI models reasons and draws conclusions based on logic and understanding... This is the true face palm.

nico•58m ago
Humans work pretty much the same way

Several studies have shown that we first make the decision and then we reason about it to justify it

In that sense, we are not much more rational than an LLM
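One way to read the quoted "justify its findings first" technique is as an output contract: each finding quotes the code it refers to and gives its reasoning before the verdict, and findings that cannot be tied back to the diff are dropped. The sketch assumes that contract; the record and method names are hypothetical, and nothing here makes the model reason in any deeper sense. It only makes unjustified findings mechanically detectable.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * "Justify first" as an output contract (hypothetical names). The record's component
 * order mirrors the field order the output format would request: quoted code and
 * reasoning come before the verdict.
 */
final class JustifiedFindings {

    record Finding(String quotedCode, String reasoning, String verdict) {}

    /** Keeps findings whose quote really appears in the diff and whose reasoning is non-blank. */
    static List<Finding> keepJustified(List<Finding> raw, String diff) {
        List<Finding> kept = new ArrayList<>();
        for (Finding f : raw) {
            boolean quotesDiff = !f.quotedCode().isBlank() && diff.contains(f.quotedCode());
            boolean explained = !f.reasoning().isBlank();
            if (quotesDiff && explained) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

A finding that quotes code absent from the diff, or that arrives with an empty justification, never reaches the PR.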

nzach•1h ago
I agree with the sentiment of this post. In my personal experience, the usefulness of an LLM is positively correlated with your ability to constrain the problem it should solve.

Prompts like 'Update this regex to match this new pattern' generally give better results than 'Fix this routing error in my server'.

Although this pattern seems true empirically, I've never seen any hard data to confirm this property(?). And this post is interesting but seems like a missed opportunity to back this idea with some numbers.

singron•1h ago
I think they skipped over a non-obvious motivating example too fast. On first glance, commenting out your CI test suite would be very bad to sneak into a random PR, and that review note might be justified.

I could imagine the situation might actually be more nuanced (e.g. adding new tests and some of them are commented out), but there isn't enough context to really determine that, and even in that case, it can be worth asking about commented out code in case the author left it that way by accident.

Aren't there plenty of more obvious nitpicks to highlight? A great nitpick example would be one where the model would just as readily ask you to reverse the change it asked for, e.g.:

    final var items = List.copyOf(...);
    <-- Consider using an explicit type for the variable.

    final List items = List.copyOf(...);
    <-- Consider using var to avoid redundant type name.
This is clearly aggravating, since the model will produce a review comment whichever form you use.
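The two declarations in the example above are interchangeable, which is why flip-flopping advice is pure noise. A minimal check, assuming Java 10+ for `var`; the helper name is hypothetical, and the explicit form uses `List<String>` rather than the raw `List` from the comment to avoid an unchecked warning:

```java
import java.util.List;

/** Hypothetical helper showing that both spellings from the nitpick example behave the same. */
final class VarVsExplicit {

    static boolean sameResult(List<String> source) {
        // `var` infers List<String> from List.copyOf's return type.
        final var inferred = List.copyOf(source);
        // The explicit spelling of the same declaration.
        final List<String> explicit = List.copyOf(source);
        // Same static type, same contents: neither spelling changes behavior.
        return inferred.equals(explicit);
    }

    public static void main(String[] args) {
        System.out.println(sameResult(List.of("a", "b"))); // prints true
    }
}
```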
willsmith72•50m ago
yep completely agreed, how can that be the best example they chose to use?

If I reviewed that PR, absolutely I'd question why you're commenting that out. There better be a very good reason, or even a link to a ticket with a clear deadline of when it can be cleaned up/reverted

mattas•1h ago
"After extensive trial-and-error..."

IMO, this is the difference between building deterministic software and non-deterministic software (like an AI agent). It often boils down to randomly making tweaks and evaluating the outcome of those tweaks.

s1mplicissimus•1h ago
Afaik alchemists had a more reliable method than ... whatever this state of affairs is ^^
snapcaster•36m ago
You're saying alchemy is better than the scientific method?
AndrewKemendo•53m ago
Otherwise known as science

1:Observation 2:Hypothesis 3:test 4:GOTO:1

This is how everything ever built got built

What is the problem exactly?
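The observe/hypothesize/test loop only works if "test" means scoring every tweak on the same labeled examples. A minimal sketch, assuming a hand-labeled set of snippets where a human decided whether a comment was warranted; the reviewers are modeled as predicates standing in for LLM calls, and all names (`TweakLoop`, `Example`, `pickBest`) are hypothetical. F1 is used as one reasonable score, since optimizing for fewer false positives alone would favor a reviewer that stays silent.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Measured trial-and-error (hypothetical names): every candidate tweak is scored on the
 * same labeled set, and only a strict improvement replaces the current best.
 */
final class TweakLoop {

    /** A labeled example: a snippet and whether a human agreed a comment was warranted. */
    record Example(String snippet, boolean commentWarranted) {}

    /** F1 of a reviewer ("would this variant comment on this snippet?") against the labels. */
    static double f1(Predicate<String> reviewer, List<Example> labeled) {
        int tp = 0, fp = 0, fn = 0;
        for (Example ex : labeled) {
            boolean commented = reviewer.test(ex.snippet());
            if (commented && ex.commentWarranted()) {
                tp++;
            } else if (commented) {
                fp++;
            } else if (ex.commentWarranted()) {
                fn++;
            }
        }
        return tp == 0 ? 0.0 : 2.0 * tp / (2.0 * tp + fp + fn);
    }

    /** Observe, hypothesize, test, repeat: keep whichever candidate scores best. */
    static Predicate<String> pickBest(Predicate<String> baseline,
                                      List<Predicate<String>> candidates,
                                      List<Example> labeled) {
        Predicate<String> best = baseline;
        double bestScore = f1(baseline, labeled);
        for (Predicate<String> candidate : candidates) {
            double score = f1(candidate, labeled);
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

With three labeled snippets of which only one deserves a comment, a reviewer that comments on everything scores 0.5, and a variant that comments only on the warranted one scores 1.0 and is kept.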

nico•1h ago
> 2.3 Specialized Micro-Agents Over Generalized Rules
>
> Initially, our instinct was to continuously add more rules into a single large prompt to handle edge cases

This has been my experience as well. However, it seems like platforms such as Cursor/Lovable/v0 et al. are doing things differently

For example, this is Lovable’s leaked system prompt, 1550 lines: https://github.com/x1xhlol/system-prompts-and-models-of-ai-t...

Is there a trick to making gigantic system prompts work well?
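The micro-agent approach from the quoted section can be sketched as several narrow reviewers merged by a coordinator. This is an illustration under assumptions, not the post's implementation: the two agents are hypothetical rule-based stand-ins for narrowly prompted LLM calls, and each is allowed to return an empty list, so no agent is pushed to say something outside its concern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Several narrow reviewers instead of one ever-growing prompt (hypothetical names). */
final class MicroAgents {

    /** One narrow reviewer; an empty list means "nothing to say about my concern". */
    interface ReviewAgent {
        List<String> review(String diff);
    }

    /** Flags added lines that look like hard-coded passwords. */
    static final ReviewAgent SECRETS = diff -> diff.lines()
            .filter(line -> line.startsWith("+"))
            .filter(line -> line.toLowerCase(Locale.ROOT).contains("password ="))
            .map(line -> "Possible hard-coded credential: " + line.substring(1).trim())
            .collect(Collectors.toList());

    /** Flags added lines that comment out something mentioning tests. */
    static final ReviewAgent DISABLED_TESTS = diff -> diff.lines()
            .filter(line -> line.startsWith("+"))
            .filter(line -> line.substring(1).trim().startsWith("#") && line.contains("test"))
            .map(line -> "A test step appears to be commented out; intentional?")
            .collect(Collectors.toList());

    /** Runs every agent on the same diff and concatenates their findings. */
    static List<String> reviewAll(List<ReviewAgent> agents, String diff) {
        List<String> findings = new ArrayList<>();
        for (ReviewAgent agent : agents) {
            findings.addAll(agent.review(diff));
        }
        return findings;
    }
}
```

On a diff that adds `password = "hunter2"` and comments out `# - run: npm test`, each agent contributes one finding; on a diff that only adds `int y = 2;`, both stay silent.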

shenberg•46m ago
When I read "51% fewer false positives" followed immediately by "Median comments per pull request cut by half", it makes me wonder how many true positives they find. That's maybe unfair, as my reference point is automated tooling in the security world, where the true-positive/false-positive ratio is so bad that a 50% reduction in false positives is a drop in the bucket.
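A back-of-the-envelope check makes this concern concrete. It assumes, which the post does not say, that both headline figures apply to the same pool of comments (the post reports a median per PR, so this is a simplification): total comments fall to 0.5 of the baseline and false positives to 0.49. If f is the baseline share of comments that were false positives, the false-positive share afterwards is 0.49f / 0.5 = 0.98f, and the fraction of true positives kept is (0.5 - 0.49f) / (1 - f). The class name is hypothetical.

```java
/**
 * Back-of-the-envelope check of the two headline figures, assuming both describe
 * the same pool of comments (a simplification; the post quotes a per-PR median).
 */
final class HeadlineMath {
    static final double COMMENT_FACTOR = 0.5; // "median comments per pull request cut by half"
    static final double FP_FACTOR = 0.49;     // "51% fewer false positives"

    /** True positives after / before, given the baseline share of false-positive comments. */
    static double truePositiveRatio(double fpShareBefore) {
        double tpBefore = 1.0 - fpShareBefore;
        double tpAfter = COMMENT_FACTOR - FP_FACTOR * fpShareBefore;
        return tpAfter / tpBefore;
    }

    /** Share of the remaining comments that are still false positives. */
    static double fpShareAfter(double fpShareBefore) {
        return FP_FACTOR * fpShareBefore / COMMENT_FACTOR;
    }

    public static void main(String[] args) {
        for (double f : new double[] {0.5, 0.9}) {
            System.out.printf("FP share %.2f -> TP kept %.2f, FP share after %.3f%n",
                    f, truePositiveRatio(f), fpShareAfter(f));
        }
    }
}
```

With f = 0.9 (the 90% figure from the earlier comment), about 59% of true positives survive while the false-positive share only drops from 90% to about 88%; with f = 0.5, 51% survive. Under this assumption, true positives fall unless at least 98% of baseline comments were false positives.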
Oras•3m ago
The problem is that, regardless of how you try to use "micro-agents" as a marketing term, LLMs are instructed to return a result.

They will always try to come up with something.

The example provided was a poor one. The comment from LLM was solid. Why would you comment out a step in the pipeline instead of just deleting it? I would comment the same in a PR.

Cross-Compiling 10k Rust CLI Crates Statically

https://blog.pkgforge.dev/cross-compiling-10000-rust-cli-crates-statically
2•todsacerdoti•39s ago•0 comments

Introduction to embedded development with Rust: Overview of the ecosystem

https://kerkour.com/introduction-to-embedded-development-with-rust
1•unsolved73•58s ago•0 comments

Field Guide to the North American Weigh Station

https://hackaday.com/2025/06/26/field-guide-to-the-north-american-weigh-station/
1•zdw•1m ago•0 comments

Log-Survival to Death Rate

https://entropicthoughts.com/log-survival-to-death-rate
1•kqr•2m ago•0 comments

Flux Kontext dev is now released

https://huggingface.co/spaces/wavespeed/FLUX-Kontext-Dev-Ultra-Fast
1•chengzeyi•2m ago•1 comment

Show HN: Personal Branding Photos for Women in 10 mins

https://www.gostudio.ai/?fp_ref=10654
1•gostudio_ai•6m ago•0 comments

Tennis Scorigami

https://www.tennis-scorigami.com/
1•jlarks32•6m ago•0 comments

Young People Face a Hiring Crisis. AI Is Making It Worse

https://derekthompson.substack.com/p/young-people-face-a-hiring-crisis
2•herbertl•6m ago•0 comments

KDE Plasma 6.4 review – A worrying trend

https://www.dedoimedo.com/computers/plasma-6-4-review.html
1•jandeboevrie•7m ago•0 comments

Ask HN: How can I promote my Free Startup Mentoring?

1•riley-i•10m ago•2 comments

My favorite account is a library in Ohio

https://www.milkkarten.net/p/columbus-metropolitan-library-social-media
2•herbertl•11m ago•0 comments

Context Engineering: A Primer

https://ai.intellectronica.net/context-engineering
1•intellectronica•11m ago•2 comments

Lounge It

https://loungeit.carrd.co
1•mcordova•11m ago•1 comment

Show HN: I built a UI to manage Claude Code worktrees

https://github.com/stravu/crystal
3•jbentley1•11m ago•0 comments

Mysite.ai – your first AI employee that builds websites in under 2 minutes

https://mysite.ai/
1•codejetai•12m ago•2 comments

Convert Words to Time

https://wordstotime.com/
1•bookofjoe•12m ago•0 comments

Beyond the Buzz: How 'Deep Tech' Startups Are Changing the Game

https://fossforce.com/2025/06/beyond-the-buzz-how-deep-tech-startups-are-changing-the-game/
1•dxs•14m ago•0 comments

ProductHunt Isn't the Place for Indie Devs Anymore

https://old.reddit.com/r/indiehackers/comments/1ll2bo6/producthunt_isnt_the_place_for_indie_devs_anymore/
2•vikingmute•21m ago•1 comment

Weird Expressions in Rust

https://www.wakunguma.com/blog/rust-weird-expr
2•Bogdanp•22m ago•0 comments

Why Most SBOMs Fail and What to Do About It

https://ovalenzuela.com/2025/02/why-most-sboms-fail-and-what-to-do-about-it.html
1•Tomte•22m ago•0 comments

Tech AI is doing 30%-50% of the work at Salesforce, CEO Marc Benioff says

https://www.cnbc.com/2025/06/26/ai-salesforce-benioff.html
4•kamaraju•23m ago•2 comments

LLM Code Review Maven Plugin

https://github.com/QuasarByte/llm-code-review-maven-plugin
1•romanta•23m ago•0 comments

The CrateDB MCP Server

https://community.cratedb.com/t/introducing-the-cratedb-mcp-server/2043
1•kneth•24m ago•1 comment

Built a tool to help SaaS teams use their customer feedback

https://www.inov-ai.tech/
2•brightUiso•25m ago•1 comment

European Venture Prisoner's Dilemma

https://nodesventures.substack.com/p/european-venture-prisoners-dilemma
1•AnhTho_FR•26m ago•0 comments

Offerwall gives publishers more options to monetize content

https://blog.google/products/ads-commerce/offerwall-gives-publishers-more-options-audiences-more-control/
1•xnx•26m ago•0 comments

Gridlocked: AI's power needs could short-circuit US infrastructure

https://www.theregister.com/2025/06/26/us_datacenter_power_crunch/
2•rntn•27m ago•0 comments

The cheat codes of technological progress

https://www.exponentialview.co/p/why-technological-laws-still-rule
1•surprisetalk•27m ago•0 comments

Show HN: An open-source app to query 10 AI models at once

https://github.com/Nexarithm/multi_model_chat
2•nexarithm•27m ago•0 comments

Modern Node.js Patterns for 2025

https://kashw1n.com/blog/nodejs-2025/
2•kashw1n•28m ago•0 comments