frontpage.

The Anthropic Hive Mind

https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b
1•gozzoo•1m ago•0 comments

A Horrible Conclusion

https://addisoncrump.info/research/a-horrible-conclusion/
1•todsacerdoti•1m ago•0 comments

I spent $10k to automate my research at OpenAI with Codex

https://twitter.com/KarelDoostrlnck/status/2019477361557926281
1•tosh•2m ago•0 comments

From Zero to Hero: A Spring Boot Deep Dive

https://jcob-sikorski.github.io/me/
1•jjcob_sikorski•2m ago•0 comments

Show HN: Solving NP-Complete Structures via Information Noise Subtraction (P=NP)

https://zenodo.org/records/18395618
1•alemonti06•7m ago•1 comment

Cook New Emojis

https://emoji.supply/kitchen/
1•vasanthv•10m ago•0 comments

Show HN: LoKey Typer – A calm typing practice app with ambient soundscapes

https://mcp-tool-shop-org.github.io/LoKey-Typer/
1•mikeyfrilot•13m ago•0 comments

Long-Sought Proof Tames Some of Math's Unruliest Equations

https://www.quantamagazine.org/long-sought-proof-tames-some-of-maths-unruliest-equations-20260206/
1•asplake•14m ago•0 comments

Hacking the last Z80 computer – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/FEHLHY-hacking_the_last_z80_computer_ever_made/
1•michalpleban•14m ago•0 comments

Browser-use for Node.js v0.2.0: TS AI browser automation parity with PY v0.5.11

https://github.com/webllm/browser-use
1•unadlib•15m ago•0 comments

Michael Pollan Says Humanity Is About to Undergo a Revolutionary Change

https://www.nytimes.com/2026/02/07/magazine/michael-pollan-interview.html
1•mitchbob•15m ago•1 comment

Software Engineering Is Back

https://blog.alaindichiappari.dev/p/software-engineering-is-back
1•alainrk•16m ago•0 comments

Storyship: Turn Screen Recordings into Professional Demos

https://storyship.app/
1•JohnsonZou6523•17m ago•0 comments

Reputation Scores for GitHub Accounts

https://shkspr.mobi/blog/2026/02/reputation-scores-for-github-accounts/
1•edent•20m ago•0 comments

A BSOD for All Seasons – Send Bad News via a Kernel Panic

https://bsod-fas.pages.dev/
1•keepamovin•23m ago•0 comments

Show HN: I got tired of copy-pasting between Claude windows, so I built Orcha

https://orcha.nl
1•buildingwdavid•23m ago•0 comments

Omarchy First Impressions

https://brianlovin.com/writing/omarchy-first-impressions-CEEstJk
2•tosh•29m ago•1 comment

Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501
2•onurkanbkrc•30m ago•0 comments

Show HN: Versor – The "Unbending" Paradigm for Geometric Deep Learning

https://github.com/Concode0/Versor
1•concode0•30m ago•1 comment

Show HN: HypothesisHub – An open API where AI agents collaborate on medical res

https://medresearch-ai.org/hypotheses-hub/
1•panossk•33m ago•0 comments

Big Tech vs. OpenClaw

https://www.jakequist.com/thoughts/big-tech-vs-openclaw/
1•headalgorithm•36m ago•0 comments

Anofox Forecast

https://anofox.com/docs/forecast/
1•marklit•36m ago•0 comments

Ask HN: How do you figure out where data lives across 100 microservices?

1•doodledood•36m ago•0 comments

Motus: A Unified Latent Action World Model

https://arxiv.org/abs/2512.13030
1•mnming•36m ago•0 comments

Rotten Tomatoes Desperately Claims 'Impossible' Rating for 'Melania' Is Real

https://www.thedailybeast.com/obsessed/rotten-tomatoes-desperately-claims-impossible-rating-for-m...
3•juujian•38m ago•2 comments

The protein denitrosylase SCoR2 regulates lipogenesis and fat storage [pdf]

https://www.science.org/doi/10.1126/scisignal.adv0660
1•thunderbong•40m ago•0 comments

Los Alamos Primer

https://blog.szczepan.org/blog/los-alamos-primer/
1•alkyon•42m ago•0 comments

NewASM Virtual Machine

https://github.com/bracesoftware/newasm
2•DEntisT_•45m ago•0 comments

Terminal-Bench 2.0 Leaderboard

https://www.tbench.ai/leaderboard/terminal-bench/2.0
2•tosh•45m ago•0 comments

I vibe coded a BBS bank with a real working ledger

https://mini-ledger.exe.xyz/
1•simonvc•45m ago•1 comment

Show HN: Continuous Claude – run Claude Code in a loop

https://github.com/AnandChowdhary/continuous-claude
170•anandchowdhary•2mo ago
Continuous Claude is a CLI wrapper I made that runs Claude Code in an iterative loop with persistent context, automatically driving a PR-based workflow. Each iteration creates a branch, applies a focused code change, generates a commit, opens a PR via GitHub's CLI, waits for required checks and reviews, merges if green, and records state into a shared notes file.

This avoids the typical stateless one-shot pattern of current coding agents and enables multi-step changes without losing intermediate reasoning, test failures, or partial progress.

The tool is useful for tasks that require many small, serial modifications: increasing test coverage, large refactors, dependency upgrades guided by release notes, or framework migrations.
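
For a rough idea of what each iteration does, here is a minimal sketch of the loop (illustrative only, not the actual source; the prompt text, notes file name, and iteration count are simplified):

    #!/usr/bin/env bash
    # Sketch of one iteration cycle: branch, focused change, PR, checks, merge, shared notes.
    set -euo pipefail

    NOTES="NOTES.md"
    for i in $(seq 1 "${ITERATIONS:-5}"); do
      branch="continuous-claude/iteration-$i"
      git checkout -B "$branch" main

      # One focused change per iteration; prior context comes from the shared notes file.
      claude -p "Read $NOTES for context, make one small improvement, then append what you did and what remains to $NOTES." \
        --dangerously-skip-permissions --output-format json

      git add -A && git commit -m "continuous-claude: iteration $i" && git push -u origin "$branch"

      pr=$(gh pr create --fill --base main --head "$branch")
      gh pr checks "$pr" --watch     # wait for required checks
      gh pr merge "$pr" --squash     # merge if green
      git checkout main && git pull
    done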

Blog post about this: https://anandchowdhary.com/blog/2025/running-claude-code-in-...

Comments

leobg•2mo ago
Missed opportunity to call it Claude Incontinent (CLI).
apapalns•2mo ago
> codebase with hundreds of thousands of lines of code and go from 0% to 80%+ coverage in the next few weeks

I had a coworker do this with Windsurf + manual driving a while back and it was an absolute mess. Awful tests that were unmaintainable and next to useless (too much mocking, testing that the code "works the way it was written", etc.). Writing a useful test suite is one of the most important parts of a codebase and requires careful, deliberate thought. Without a deep understanding of the business logic (which takes time and is often lost after the initial devs move on) you're not gonna get great tests.

To be fair to AI, we hired a "consultant" who got us the same level of testing, so it's not like there's a high bar out there. It's just not the kind of problem you can solve in two weeks.

id00•2mo ago
I agree. It is very easy to fall into the trap of "I'll let AI write all the tests" and then find yourself with an unmaintainable mess, where the only way to fix a broken test within a reasonable time is to blindly let the AI do it. That exposes you to a similar level of risk as running any other unchecked AI code - you just can't trust that it works correctly.
piker•2mo ago
"My code isn't working. I know, I'll have an AI write my unit tests." Now you have two problems.
simonw•2mo ago
I find coding agents can produce very high quality tests if and only if you give them detailed guidance and good starting examples.

Ask a coding agent to build tests for a project that has none and you're likely to get all sorts of messy mocks and tests that exercise internals when really you want them to exercise the top level public API of the project.

Give them just a few starting examples that demonstrate how to create a good testable environment without mocking and test the higher level APIs and they are much less likely to make a catastrophic mess.

You're still going to have to keep an eye on what they're doing and carefully review their work though!

throwup238•2mo ago
I think they're also much better at creating useful end-to-end UI tests than unit or integration tests, but unfortunately those are hard to create self-contained environments for without bringing in a lot of baggage and Docker containers, which not all agent VMs might support yet. Getting headless Qt running was a pain too, but now ChatGPT Codex can see screenshots and show them in chat (Claude Code can't show them in the chat for some reason) and it's been generating much better end-to-end tests than I've seen for unit/integration.
cortesoft•2mo ago
> I find coding agents can produce very high quality tests if and only if you give them detailed guidance and good starting examples.

I find this to be true for all AI coding, period. When I have the problem fully solved in my head, and I write the instructions to explicitly and fully describe my solution, the code that is generated works remarkably well. If I am not sure how it should work and give more vague instructions, things don't work so well.

stavros•2mo ago
Yeah, same. Usually I'll ask the agent for a few alternatives, to make sure I'm not missing something, but the solution I wanted tends to be the best one. I also get into a lot of me saying "hm, why are you doing it that way?" "Oh yeah, that isn't actually going to work, sorry".
teaearlgraycold•2mo ago
Yes, but the act of writing code is an important part of figuring out what you need. So I’m left wondering how much of a project the AI can actually help with. To be clear, I do use AI for some code gen, but I try to use it less than I see others use it.
cortesoft•2mo ago
Eh, I think my decades of experience writing my own code were necessary for me to develop the skills to precisely tell the AI what to build, but I don't think I (always) need to write new code to know what I need.

Now, if the thing I am building requires a technology I am not familiar with, I will spend some time reading and writing some simple test code to learn how it works, but once I understand it I can then let the AI build from scratch.

Of course, this does rely on the fact that I have years of coding experience that came prior to AI, and I do wonder how new coders can do it without putting in the work to learn how to build working software without AI before using AI.

teaearlgraycold•2mo ago
It’s not just about new tech. It’s about new businesses and projects.
omgbear•2mo ago
Left to its own devices, Claude liked to copy the code under test into the test files to 'remove dependencies' :/

Or it would return early from Playwright tests when the desired targets couldn't be found instead of failing.

But I agree that with some guidance and a better CLAUDE.md, it can work well!

anandchowdhary•2mo ago
Indeed the case - luckily my codebase had some tests already and a pretty decent CLAUDE.md file so I got results I’m happy with.
Vinnl•2mo ago
I feel like that leaves me with the hard part of writing tests, and only saves me the bit I can usually power through quickly because it's easy to get into a flow state for it.
btown•2mo ago
Has anyone had success with specific prompts to avoid the agent over-indexing on implementation details? For instance, something like: "Before each test case, add a comment justifying the business case for every assumption made here, without regard to implementation details. If this cannot be made succinct, or if there is ambiguity in the business case, the test case should not be generated."
freedomben•2mo ago
I've had reasonable success from doing something like this, though it is my current opinion that it's better to write the first few tests yourself to establish a clear pattern and approach. However, if you don't care that much (which is common with side projects):

Starting point: small-ish codebase, no tests at all:

    > I'd like to add a test suite to this project.  It should follow language best practices.  It should use standard tooling as much as possible.  It should focus on testing real code, not on mocking/stubbing, though mocking/stubbing is ok for things like third party services and parts of the code base that can't reasonably run in a test environment.  What are some design options we could do? Don't write any code yet, present me the best of the options and let me guide you.
    > Ok, I like option number two.  Put the basic framework in place and write a couple of dummy tests.
    > Great, let's go ahead and write some real tests for module X.
And so on. For a project with an existing and mature test suite, it's much easier:

    > I'd like to add a test (or improve a test) for module X.  Use the existing helpers and if you find yourself needing new helpers, ask me about the approach before implementing
I've also found it helpful to put things in AGENTS.md or CLAUDE.md about tests and my preferences, such as:

    - Tests should not rely on sleep to avoid timing issues.  If there is a timing issue, present me with options and let me guide you
    - Tests should not follow an extreme DRY pattern, favor human readability over absolute DRYness
    - Tests should focus on testing real code, not on mocking/stubbing, though mocking/stubbing is ok for things like third party services and parts of the code base that can't reasonably run in a test environment.
    - Tests should not make assumptions about the current running state of the environment, nor should they do anything that isn't cleaned up before completing the test to avoid polluting future tests
I do want to stress that every project and framework is different and has different needs. As you discover the AI doing something you don't like, add it to the prompts or the AGENTS.md/CLAUDE.md. Eventually it will get pretty decent, though never blindly trust it, because a butterfly flapping its wings in Canada sometimes causes it to do unexpected things.
typpilol•2mo ago
I was able to do this with vitest and a ton of lint rules.
andai•2mo ago
Does it depend on the model? I would have expected the bigger ones to be better with common sense and not fixating on irrelevant details. But I have only used them with quite small codebases so far. (Which have basically no internals to exercise!)
krschacht•2mo ago
I find most human agents can only produce high quality tests if you give them detailed guidance and good starting examples. :)
cpursley•2mo ago
Which language? I've found Claude very good at Elixir test coverage (surprisingly) but a dumpster fire with any sort of JS/TS testing.
LASR•2mo ago
There is no free lunch. The amount of prompt writing needed to give the LLM enough context about your codebase, etc., is comparable to writing the tests yourself.

Code assistance tools might speed up your workflow by maybe 50% or even 100%, but it's not the geometric scaling that is commonly touted as the benefit of autonomous agentic AI.

And this is not a model capability issue that goes away with newer generations; it's a human input problem.

anandchowdhary•2mo ago
I don't know if this is true.

For example, you can spend a few hours writing a really good set of initial tests that cover 10% of your codebase, and another few hours with an AGENTS.md that gives the LLM enough context about the rest of the codebase. But after that, there's a free* lunch because the agent can write all the other tests for you using that initial set and the context.

This also works with "here's how I created the Slack API integration, please create the Teams integration now" because it has enough to learn from, so that's free* too. This kind of pattern recognition means that prompting is O(1) but the model can do O(n) from that (I know, terrible analogy).

*Also literally becomes free as the cost of tokens approaches zero

jaredsohn•2mo ago
A neat part of this is it mimics how people get onboarded onto codebases. People usually aren't figuring out how to write tests from scratch; they look at the current best practices for similar functionality in the codebase and start there. And then as they continue to work there they try to influence new best practices.
nl•2mo ago
It depends on the problem domain.

I recently had a bunch of Claude credits, so I got it to write a language implementation for me. It probably took 4 hours of my time, but judging by other implementations online I'd say the average implementation time is hundreds of hours.

The fact that the model knew the language and that there were existing tests I could use made a radical difference.

PunchyHamster•2mo ago
A cleanroom-design prompt of "this is the function's interface, it does this and that, write tests for that function to pass" can generally get you pretty decent results.

But "throw a vague prompt in the AI's direction" does about as well as doing the same thing with an intern.

colechristensen•2mo ago
With recent experience, I'm thinking the correct solution is a separate agent prompted exclusively to be a test critic, given a growing list of bad testing patterns to avoid: agent 2 gives feedback to agent 1. Separate agents with unique jobs.

An agent does a good job fixing its own bad ideas when it can run tests, but the biggest blocker I've been having is the agent writing bad tests and getting stuck, or claiming success by lobotomizing a test. I got pretty far with myself being the test critic and that being mostly the only input the agent got after the initial prompt. I'm just betting it could be done with a second agent.
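
A rough sketch of that two-agent split using two claude -p calls (the prompts and file names are illustrative, not a tested setup):

    # Agent 1 writes tests, folding in whatever criticism exists from the last round.
    claude -p "Write tests for module X. Address this critic feedback if present: $(cat critic.txt 2>/dev/null)" \
      --dangerously-skip-permissions

    # Agent 2 only critiques; it never edits code.
    git diff > tests.diff
    claude -p "You are a test critic. Review tests.diff for bad patterns: over-mocking, asserting implementation details, swallowed failures, sleeps. List problems only." \
      --dangerously-skip-permissions > critic.txt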

andai•2mo ago
I had a funny experience with Claude (Web) the other day.

Uploaded a Prolog interpreter in Python and asked for a JS version. It surprised me by not just giving me a code block, but actually running a bunch of commands in its little VM, setting up an npm project; it even wrote a test suite and ran it to make sure all the tests pass!

I was very impressed, then I opened the test script and saw like 15 lines of code, which ran some random functions, did nothing to test their correctness, and just printed "Test passed!" regardless of the result.

gloosx•2mo ago
Ah yes, classic "increase test coverage for the sake of increasing test coverage".

Aligns with vibe-coding values well: number go up – exec happy.

janaagaard•2mo ago
Kudos on making Bash readable.

(https://github.com/AnandChowdhary/continuous-claude/blob/mai...)

jdc0589•2mo ago
I'm not saying OP did this, but I've actually had AI spit out some pretty stellar bash scripts, surprisingly.
anandchowdhary•2mo ago
No, you're right. It was a pretty collaborative effort with me and Claude!
svieira•2mo ago
FYI, you're missing two patterns that allow the `--key=value` admirers and the `-alltheshortopsinasinglestring` spacebar savers among us to be happy (for the otherwise excellent options parsing code).

    shopt -s extglob
    case "$1" in
      # Flag support - allow -xyz where z takes params
      -@(a|b|c)*) _flag=${1:1:1}; _rest=${1:2}; shift; set -- "-$_flag" "-$_rest" "$@";;
      # Param=value support
      -?(-)*=*) _key=${1%%=*}; _value=${1#*=}; shift; set -- "${_key}" "$_value" "$@";;
    esac
anandchowdhary•2mo ago
Thanks for letting me know! Would you like to create a PR? Otherwise I'll add you as a Co-Authored-By!
svieira•2mo ago
Co-authored is fine, thanks for asking!
insin•2mo ago
Gesundheit
cortesoft•2mo ago
The emojis give it away
anandchowdhary•2mo ago
Nah - as you'll see from my work from the pre-coding-agents age, I like emoji :)

Here are some receipts from 2020:

- https://github.com/AnandChowdhary/bsc-thesis
- https://github.com/AnandChowdhary/slack-netlify-trigger
- https://github.com/AnandChowdhary/analytics-icons

johntash•2mo ago
LLMs had to learn where to use emoji from somewhere, now we know who to blame ;)
jdc0589•2mo ago
I got a couple of bug reports years ago on https://github.com/jdavisclark/jsformat for not handling Unicode emojis correctly when used in variable names in JavaScript. People are weird.
rkozik1989•2mo ago
I don't think it's that surprising. Bash is old as dirt and scripts by definition are meant to be simple. Where AI struggles is when you add complexity like object-oriented design. That's when its habit of trying to solve every problem in a way unique to it takes shit off the rails. LLMs know design patterns exist, but they don't know how to use them, because that's not how deep learning approaches problem solving.
decide1000•2mo ago
How does it handle questions asked by Claude?
anandchowdhary•2mo ago
It passes a flag that dangerously allows Claude to just do whatever it wants and only gives us the final answer. It doesn't do the back-and-forth or ask questions.
CharlesW•2mo ago
The `--dangerously-skip-permissions` flag (a.k.a. "YOLO mode") does do the back-and-forth and asks questions, so this is a bit more than that.
brumar•2mo ago
Yes. I did not look, but most probably the non-interactive mode flag (-p) is used.
anandchowdhary•2mo ago
It does `claude -p "This is the prompt" --dangerously-skip-permissions --output-format json`
CharlesW•2mo ago
Oh! TIL, thank you.
stpedgwdgfhgdd•2mo ago
Working iteratively is a MUST for more than trivial fixes. This continuous loop could work for trivial refactorings / maintenance tasks.
namanyayg•2mo ago
Exactly what I needed! I might use it for test coverage on an ancient project I need to improve...
_dark_matter_•2mo ago
Fyi
cog-flex•2mo ago
Does this exist for Codex?
tinodb•2mo ago
You can do this with a single while loop in bash - no need for this project. Search for “ralph wiggum ai coding” and you'll find a couple of guys who share plenty of examples and nerd out about it.
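
The core of it is roughly this (PROMPT.md is just a stand-in for whatever spec file you point it at):

    # The whole "agent in a loop" idea, more or less.
    while true; do
      claude -p "Read PROMPT.md, do the next most important item, and commit it." --dangerously-skip-permissions
    done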
kami23•2mo ago
I've dubbed my version of this loop 'sicko mode' at work, as I've become a bit obsessed with automating every little thing in my flow so I can focus on just features and bugs. It feels like a game to me and I enjoy it a lot.
lizardking•2mo ago
It's oddly satisfying to watch your tooling improve itself.
jes5199•2mo ago
Can it read code review comments? I've been finding that having Claude write code but letting Codex review PRs is a productive workflow; Claude Code is capable of reading the feedback left in comments and is pretty good at following the advice.
jerezzprime•2mo ago
Have you tried GitHub Copilot? I've been trying it out directly in my PRs like you suggest. Works pretty well sometimes.
jes5199•2mo ago
I find that ChatGPT’s Codex reviews - which can also be set up to happen automatically on all PRs - seem smarter than Copilot’s and make fewer mistakes. But these things change fast; maybe Copilot caught up and I didn’t notice.
tinodb•2mo ago
No, Codex catches genuine bugs here that multiple reviewers would have overlooked, whilst Copilot only comes up with nitpicks. And Codex does none of those nitpicks, which is also great.
stpedgwdgfhgdd•2mo ago
I’m letting Claude Code review the code as part of a GitLab CI job. It adds inline comments (using curl and the HTTP API - a nightmare to get right, as glab does not support this).

CC can also read the inline comments and create fixes. Now I'm thinking of adding an extra CI job that will address the review comments in a separate MR.
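
Roughly, the curl side of that is the GitLab discussions API (a hypothetical sketch; the sha/path/line values are placeholders, and the position fields are what make it an inline diff comment - and the fiddly part):

    curl --request POST \
      --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}/discussions" \
      --data-urlencode "body=Consider handling the nil case here." \
      --data "position[position_type]=text" \
      --data "position[base_sha]=${BASE_SHA}" \
      --data "position[start_sha]=${START_SHA}" \
      --data "position[head_sha]=${HEAD_SHA}" \
      --data "position[new_path]=app/models/example.rb" \
      --data "position[new_line]=42"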

DeathArrow•2mo ago
>run Claude code in a loop

And watch your bank account go brrr!

lizardking•2mo ago
If you have a max plan you just watch your usage get throttled
i4k•2mo ago
I was expecting to see links to a bunch of successful open-source examples: projects that are self-managed, continuously adding code, and getting better.

99.9999% of AI software is vaporware.