frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Eight More Months of Agents

https://crawshaw.io/blog/eight-more-months-of-agents
1•archb•1m ago•0 comments

From Human Thought to Machine Coordination

https://www.psychologytoday.com/us/blog/the-digital-self/202602/from-human-thought-to-machine-coo...
1•walterbell•1m ago•0 comments

The new X API pricing must be a joke

https://developer.x.com/
1•danver0•2m ago•0 comments

Show HN: RMA Dashboard fast SAST results for monorepos (SARIF and triage)

https://rma-dashboard.bukhari-kibuka7.workers.dev/
1•bumahkib7•2m ago•0 comments

Show HN: Source code graphRAG for Java/Kotlin development based on jQAssistant

https://github.com/2015xli/jqassistant-graph-rag
1•artigent•7m ago•0 comments

Python Only Has One Real Competitor

https://mccue.dev/pages/2-6-26-python-competitor
2•dragandj•9m ago•0 comments

Tmux to Zellij (and Back)

https://www.mauriciopoppe.com/notes/tmux-to-zellij/
1•maurizzzio•9m ago•1 comments

Ask HN: How are you using specialized agents to accelerate your work?

1•otterley•11m ago•0 comments

Passing user_id through 6 services? OTel Baggage fixes this

https://signoz.io/blog/otel-baggage/
1•pranay01•12m ago•0 comments

DavMail Pop/IMAP/SMTP/Caldav/Carddav/LDAP Exchange Gateway

https://davmail.sourceforge.net/
1•todsacerdoti•12m ago•0 comments

Visual data modelling in the browser (open source)

https://github.com/sqlmodel/sqlmodel
1•Sean766•14m ago•0 comments

Show HN: Tharos – CLI to find and autofix security bugs using local LLMs

https://github.com/chinonsochikelue/tharos
1•fluantix•15m ago•0 comments

Oddly Simple GUI Programs

https://simonsafar.com/2024/win32_lights/
1•MaximilianEmel•15m ago•0 comments

The New Playbook for Leaders [pdf]

https://www.ibli.com/IBLI%20OnePagers%20The%20Plays%20Summarized.pdf
1•mooreds•15m ago•0 comments

Interactive Unboxing of J Dilla's Donuts

https://donuts20.vercel.app
1•sngahane•17m ago•0 comments

OneCourt helps blind and low-vision fans to track Super Bowl live

https://www.dezeen.com/2026/02/06/onecourt-tactile-device-super-bowl-blind-low-vision-fans/
1•gaws•19m ago•0 comments

Rudolf Vrba

https://en.wikipedia.org/wiki/Rudolf_Vrba
1•mooreds•19m ago•0 comments

Autism Incidence in Girls and Boys May Be Nearly Equal, Study Suggests

https://www.medpagetoday.com/neurology/autism/119747
1•paulpauper•20m ago•0 comments

Wellness Hotels Discovery Application

https://aurio.place/
1•cherrylinedev•21m ago•1 comments

NASA delays moon rocket launch by a month after fuel leaks during test

https://www.theguardian.com/science/2026/feb/03/nasa-delays-moon-rocket-launch-month-fuel-leaks-a...
1•mooreds•21m ago•0 comments

Sebastian Galiani on the Marginal Revolution

https://marginalrevolution.com/marginalrevolution/2026/02/sebastian-galiani-on-the-marginal-revol...
2•paulpauper•25m ago•0 comments

Ask HN: Are we at the point where software can improve itself?

1•ManuelKiessling•25m ago•1 comments

Binance Gives Trump Family's Crypto Firm a Leg Up

https://www.nytimes.com/2026/02/07/business/binance-trump-crypto.html
1•paulpauper•25m ago•1 comments

Reverse engineering Chinese 'shit-program' for absolute glory: R/ClaudeCode

https://old.reddit.com/r/ClaudeCode/comments/1qy5l0n/reverse_engineering_chinese_shitprogram_for/
1•edward•25m ago•0 comments

Indian Culture

https://indianculture.gov.in/
1•saikatsg•28m ago•0 comments

Show HN: Maravel-Framework 10.61 prevents circular dependency

https://marius-ciclistu.medium.com/maravel-framework-10-61-0-prevents-circular-dependency-cdb5d25...
1•marius-ciclistu•28m ago•0 comments

The age of a treacherous, falling dollar

https://www.economist.com/leaders/2026/02/05/the-age-of-a-treacherous-falling-dollar
2•stopbulying•28m ago•0 comments

Ask HN: AI Generated Diagrams

1•voidhorse•31m ago•0 comments

Microsoft Account bugs locked me out of Notepad – are Thin Clients ruining PCs?

https://www.windowscentral.com/microsoft/windows-11/windows-locked-me-out-of-notepad-is-the-thin-...
7•josephcsible•31m ago•2 comments

Show HN: A delightful Mac app to vibe code beautiful iOS apps

https://milq.ai/hacker-news
6•jdjuwadi•34m ago•1 comments
Open in hackernews

Teaching GPT-5 to Use a Computer

https://prava.co/archon/
94•Areibman•5mo ago

Comments

daxfohl•5mo ago
Very cool. I've been thinking for a while that this is where things will end up. While custom AI integrations per service/product/whatever can be better and more efficient, there's always going to be stuff that doesn't have AI integrations but your workflow will need to use.

Without this, AI is going to be limited and kloodgy. Like if I wanted to have AI run a FEA simulation on some CAD model, I have to wait until the FEA software, the CAD software, the corporate models repo, etc., etc. all have AI integrations and then create some custom agent that glues them all together. Once AI can just control the computer effectively, then it can look up the instruction manuals for each of these pieces of software online, and then just have at it e2e like a human would. It can even ping you over slack if it gets stuck on something.

I think once stuff like this becomes possible, custom AI integrations will become less necessary. I'm sure they'll continue to exist for special cases, but the other nice thing about a generic computer-use agent is that you can record the stream and see exactly what it's doing, so a huge increase in observability. It can even demo to human workers how to do things because it works via the same interfaces.

kevingadd•5mo ago
One potential virtuous cycle here is that accessibility trees used by tools like screen readers are also a nice potential way for a model to consume information about what's on screen and how it can be interacted with. So it creates an additional incentive for improving the accessibility of new and existing software, because doing that lights up integration with future models.
alhirzel•5mo ago
This cycle starts with an integration for model developers. I wonder if anyone is working on a generic ARIA hookup, as well as whatever standards are necessary for desktop/smartphone integration?
yuliyp•5mo ago
I can't help but feel like some sort of hybrid approach: use GPT5 for the strategy, then a more direct ML model for actually executing the strategy might work better than trying to use reasoning directly for input control, would work better than trying to reason your way through driving.
Philpax•5mo ago
That's what the article describes, yes.
yuliyp•5mo ago
Sorry, I guess I meant something like motion planning etc. rather than another transformer.
deadbabe•5mo ago
I imagine in the future someone will make an Agent-First OS that is entirely built from the ground up to be run by AI and runs off the assumption that there are no human users or that their usage is limited. That will be interesting, imagine all the things you could do differently, the design choices you could make. You lose a lot by accommodating human ergonomics.
Waterluvian•5mo ago
What might you imagine being different in an “agent first OS” compared to a terminal only Linux distribution?
deadbabe•5mo ago
No I/O, ability to mount vectors and graphs, bare metal file system, structured traces, deterministic replayability of commands
mike_hearn•5mo ago
I can imagine a different UI. Instead of a browser with tabs and an address bar, you have a series of infinitely scrolling "terminals" with a chat bar at the bottom. Each one is just a chat session into which objects like web pages, images, etc can be embedded and they scroll horizontally a bit like multiple desktops do. You could zoom in and out as much as you want, the goal is to let you easily monitor many parallel tasks.

From the AI's perspective a filesystem that vector indexes data on the fly would make sense, perhaps, along with an ability for the user to share out fine-grained permissions with it.

pjerem•5mo ago
For now, I still have hard time believing in unattended agents. Be it using a computer or generating programs.

I mean that would sure be a nice demo, but it’s too probabilistic to give AI agents real tasks (and it seems that isn’t going to change anytime soon).

It’s all fun and games until it implies spending money and/or taking responsibility.

And be it in personal life or in businesses, money and responsibility are vital things.

Sure you can ask LLMs to generate a minesweeper game with custom rules or ask it to summarize headlines from HN.

Releasing a program an unattended agent generated to real clients that pay you or asking it to order a non refundable flight ticket is something else.

However, I can see the point of an agent that uses my computer while I watch it.

ahmedhawas123•5mo ago
This is cool though wanted to share a couple of thoughts for reflection:

I feel like your demo video is not the greatest one to highlight the capability. A browsing use case likely does require a key press->planning loop, but a gaming use case, or a well known software (e.g., excel), may be able to think ahead 10-20 key presses before needing the next loop / verification. The current demo makes it seem slow / prototype-like.

Also, the X/Y approach is interesting when thinking about a generic approach to screen management. But for example for browsers, you're likely adding overhead relative to just marking the specific div/buttons that are on screen and having those be part of the reasoning (e.g., "Click button X at div with path XX"). It may be helpful to think about the workflows you are going after and what kind of accelerated management you have over them.

muratsu•5mo ago
For someone who wants to do “archon‑mini is a 7B Qwen‑2.5‑VL–based executor (dynamic‑res ViT) fine‑tuned with GRPO for GUI grounding” part at home, is there a guide/post you would recommend?
reilly3000•5mo ago
This may be a good start: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
boothby•5mo ago
This is great, once we get LLMs playing Doom, we'll all be free to touch grass.
thrown-0825•5mo ago
Imagine getting beat by a bot and it also has the capability to talk trash to you.
heyitsguay•5mo ago
"Unfortunately, my content guidelines prohibit me from describing my activities with your mother last night"
joshuamoyers•5mo ago
I really like this approach. Nice job!

> We also plan to compile solved steps into micro‑policies. If you're running something like a RPA task or similar workflow as before, you can simply run the execution locally (with archon-mini running locally) and not have to worry about the planning. Over time, the planner is a background teacher, not a crutch.

Conceptually, I really like this - why re-do the work of reasoning about an already solved task? Just do it again. For some plausibly large majority of things, this could speed things up considerably.

> In the future we hope to run a streaming capture pipeline similar to Gemma 3. Consuming frames at 20–30 fps, emitting actions at 5–10 Hz, and verifying state on each commit.

I love targets like this. It makes you tune the architecture and abstractions to push the boundary of whats possible with a traditional agent loop.

The salience heat map compression is a great idea. I think you could take this a step further and tune a model so that it compresses an image into a textual semantic/interactive element hierarchy. This is effectively what browser-use is doing, just using javascript instead of a vision model.

This seems like a task that would benefit from narrow focus. I'm aware of the "Bitter Lesson," but my intuition seems to tell me that chaining together fit to purpose classification as an input to an intelligent planning system is the way to go.

mike_hearn•5mo ago
Nice writeup! A few years ago I proposed to a friend that he should try rendering accessibility trees and fine-tuning a model to issue tool calls over them; I don't know if this has been tried and failed or if nobody bothered trying because so few people know desktop APIs anymore. The main advantage would be accuracy/speed and avoiding the need for too many image tokens (you still need them for things that are actually images, though).
StopDisinfo910•5mo ago
It’s pretty interesting.

I see a ton of potential for testing. RPA can quickly get annoying because even simple change can break automation. The LLM ability to “reason” could really bridge the gap.

Coupled with agents who help turn specification/stories into a testing plan, I could really see automated end to end testing becoming far cheaper than it is nowadays in the near future.

That’s a very good piece of news for system reliability.

soared•5mo ago
Can I use Archon? There is a use case I’ve wanted to explore and this tool can do most of it
creatonez•5mo ago
Can we please squash the idea of turning this into a product early, so no one starts to take it seriously? Agentic AI shouldn't be hooked up to any environment that doesn't have a flawless "undo" or "rewind 5 minutes" button. Would you hook up randomized mouse/keyboard input into a computer that you need to get work done on?

At the very least, display a very strongly worded warning message when the tool is run in a non-VM environment. Internet connectivity is still dangerous, but at least VMs can be snapshot. And it should not be packaged as an end-user product with a bar at the top of the screen, period.

This is a risky product and those who developed it have every ability to know that, based on the history of AI hallucinations. Not because it will escape or self replicate or other things claimed by the various idiotic AI religions, but because one of the first things it will inevitably do is screw up someone's work and cause data loss.

darepublic•5mo ago
This is very interesting, my thinking has been along these same lines. This covers some of the more complicated engineering tasks required to pave the way for LLMs to automate diverse tasks with much higher accuracy and lower cost. Building the ultimate RPA as it were.