Why the simplest desktop agent abstraction wins

https://www.bytebot.ai/blog/designing-bytebot-why-the-simplest-desktop-agent-abstraction-wins
33•atupem•2d ago

Comments

adityavinodh•2d ago
What are the biggest issues that the agent faces at the moment? I still find these general-purpose agents frustrating to use at times, because people position them as if they could do anything, and then they break down when you give them a reasonably complex task.

I guess if someone figured out a way to minimize the impact of an error, like a way for the agent to handle it gracefully without it feeling like too much work, that would fix most of the problems.

atupem•2d ago
Lots of interesting issues:

- The agent has a tool to set its task to 'completed', 'failed', or 'needs_help', with the last one being an option for human-in-the-loop scenarios. Sometimes the agent gets lazy and says it needs help prematurely.

- Additionally, the agent can create subtasks for itself, either to run immediately or to schedule in the future. Here again it can call that tool a bit too eagerly, filing duplicate subtasks for a task that involves repetitive work.

- Properly handling super long-running tasks that run for 1+ hours. The context window eventually hits its limit (this will be addressed this week).
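To make the first two issues concrete, here is a minimal sketch of the kind of guard rails one could put around such tools. It is not Bytebot's implementation: only the three status strings come from the comment above, while `SubtaskQueue`, `may_request_help`, and the `min_attempts` threshold are hypothetical names and values chosen for illustration.

```python
from enum import Enum


class TaskStatus(Enum):
    # The three status values named in the comment above.
    COMPLETED = "completed"
    FAILED = "failed"
    NEEDS_HELP = "needs_help"


def may_request_help(attempts: int, min_attempts: int = 3) -> bool:
    """Hypothetical guard: only allow 'needs_help' after a few real attempts."""
    return attempts >= min_attempts


class SubtaskQueue:
    """Hypothetical queue that rejects subtasks it has already seen."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.pending: list[str] = []

    def add(self, description: str) -> bool:
        # Normalize case and whitespace so near-identical requests collide.
        key = " ".join(description.lower().split())
        if key in self._seen:
            return False  # duplicate, not queued
        self._seen.add(key)
        self.pending.append(description)
        return True
```

Matching on normalized text only catches literal repeats; subtasks that are reworded but equivalent would still get through, so this is a starting point rather than a fix.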

Aside from those top-of-mind issues, there's a whole bunch of scaffolding issues - filesystem permissions, prompt injection security, i/o support, token cost - lots to improve!

We're still super early, but already these agents are showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor

lelanthran•7h ago
> We're still super early, but already these agents are showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor

Slow down, cowboy; we're seeing "flashes of brilliance" and "that this is the right form factor" for writing code only!

I'm still waiting for AI/LLMs to pose a danger to jobs other than those in software development and the arts.

furyofantares•3h ago
This one isn't for coding; they mention in the post that coding agents thrive in custom tool-use environments.
lelanthran•2h ago
> This one isn't for coding, they mention in the post that coding agents thrive in custom tool-use environments.

Well, that is why I am skeptical and said:

>> I'm still waiting for AI/LLM's to be posing a danger to jobs other than those in software development and the arts.

The goal of this product is admirable but, I feel, lacks some grounding: taking screenshots, then converting those images to text, then processing, then converting that to actions, then converting the actions to input events ... results in 4 separate points of failure. So many points of failure, each with a success rate (last I checked) of <90%, gives you something stupid like an eventual success rate of at most 0.9 * 0.9 * 0.9 * 0.9 ≈ 0.66.

The same iterative workflow for software development is pretty much 2 steps: process input, then produce output, with 100% success (or close to it) for "output", as it's just rewriting the files according to the processing, and 90% for processing, which is why it appears to work so well[1].
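The arithmetic in the two paragraphs above can be checked with a short sketch. It assumes the stages fail independently and uses 0.9 as the per-stage rate, which is the upper bound the comment gives, so the desktop figure is a best case; `pipeline_success` is a name made up for this example.

```python
from math import prod


def pipeline_success(stage_rates):
    """Chance that every stage succeeds, assuming the stages are independent."""
    return prod(stage_rates)


# Desktop agent: four stages, each at the 90% upper bound from the comment.
desktop = pipeline_success([0.9, 0.9, 0.9, 0.9])

# Coding agent: processing at 90%, output treated as ~100%.
coding = pipeline_success([0.9, 1.0])

print(f"desktop agent: {desktop:.4f}")  # 0.6561
print(f"coding agent:  {coding:.4f}")   # 0.9000
```

Under the same assumption, every extra stage at 90% multiplies the total by another 0.9, so a fifth stage would bring the desktop figure down to about 0.59.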

I dabbled briefly in this and explored a few different ways of making LLMs use the ERP/business system effectively. With all the currently popular business systems, this is simply not possible with a high enough success rate, because those systems have little "structured text" output and even less "structured text" input. In fact, some of them have exactly zero "structured text" input.

To make the most of LLMs in your business system, you're going to need a new one that is primarily text-IO based (structured text, if necessary) and only secondarily GUI-for-humans based.

[1] In truth, using tools is a poor way to extend the reach and grasp of the LLM into the operator's context.

It works well for one mainstream use-case: software development, because then you need less than a dozen tools to automate an entire development iteration (read file, list files, insert into file, run test command, etc).

Try doing that with a mini-ERP type of system; there's just no way to keep a small set of 12 tools that can do any workflow that the operator can do. You'll quickly run into a situation where every prompt request includes tool descriptions for about 500 tools.

Agentic automation is working very well for coding, where all the input is structured text, all the output is structured text, and all the changes are structured text.

The only way for ERP, Accounting, etc to ever get to this level of agent-based automation is if the base product itself is completely 100% structured text IO based, with the human-operator interface built on top of that.

atupem•1m ago
I respectfully disagree! There's a lot of opportunity behind keyboard + mouse + screen.

In a way Bytebot is a maximalist bet on the growth and improvement of multi-modal LLMs. I firmly believe that in a short period of time, the token cost will drop, while the capability increases (both dramatically). It's still uncertain, which makes it a great asymmetric bet.

We don't do any sort of grounding or image conversion, and we offer a handful of tools. I'll go into more detail in my next post.

latexr•5h ago
> showing flashes of brilliance

A “flash” of anything is also called a fluke, or a coincidence. The dumbest moron can have a flash of brilliance on occasion. So could a random word masher. Consistency is what matters.

> and we're gaining more and more conviction that this is the right form factor

Are we? Who’s “we”? Because it looks to me like the LLM approach is lacklustre if you care about truth and correctness (which you should) but the people and companies invested don’t really have a better idea and are shoving them down everyone’s throats in pursuit of personal profit.

atupem•17m ago
Agreed, and the consistency has improved over time. I remember struggling only 9 months ago to get a browser agent to accurately click on a checkbox. The growth trajectory is what has us excited.

"We" are a YC-backed startup: https://www.ycombinator.com/companies/bytebot.

Re: truth and correctness, there are different tolerances depending on the type of task.

teruakohatu•3h ago
What is your business model?
atupem•13m ago
We're working with design partners as forward deployed engineers, helping set up Bytebot on their infra and tackle use cases.

We'll be launching a self-serve cloud platform soon!

lelanthran•2h ago
See my comprehensive reply downthread (it's very long, you cannot miss it).

While I am skeptical due to already having explored this for SMME Line of Business applications, I wish you all the best of luck.

My approach is to simply build a new system from the ground up that can take advantage of structured IO.

[EDIT: send me a message with a link to a post about your product (or this blog), and I'll connect with you on LinkedIn and share your post with my network, meager though it may be]

atupem•1m ago
Will do!
clbrmbr•1h ago
Does anyone have experience getting agents to understand terminal applications? Like, in general an arbitrary ncurses application.

A more specific case I’ve struggled with is output from a long-running program like ping. You’ve got to know when to terminate.
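For the ping-style case, one blunt option is to skip the decision about when to terminate and give the command a fixed time budget, then hand the agent whatever output arrived. A minimal sketch using Python's standard `subprocess` module follows; `run_with_deadline` is a hypothetical helper name, and this does nothing for interactive ncurses applications.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd; return (finished, stdout). The process is killed on timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
        return True, done.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may be None or bytes depending on the platform.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, out


if __name__ == "__main__":
    # Stand-in for a never-ending program such as ping.
    endless = [sys.executable, "-c", "import time; time.sleep(30)"]
    finished, output = run_with_deadline(endless, 1)
    print(finished)
```

For ping specifically, a count flag (`-c` on Linux and macOS, `-n` on Windows) avoids the problem entirely, but the time budget also covers programs that have no such option.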

Get the location of the ISS using DNS

https://shkspr.mobi/blog/2025/07/get-the-location-of-the-iss-using-dns/
76•8organicbits•2h ago•36 comments

Overthinking GIS (2024)

https://scottsexton.co/post/overthinking_gis/
70•todsacerdoti•6h ago•17 comments

Hidden interface controls that affect usability

https://interactions.acm.org/archive/view/july-august-2025/stop-hiding-my-controls-hidden-interface-controls-are-affecting-usability
492•cxr•15h ago•316 comments

Local-first software (2019)

https://www.inkandswitch.com/essay/local-first/
752•gasull•1d ago•250 comments

Serving 200M requests per day with a CGI-bin

https://simonwillison.net/2025/Jul/5/cgi-bin-performance/
217•mustache_kimono•14h ago•154 comments

Take Two: Eshell

http://yummymelon.com/devnull/take-two-eshell.html
61•nanna•3d ago•33 comments

Stop killing games and the industry response

https://blog.kronis.dev/blog/stop-killing-games
6•LorenDB•2h ago•0 comments

Eastern Baltic cod grow much smaller than they did due to overfishing

https://www.smithsonianmag.com/smart-news/these-cod-have-been-shrinking-dramatically-for-decades-now-scientists-say-theyve-solved-the-mystery-180986920/
229•littlexsparkee•19h ago•71 comments

Toys/Lag: Jerk Monitor

https://nothing.pcarrier.com/posts/lag/
3•ptramo•13m ago•1 comments

The War on the Walkman

https://newsletter.pessimistsarchive.org/p/the-forgotten-war-on-the-walkman
16•mfiguiere•3d ago•3 comments

July 5, 1687: When Newton explained why you don't float away

https://multiverseemployeehandbook.com/blog/when-newton-explained-why-you-dont-float-away/
70•TMEHpodcast•10h ago•63 comments

What a Hacker Stole from Me

https://mynoise.net/blog.php
236•wonger_•16h ago•63 comments

Two and a Half Years in GameDev

https://smyachenkov.com/posts/two-and-half-years-in-gamedev/
18•_sJiff•1h ago•0 comments

Can we test it? Yes, was can [video]

https://www.youtube.com/watch?v=MqC3tudPH6w
28•zdw•3d ago•35 comments

The Mystery of People Who Speak Languages

https://www.newyorker.com/magazine/2018/09/03/the-mystery-of-people-who-speak-dozens-of-languages
20•rbanffy•3d ago•6 comments

How to Network as an Introvert

https://aginfer.bearblog.dev/how-to-network-as-an-introvert/
252•agcat•17h ago•93 comments

Development of a transputer ISA board

https://nanochess.org/transputer_board.html
44•nanochess•2d ago•3 comments

Reinforcement Learning from Human Feedback (RLHF) in Notebooks

https://github.com/ash80/RLHF_in_notebooks
4•ash_at_hny•23m ago•0 comments

Show HN: I made Logic gates using CSS if() function

https://yongsk0066.github.io/css_if_logic_gate/
47•yongsk0066•3d ago•9 comments

Europe's first geostationary sounder satellite is launched

https://www.eumetsat.int/europes-first-geostationary-sounder-satellite-launched
201•diggan•1d ago•43 comments

Volvo delivers 5,000th electric semi

https://electrek.co/2025/06/29/volvo-delivers-5000th-electric-semi-with-little-fanfare-sending-a-big-message/
219•JumpCrisscross•12h ago•139 comments

macOS Icon History

https://basicappleguy.com/basicappleblog/macos-icon-history
215•ksec•23h ago•83 comments

Optimizing Tool Selection for LLM Workflows with Differentiable Programming

https://viksit.substack.com/p/optimizing-tool-selection-for-llm
107•viksit•17h ago•34 comments

Speeding up PostgreSQL dump/restore snapshots

https://xata.io/blog/behind-the-scenes-speeding-up-pgstream-snapshots-for-postgresql
132•tudorg•22h ago•35 comments

The force-feeding of AI features on an unwilling public

https://www.honest-broker.com/p/the-force-feeding-of-ai-on-an-unwilling
183•imartin2k•8h ago•170 comments

ClojureScript from First Principles [video]

https://www.youtube.com/watch?v=An-ImWVppNQ
88•puredanger•3d ago•23 comments

Yet Another Zip Trick

https://hackarcana.com/article/yet-another-zip-trick
78•todsacerdoti•4d ago•24 comments

Show HN: BreakerMachines – Modern Circuit Breaker for Rails with Async Support

https://github.com/seuros/breaker_machines
11•seuros•4h ago•2 comments

Techno-feudalism and the rise of AGI: A future without economic rights?

https://arxiv.org/abs/2503.14283
188•lexandstuff•17h ago•153 comments

On latency, measurement, and optimization in algorithmic trading systems

https://www.architect.co/posts/how-fast-is-it-really
44•auc•3d ago•20 comments