Show HN: Bonsai – Using agentic AI / browser / memory to replace ChatGPT

https://drive.google.com/drive/folders/1YUQ3tmcBSLEyBKLi5JdJgmod9mqXFTgl

2•coolwulf•1h ago

Comments

coolwulf•4m ago

Bonsai: A Local Agentic AI Harness Built Around Small Models

Since last year, I've been teaching a course at UT Southwestern Medical Center on how to build agentic AI systems and harnesses for specialized domains.

One thing I've noticed is that as companies like OpenAI, Google, and Anthropic continue raising API prices, the cost of running frontier models in the cloud keeps increasing. At the same time, many users are using ChatGPT the same way they used Google years ago: asking questions and looking up information. Most of these use cases simply don't justify paying for GPT-5.5, Opus 4.8, or other expensive flagship models.

That led me to explore a different idea: combining efficient local models with a purpose-built harness that provides tools, memory, and domain-specific skills.

Part of the reason I named this project Bonsai is that I had some interactions with Stanford's Prism Lab.

The architecture follows an Agent + Skills + Memory design. Memory is implemented locally using embeddings and SQLite, allowing semantic retrieval through cosine similarity search. This helps compensate for the limited context windows of smaller local models.
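To make the memory idea concrete, here is a minimal sketch of embeddings stored in SQLite and ranked by cosine similarity. It is not Bonsai's actual code: the `Memory` class, the table layout, and `toy_embed` (a hashed bag-of-words standing in for a real local embedding model) are all assumptions made for illustration.

```python
import math
import sqlite3
import struct
import zlib
from collections import Counter
from typing import List, Sequence, Tuple

DIM = 64  # toy embedding size (assumption); a real embedding model defines its own


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a local embedding model: hashed bag-of-words."""
    vec = [0.0] * DIM
    for word, n in Counter(text.lower().split()).items():
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += n
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class Memory:
    """Illustrative memory store: text plus its embedding in one SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding BLOB NOT NULL)"
        )

    def add(self, text: str) -> None:
        # Embeddings are packed as float32 values into a BLOB column
        blob = struct.pack(f"{DIM}f", *toy_embed(text))
        self.db.execute(
            "INSERT INTO memory (text, embedding) VALUES (?, ?)", (text, blob)
        )
        self.db.commit()

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        # Brute-force scan: score every stored row against the query embedding
        q = toy_embed(query)
        rows = self.db.execute("SELECT text, embedding FROM memory").fetchall()
        scored = [
            (cosine(q, struct.unpack(f"{DIM}f", blob)), text) for text, blob in rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    mem = Memory()
    mem.add("the clinic opens at nine on weekdays")
    mem.add("the user prefers metric units")
    for score, text in mem.search("what units does the user prefer", k=1):
        print(f"{score:.2f}  {text}")
```

The retrieved snippets would then be placed into the prompt, so the model sees only the few memories relevant to the current turn instead of the whole history. A brute-force scan like this is reasonable for a few thousand rows; a larger store would likely want an approximate nearest-neighbor index.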

I believe this approach can make small models much more capable than their parameter count would suggest.

Although Anthropic has never publicly disclosed the exact size of Claude Sonnet, my analysis suggests it is likely a Mixture-of-Experts (MoE) model with tens of billions of active parameters and hundreds of billions of total parameters.

The active parameters determine how much computation each generated token requires during inference, while the total parameter count reflects how much knowledge the model can store. My hypothesis is that a dense thinking model with only tens of billions of parameters can still deliver strong performance if paired with effective harness engineering, specialized tools, memory, and retrieval systems.
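The active-versus-total distinction can be put into rough numbers. The sketch below uses two common rules of thumb: about 2 FLOPs per active parameter per generated token, and weight memory equal to total parameters times bytes per parameter. The model sizes are hypothetical placeholders, not estimates of any real model, and the function names are made up for this example.

```python
def flops_per_token(active_params: float) -> float:
    """Rule of thumb: roughly 2 FLOPs per active parameter per generated token."""
    return 2.0 * active_params


def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical configurations, chosen only to illustrate the arithmetic
    configs = [
        ("dense 4B, 4-bit", 4e9, 4e9, 4),
        ("MoE 30B active / 300B total, 16-bit", 30e9, 300e9, 16),
    ]
    for name, active, total, bits in configs:
        print(
            f"{name}: {flops_per_token(active):.1e} FLOPs/token, "
            f"{weight_memory_gb(total, bits):.0f} GB of weights"
        )
```

Under these assumptions the small dense model fits in about 2 GB, while the hypothetical MoE needs hundreds of gigabytes for weights even though only a fraction of them is used per token. That gap is what makes the local-model approach attractive on consumer hardware.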

If that hypothesis is correct, local models could satisfy the majority of everyday ChatGPT-style use cases without requiring expensive cloud inference.

As a first step, I'm releasing an experimental version of Bonsai.

Bonsai communicates directly with a local Google Chrome instance and provides a collection of browser-oriented tools that allow a local LLM to interact with the web in an agentic fashion. The default model is Google Gemma 4B, although Qwen models can also be used.

(One reason I chose Gemma as the default is that some government agencies and schools in Texas prohibit the use of Chinese open-source models.)

Download https://drive.google.com/drive/folders/1YUQ3tmcBSLEyBKLi5JdJ...

Screenshot https://i.imgur.com/9MacuXk.png

The left side shows the chat interface, while the right side displays the agent operating the browser in real time.

The harness includes many browser-specific tools, including JavaScript injection capabilities that allow the agent to locate page elements, inspect DOM structures, click buttons, fill forms, and perform other browser interactions.
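The post does not say which mechanism Bonsai uses to reach Chrome. One common route is the Chrome DevTools Protocol, whose `Runtime.evaluate` method runs a JavaScript expression in the page. The sketch below only builds the JSON commands such a tool might send; the helper names (`cdp_evaluate`, `click_js`, `fill_js`) are hypothetical, and the step of sending the message over Chrome's debugging WebSocket is left out.

```python
import json
from itertools import count

_ids = count(1)  # each DevTools Protocol command carries a unique id


def cdp_evaluate(expression: str) -> str:
    """Build a Runtime.evaluate command that runs `expression` in the page."""
    return json.dumps(
        {
            "id": next(_ids),
            "method": "Runtime.evaluate",
            "params": {"expression": expression, "returnByValue": True},
        }
    )


def click_js(selector: str) -> str:
    """JavaScript that clicks the first element matching a CSS selector."""
    sel = json.dumps(selector)  # JSON quoting yields a safe JS string literal
    return (
        f"(() => {{ const el = document.querySelector({sel}); "
        "if (!el) return false; el.click(); return true; })()"
    )


def fill_js(selector: str, value: str) -> str:
    """JavaScript that sets an input's value and fires an 'input' event."""
    sel, val = json.dumps(selector), json.dumps(value)
    return (
        f"(() => {{ const el = document.querySelector({sel}); "
        f"if (!el) return false; el.value = {val}; "
        "el.dispatchEvent(new Event('input', {bubbles: true})); "
        "return true; })()"
    )


if __name__ == "__main__":
    print(cdp_evaluate(fill_js("input[name=q]", "bonsai care")))
    print(cdp_evaluate(click_js("button[type=submit]")))
```

Each snippet returns `true` or `false`, so the agent can tell whether the element was found and decide whether to retry with a different selector.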

Current features include:

Browser integration

VectorDB-based semantic memory for small-context local models

Custom browser-oriented skills and tools

Local embedding + SQLite memory system

Agentic web navigation

WebRTC-based communication layer (lower-level than MCP)

The current release was compiled for Windows and requires NVIDIA CUDA.

I've also added an Apple Silicon (M-series) Mac version to the same download directory.

The default model is a 4B thinking model because agent workflows benefit significantly from high token throughput. On my test system (Windows 11 + RTX 4090), Bonsai reaches roughly 140 tokens/sec. On an M4 Mac using Metal, I see around 50 tokens/sec.
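A quick back-of-the-envelope calculation shows why throughput matters for agents. The throughput figures come from the measurements above; the step count and tokens per step are assumptions picked for illustration, and the estimate ignores prompt processing, tool latency, and page loads.

```python
def generation_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Time spent generating `tokens` at a steady decode rate."""
    return tokens / tokens_per_sec


def task_seconds(steps: int, tokens_per_step: int, tokens_per_sec: float) -> float:
    """Generation time for an agent task made of several reasoning/tool steps."""
    return steps * generation_seconds(tokens_per_step, tokens_per_sec)


if __name__ == "__main__":
    STEPS = 10             # assumed number of agent steps in one task
    TOKENS_PER_STEP = 500  # assumed thinking + tool-call tokens per step
    for label, rate in [("RTX 4090", 140.0), ("M4 Mac", 50.0)]:
        total = task_seconds(STEPS, TOKENS_PER_STEP, rate)
        print(f"{label}: about {total:.0f} s of generation")
```

With these assumed numbers, the same task spends roughly 36 seconds generating at 140 tokens/sec and 100 seconds at 50 tokens/sec, so decode speed accounts for much of the perceived responsiveness of a multi-step agent.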

I'm curious whether others think specialized harness engineering can make small local models practical for everyday AI workflows, rather than relying exclusively on increasingly large cloud-hosted models.
