This is an integration between browser-use and Sentience SDK, where Sentience acts like Jest for AI web agents: it lets agents assert what changed on the page, instead of guessing when they’re “done”.
Most web agents today rely on screenshots or raw DOM dumps and hope the model infers layout, state, and completion correctly. That can work for demos, but it’s expensive, brittle, and very hard to validate or debug.
Sentience takes a different approach.
The goal isn't to replace vision models, but to avoid paying for them when geometric processing of page structure is sufficient, which it is for most web pages.
At each step, Sentience builds a semantic snapshot of the page that captures what matters for interaction, not pixels or raw DOM:
* interactive elements (links, buttons, inputs)
* roles + normalized text
* bounding boxes + relative position
* dominant groups (main lists / feeds)
* ordinal structure (“first”, “top”, “last”)
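For illustration, a snapshot like this could be represented roughly as follows. This is a hypothetical sketch; the field names and values are not the SDK's actual schema:

```python
# Hypothetical sketch of a semantic snapshot; field names are illustrative,
# not the Sentience SDK's actual schema.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SnapshotElement:
    role: str                        # "link", "button", "textbox", ...
    text: str                        # normalized visible text
    bbox: Tuple[int, int, int, int]  # x, y, width, height on the page
    group: Optional[str]             # dominant group, e.g. "main_list", or None
    ordinal: Optional[str]           # "first", "top", "last", or None

snapshot = [
    SnapshotElement("link", "Show HN: Sentience SDK", (40, 120, 480, 18), "main_list", "first"),
    SnapshotElement("button", "Search", (720, 16, 80, 32), None, None),
]
```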
This snapshot is passed to the agent as structured text (≈ 0.6–1.2k tokens per step in practice), which enables:
* browser-use agents to run without screenshots by default
* text-based local LLMs (e.g. Qwen 2.5 3B) to work reliably using text-only prompts
* Jest-style assertions over semantic state (e.g. “main list exists”, “first item clicked”, “task complete”)
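As a rough sketch of the last point (building on the hypothetical snapshot above; these helper names are illustrative, not the SDK's real API), step assertions can read like ordinary test code:

```python
# Hypothetical Jest-style step assertions over the snapshot sketched above;
# helper names are illustrative, not the SDK's real API.
def has_main_list(snapshot):
    # A dominant group tagged as the main list/feed exists on the page.
    return any(el.group == "main_list" for el in snapshot)

def first_item(snapshot):
    # The first element of the main list, if the snapshot exposes ordinals.
    for el in snapshot:
        if el.group == "main_list" and el.ordinal == "first":
            return el
    return None

assert has_main_list(snapshot), "main list exists"
assert first_item(snapshot) is not None, "first item is present"
```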
In short:
* browser-use acts
* Sentience SDK asserts
That separation makes agent behavior inspectable, debuggable, and testable, instead of opaque and heuristic-driven. It also reduces reliance on vision models when they aren’t strictly necessary.
Sentience SDK includes an agent runtime with per-step and task-level assertions, so agents can explicitly verify progress rather than relying on implicit “done” signals. If an assertion still fails after retries, the SDK falls back to a vision model for that step.
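Roughly, the act → assert → retry → fallback loop looks like this (a hedged sketch; `perform`, `capture_snapshot`, and `retry_with_vision` are assumed names, not the runtime's real API):

```python
# Hypothetical control flow for per-step verification with retries and a
# vision fallback; method names here are assumptions, not the real runtime API.
def run_step(agent, action, check, max_retries=2):
    for _ in range(max_retries + 1):
        agent.perform(action)                # browser-use acts
        snapshot = agent.capture_snapshot()  # semantic snapshot, not a screenshot
        if check(snapshot):                  # Jest-style assertion over semantic state
            return True
    # Assertions kept failing after retries: escalate this step to a vision model.
    return agent.retry_with_vision(action)
```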
Example (browser-use + local LLM): Multi-step agent using Qwen 2.5 3B with semantic assertions: https://github.com/SentienceAPI/browser-use/pull/6/files
Example logs from a failed task assertion: https://justpaste.it/lixt2
Example logs from a successful task run: https://justpaste.it/izg2y
Full write-up with design rationale, tradeoffs, and examples: https://medium.com/@rcholic/beyond-clicking-how-we-taught-ai...
Open source SDK:
Python: https://github.com/SentienceAPI/sentience-python
TypeScript: https://github.com/SentienceAPI/sentience-ts

browser-use integrations:
Jest-style assertions for agents: https://github.com/SentienceAPI/browser-use/pull/5
Browser-use + Local LLM (Qwen 2.5 3B) demo: https://github.com/SentienceAPI/browser-use/pull/4
Token usage comparison (semantic snapshots vs screenshots in browser-use): https://github.com/SentienceAPI/browser-use/pull/1
Happy to answer questions or share minimal examples if you’re curious how this works in practice.
Show HN screenshots from the test runs: https://jpcdn.it/img/small/435580414b5e9bc2236ca025573f3724....
Agents act → Sentience verifies.
If you’ve built flaky E2E tests or agent demos that “usually work”, this is an attempt to make those workflows inspectable and testable.