frontpage.

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

https://github.com/localgpt-app/localgpt
2•yi_wang•45m ago•0 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
59•momciloo•8h ago•11 comments

Show HN: A luma dependent chroma compression algorithm (image compression)

https://www.bitsnbites.eu/a-spatial-domain-variable-block-size-luma-dependent-chroma-compression-...
31•mbitsnbites•3d ago•2 comments

Show HN: Craftplan – Elixir-based micro-ERP for small-scale manufacturers

https://puemos.github.io/craftplan/
7•deofoo•4d ago•1 comment

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
295•isitcontent•1d ago•39 comments

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

https://github.com/sandys/kappal
44•sandGorgon•2d ago•21 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
363•eljojo•1d ago•218 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
374•vecti•1d ago•172 comments

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

https://github.com/kjnez/django-rclone
2•cui•2h ago•1 comment

Show HN: Witnessd – Prove human authorship via hardware-bound jitter seals

https://github.com/writerslogic/witnessd
2•davidcondrey•3h ago•1 comment

Show HN: More beautiful and usable Hacker News

https://twitter.com/shivamhwp/status/2020125417995436090
3•shivamhwp•39m ago•0 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
97•antves•2d ago•70 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
86•phreda4•1d ago•17 comments

Show HN: PalettePoint – AI color palette generator from text or images

https://palettepoint.com
2•latentio•5h ago•0 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
156•bsgeraci•1d ago•65 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
29•dchu17•1d ago•12 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
55•nwparker•2d ago•12 comments

Show HN: I built a <400ms latency voice agent that runs on a 4GB VRAM GTX 1650

https://github.com/pheonix-delta/axiom-voice-agent
2•shubham-coder•7h ago•1 comment

Show HN: Stacky – certain block game clone

https://www.susmel.com/stacky/
3•Keyframe•8h ago•0 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
23•NathanFlurry•1d ago•11 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
18•denuoweb•2d ago•2 comments

Show HN: A toy compiler I built in high school (runs in browser)

https://vire-lang.web.app
3•xeouz•8h ago•1 comment

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
173•vkazanov•2d ago•49 comments

Show HN: Env-shelf – Open-source desktop app to manage .env files

https://env-shelf.vercel.app/
2•ivanglpz•10h ago•0 comments

Show HN: Nginx-defender – realtime abuse blocking for Nginx

https://github.com/Anipaleja/nginx-defender
3•anipaleja•10h ago•0 comments

Show HN: MCP App to play backgammon with your LLM

https://github.com/sam-mfb/backgammon-mcp
3•sam256•12h ago•1 comment

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
27•JoshPurtell•2d ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•1d ago•8 comments

Show HN: I'm 75, building an OSS Virtual Protest Protocol for digital activism

https://github.com/voice-of-japan/Virtual-Protest-Protocol/blob/main/README.md
9•sakanakana00•13h ago•2 comments

Show HN: I built Divvy to split restaurant bills from a photo

https://divvyai.app/
3•pieterdy•13h ago•1 comment

Show HN: Playwright Skill for Claude Code – Less context than playwright-MCP

https://github.com/lackeyjb/playwright-skill
189•syntax-sherlock•3mo ago
I got tired of playwright-mcp eating through Claude's 200K token limit, so I built this using the new Claude Skills system. Built it with Claude Code itself.

Instead of sending accessibility tree snapshots on every action, Claude just writes Playwright code and runs it. You get back screenshots and console output. That's it.

314 lines of instructions vs a persistent MCP server. Full API docs only load if Claude needs them.

Same browser automation, way less overhead. Works as a Claude Code plugin or manual install.
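
For a concrete picture, this is roughly the kind of throwaway script Claude ends up writing and running under this skill. It's a minimal sketch, not taken from the repo; it assumes Playwright is installed and a dev server is already up (the URL and output path are made up):

```typescript
// check-ui.ts: a one-off script of the kind the skill has Claude write.
// Assumes `npm i playwright` and an app already serving localhost:3000.
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Capture console output so it can be printed back for the agent.
  const logs: string[] = [];
  page.on("console", (msg) => logs.push(`[${msg.type()}] ${msg.text()}`));

  await page.goto("http://localhost:3000");
  await page.screenshot({ path: "/tmp/check-ui.png", fullPage: true });
  await browser.close();

  console.log(logs.join("\n")); // stdout is all that re-enters the context
})();
```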

Token limit issue: https://github.com/microsoft/playwright-mcp/issues/889

Claude Skills docs: https://docs.claude.com/en/docs/claude-code/skills

Comments

wahnfrieden•3mo ago
Why not just ask the agent to use Playwright via CLI? That’s what I do and it works fine. With Codex anyway

Edit: oops that’s what you did too. Yes most MCP shouldn’t be used.

kylemh•3mo ago
But how would Claude then look at DevTools for the Playwright window to see console output? I know some frameworks are putting logging into the shell, but in old repos the Playwright MCP seems worth the extra context, no?
wild_egg•3mo ago
This was on my TODO list for the week, thanks for sharing!

Now I just need to make a skill for using Jira and I can go back to the MCP-free life.

syntax-sherlock•3mo ago
thanks!
AftHurrahWinch•3mo ago
MCPs are deterministic, SKILLS.md isn't. Also run.js can run arbitrarily generated Node.js code. It is a trivial vector for command injection.

This might be sufficient for an independent contractor or student. It shouldn't be used in a production agent.

syntax-sherlock•3mo ago
Yeah, this isn’t meant to replace your real tests; it’s more for quick “does my new feature work?” checks during local dev. Think of it like scriptable manual testing: Claude spits out the Playwright code faster than you would, but it’s not CI-level coverage.

And on privacy: screenshots stay local in /tmp, but console output and page content do go to Claude/Anthropic. It’s designed for dev environments with dummy data, not prod. Same deal as using Claude for any coding help.

pacoWebConsult•3mo ago
If you're going to use Claude to help you respond to feedback, the least you can do is restate it in your own words. The parent commenter deserves the respect of corresponding with a real human being.
blks•3mo ago
You don’t understand what you’re talking about well enough, so you have to ask an LLM to generate a response for you?
bravura•3mo ago
LLMs are not deterministic though. So by definition MCPs are not deterministic.

For example, GPT-5 doesn't support the temperature parameter. And even models that do support temperature are not deterministic with temperature=0.

siva7•3mo ago
MCPs aren't deterministic...
AftHurrahWinch•3mo ago
Yes, they are. Like any standard API, it is an orchestration layer that, given a specific input, should always execute the same logic and produce a consistent output. Its job is deterministic execution.

The agentic system that uses MCP (e.g., an LLM) is fundamentally non-deterministic. The LLM's decision of which tool to call, when to call it, and what to do with the response is stochastic.
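
To make the deterministic-layer point concrete, here is a minimal MCP tool sketch in TypeScript (assuming the official @modelcontextprotocol/sdk package; names and exact signatures are illustrative, not taken from this thread). The handler is a pure function of its input; the stochastic part is the LLM deciding whether and when to call it:

```typescript
// Sketch of a deterministic MCP tool; details may differ across SDK
// versions, so treat this as illustrative rather than canonical.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "demo-tools", version: "1.0.0" });

// The tool itself is a pure function: same input, same output, every time.
server.tool("word_count", { text: z.string() }, async ({ text }) => ({
  content: [
    { type: "text" as const, text: String(text.trim().split(/\s+/).length) },
  ],
}));

// What stays stochastic is the LLM's choice of whether/when to call it.
await server.connect(new StdioServerTransport());
```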

dragonwriter•3mo ago
> MCPs are deterministic, SKILLS.md isn't.

MCPs themselves may provide access to tools that are either deterministic or not, but the LLM using them generally isn't deterministic, so when a tool is used by an LLM as part of the request-response cycle, determinism, even if the MCP-provided tool had it, is not a feature of the overall system.

SKILLS.md relies on a deterministic code execution environment, but has the same issue. I'm not seeing a broad difference in kind here when used in the context of an LLM response-generation cycle, and that’s really the only context where both are usable (MCP could be used for non-LLM integration, but that doesn't seem relevant).

rapatel0•3mo ago
I think this is actually the biggest threat to the current "AI bubble": model efficiency and the diffusion of models to open source. It's probably time to start hedging bets on Nvidia.
philipallstar•3mo ago
Why would OSS models threaten Nvidia?
ISV_Damocles•3mo ago
Most of the big OSS AI codebases (LLM and Diffusion, at least) have code to work on any GPU, not just nVidia GPUs, now. There's a slight performance benefit to sticking with nVidia, but once you need to split work across multiple GPUs, you can do a cost-benefit analysis and decide that, say, 12 AMD GPUs are faster than 8 nVidia GPUs and cheaper as well.

Then nVidia's moat begins to shrink because they need to offer their GPUs at a somewhat reduced price to try to keep their majority share.

lmeyerov•3mo ago
Share can go up and down if consumption keeps going up crazily. We now spend more per dev on their personal-use inferencing providers than on their home devices, so inferencing chips are effectively their new personal computers...
epolanski•3mo ago
> There's a slight performance benefit to sticking with nVidia

In training, not in inference and not in perf/$.

Rooster61•3mo ago
I have a few questions about test frameworks that use AI services like this.

1) The examples always seem very generic: "Test Login Functionality, check if search works, etc." Do these actually work well at all once you step outside of the basic smoke-test use cases?

2) How do you prevent proprietary data from being read when you are just foisting snapshots over to the AI provider? There's no way I'd be able to use this in any kind of real application where data privacy is a constraint.

syntax-sherlock•3mo ago
Good questions!

1) Beyond basic tests: You're right to be skeptical. This is best for quick exploratory testing during local development ("does my new feature work?"), not replacing your test suite. Think "scriptable manual testing" - faster than writing Playwright manually, but not suitable for comprehensive CI/CD test coverage.

2) Data privacy: Screenshots stay local in /tmp, but console output and page content Claude writes tests against are sent to Anthropic. This is a local dev tool for testing with dummy data, not for production environments with real user data. Same privacy model as any AI coding assistant - if you wouldn't show Claude your production database, don't test against it with this.

Rooster61•3mo ago
Thanks. I keep seeing silver bullet testing solutions pitched left right and center and wondering about these two points. Glad to see a project with realistic boundaries and expectations set. Would definitely give this a shot if I was working on a local project.
blks•3mo ago
Another llm-generated response. This is sad.
jcheng•3mo ago
For 2, a lot of companies use AWS Bedrock to access Claude models instead of Anthropic, for exactly this reason. Amazon’s terms say they don’t log prompts or completions and don’t send anything to the model provider. If your production database is already hosted by AWS, it doesn’t seem like much additional risk.
siva7•3mo ago
> Do these actually work well at all once you step outside of the basic smoketest use cases?

Excellent question... no, beyond basic kindergarten stuff Playwright (with AI) quickly falls apart. Have some OAuth? Good luck configuring Playwright for your exact setup. Need to synthesize all the information available from logs and visuals to debug something? Good luck...

simonw•3mo ago
I'm using Playwright so much right now. All of the good LLMs appear to know the API really well.

Using Claude Code I'll often prompt something like this:

"Start a python -m http.server on port 8003 and then use Playwright Python to exercise this UI, there's a console error when you click the button, click it and then read that error and then fix it and demonstrate the fix"

This works really well even without adding an extra skill.
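
In Playwright terms (TypeScript here, rather than the Python the prompt asks for), that debugging loop is roughly this sketch; the port matches the prompt, but the selector is hypothetical:

```typescript
// Reproduce-and-read-the-console-error loop from the prompt above.
// The console/pageerror hooks are real Playwright API; the button is made up.
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  page.on("console", (msg) => {
    if (msg.type() === "error") console.log("console error:", msg.text());
  });
  page.on("pageerror", (err) => console.log("uncaught page error:", err.message));

  await page.goto("http://localhost:8003");
  await page.click("button#submit"); // hypothetical selector
  await page.waitForTimeout(500);    // let the faulty handler fire

  await browser.close();
})();
```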

I think one of the hardest parts of skill development is figuring out what to put in the skill that produces better results than the model acting alone.

Have you tried iteratively testing the skill - building it up part by part and testing along the way to see if the different sections genuinely help improve the model's performance?

syntax-sherlock•3mo ago
Yeah you can definitely do this with prompts since LLMs know the API really well. I just got tired of retyping the same instructions and wanted to try out the new Skills.

I did test by comparing transcripts across sessions to refine the workflow. As I run into new things, I'm continuing to do that.

yomismoaqui•3mo ago
One place where I see skills having an advantage is when they include scripts for specific tasks where the LLM has a difficult time generating the right code.

It also helps with the problem of the LLM being trained on foo tool 1.0 when foo tool is now on version 2.0.

The nice thing is that the scripts in a skill are not included in the context, and they are also deterministic.

rgbrgb•3mo ago
this is the core problem rn with developing anything that uses an LLM. It’s hard to evaluate how well it works and nearly impossible to evaluate how well it generalizes unless the input is constrained so tightly that you might as well not use the LLM. For this I’d probably write a bunch of test tasks and see how well it performs with and without the skill. But the tough part here is that in certain codebases it might not need the skill. The whole environment is an implicit input for coding agents. In my main codebase right now there are tons of playwright specs that Claude does a great job copying / improving without any special information.

edit with one more thought: In many ways this mirrors building/adopting dev tooling to help your (human) junior engineers, and that still feels like the good metaphor for working with coding agents. It's extremely context dependent and murky to evaluate whether a new tool is effective -- you usually just have to try it out.

verdverm•3mo ago
Also, if you figure out a good prompt today you don't know how long it will last, because of model updates outside your control
silveraxe93•3mo ago
I'm surprised Anthropic didn't release skills with a `skill-creation` skill.
submeta•3mo ago
But they did.
silveraxe93•3mo ago
Did they!? Damn I missed it.

I was looking into creating one and skimmed the available ones and didn't see it.

EDIT:

Just looked again. In the docs they have this section:

```
Available Skills

Pre-built Agent Skills

The following pre-built Agent Skills are available for immediate use:

    PowerPoint (pptx): Create presentations, edit slides, analyze presentation content
    Excel (xlsx): Create spreadsheets, analyze data, generate reports with charts
    Word (docx): Create documents, edit content, format text
    PDF (pdf): Generate formatted PDF documents and reports

These Skills are available on the Claude API and claude.ai. See the quickstart tutorial to start using them in the API.
```

Is there another list of available skills?

simonw•3mo ago
Their repo here: https://github.com/anthropics/skills

This is the skill creation one: https://github.com/anthropics/skills/blob/main/skill-creator...

You can turn on additional skills in the Claude UI from this page: https://claude.ai/settings/capabilities

silveraxe93•3mo ago
Nice, thanks!
onion2k•3mo ago
> there's a console error when you click the button

Chrome Devtools also has an MCP server that you can connect an LLM to, and it's really good for debugging frontend issues like that.

boredtofears•3mo ago
I get so many LLM death spirals with playwright.

When it works, it's totally magic, but I find it gets hung up on things like not finding the active Playwright window or not being able to identify elements on the screen.

yomismoaqui•3mo ago
Thanks, I have installed it and it works great!

Related anecdote: some months ago I tried to coax the Playwright MCP into taking a full-page screenshot and it couldn't do it. Then I just told Claude Code to write a Playwright JS script to do that, and it worked on the first try.

Taking into account all the tool crap that the Playwright MCP puts in your context window, and the final result, I think this is the way to go.
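
For reference, the full-page screenshot that the MCP couldn't manage is a single option flag in a plain Playwright script (stand-in URL):

```typescript
// Full-page screenshot via plain Playwright: just `fullPage: true`.
import { chromium } from "playwright";

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com"); // stand-in URL
  await page.screenshot({ path: "/tmp/full.png", fullPage: true });
  await browser.close();
})();
```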

jonnyparris•3mo ago
How does this compare to using chrome-devtools mcp? I've been using that for validating UI flows and I haven't hit limits with Claude yet
singularity2001•3mo ago
I am using Tmux lynx browser MCP for sites that don't need JavaScript
mahdiyar•3mo ago
I have created a simple .sh command that does the testing using browser-use, and I explain how to prompt this command in CLAUDE.md. Because it has to run a shell command, it is more deterministic than Skills. And it uses close to zero context window compared to MCP.

Recently, I have found myself getting more interested in shell commands than MCPs. There is no need to set them up. Debugging is far easier. And I am free to use whichever model I like to use for a specific function. For example, for Playwright, I use GPT-5 just because I have free credits. I can save my Claude Code quota for more important tasks.

nikisweeting•3mo ago
Heh, BrowserBase and Browser-Use exist specifically because this is a harder problem than it looks. Any approach will work for the first couple of actions; the hard parts are long strings of actions that depend on the results of previous actions, compressing the context and knowing what to send, and having your tools work across all the edge cases (e.g. date-picker fields, file-upload fields, cross-origin iframes, etc.).
didibus•3mo ago
How do images affect context? Does an image run separately through another model and return a text description that ends up smaller than the accessibility text tree?
guluarte•3mo ago
i just use https://github.com/ravitemer/mcphub.nvim and enable/disable the mcps i want
cadamsdotcom•3mo ago
This is brilliant.

If you're willing to have Claude write code to test a thing, you could do a teeny bit more work and make that Playwright script a permanent part of your codebase. Then the script can run in your CI on every build, and you can keep enhancing it as your product changes so it keeps proving that area of your product works as desired.

Have it run inside a harness that spins up its own server & stack (DB etc.) and boom - you now have an end-to-end test suite!
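
Concretely, graduating such a throwaway script into the suite mostly means rewriting it as a @playwright/test spec; a sketch with a hypothetical route and assertion:

```typescript
// e2e/smoke.spec.ts: the same throwaway check, promoted to a CI test.
// Assumes @playwright/test plus a webServer entry in playwright.config.ts
// that boots the app (and any backing services) before the run.
import { test, expect } from "@playwright/test";

test("feature page renders without console errors", async ({ page }) => {
  const errors: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") errors.push(msg.text());
  });

  await page.goto("/feature"); // resolved against baseURL from the config
  await expect(page.getByRole("heading", { name: "Feature" })).toBeVisible();
  expect(errors).toEqual([]);
});
```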

gvkhna•3mo ago
I'm working on this problem, but IMO the solution is a combo of a skill and an MCP better suited to Playwright.

The issue is that for many things Playwright is really verbose; by better tailoring outputs and making them more fine-grained, you get less context bloat and the LLM can work with the context better. I'm making it open source.

coopykins•3mo ago
How does this compare to using the recently released Chrome MCP?