
Show HN: Open-source implementation of Stanford's self-learning agent framework

https://github.com/kayba-ai/agentic-context-engine
10•kayba•3mo ago
We implemented Stanford's Agentic Context Engineering paper, which shows that agents can improve their performance just by evolving their own context.

How it works: Agents execute tasks, reflect on what worked/failed, and curate a "playbook" of strategies. All from execution feedback - no training data needed.
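
A minimal sketch of that execute, reflect, curate loop might look like the following. It is an illustration only, not the library's actual API: `Playbook`, `run_episode`, `toy_execute`, and `toy_reflect` are hypothetical names, and the toy functions stand in for what would be LLM calls in a real setup.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Playbook:
    """Accumulated strategies, stored as plain-text entries (hypothetical type)."""
    strategies: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Format the playbook as a bullet list to place in the agent's context.
        return "\n".join(f"- {s}" for s in self.strategies)


def run_episode(
    task: str,
    playbook: Playbook,
    execute: Callable[[str, str], tuple[str, bool]],
    reflect: Callable[[str, str, bool], list[str]],
) -> bool:
    """One execute -> reflect -> curate pass. Returns whether the task succeeded."""
    # Execute: the agent sees the task plus the current playbook.
    output, ok = execute(task, playbook.render())
    # Reflect: turn execution feedback into candidate lessons.
    lessons = reflect(task, output, ok)
    # Curate: append only lessons that are not already recorded.
    for lesson in lessons:
        if lesson not in playbook.strategies:
            playbook.strategies.append(lesson)
    return ok


# Toy stand-ins for the LLM calls (assumptions, for illustration only).
def toy_execute(task: str, context: str) -> tuple[str, bool]:
    ok = "validate input" in context
    return ("done" if ok else "ValueError: bad input"), ok


def toy_reflect(task: str, output: str, ok: bool) -> list[str]:
    return [] if ok else ["validate input before calling the API"]


playbook = Playbook()
results = [
    run_episode("call the API", playbook, toy_execute, toy_reflect)
    for _ in range(3)
]
print(results)             # [False, True, True]
print(playbook.render())   # - validate input before calling the API
```

In this toy run the first attempt fails, the failure is recorded as a strategy, and later attempts succeed because that strategy is now in context. No labeled training data is involved, only the feedback from execution.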

Happy to answer questions about the implementation or the research!

Comments

vebgen•3mo ago
This is fascinating! The "evolving playbook" approach resonates with challenges we've been tackling building an AI agent for Django development.

A few questions about your implementation:

1. How do you handle the balance between delta updates and full context rewrites when the playbook grows large? We've found that keeping detailed history helps with debugging but can bloat context quickly.

2. The Generator/Reflector/Curator separation is elegant. Did you implement these as separate LLM calls or different prompting strategies on the same model? We use a similar dual-agent pattern (planner + executor) and the coordination overhead is non-trivial.

3. Most interesting part: "natural execution feedback without labeled supervision." How do you define success/failure signals for the Reflector in ambiguous cases? For code generation, it's easy (tests pass/fail), but for other domains it seems trickier.

The +10.6% improvement on agent tasks is impressive - definitely checking out the paper. The brevity bias problem you mention is real - we've noticed agents dropping important context details when trying to "summarize efficiently."

kayba•3mo ago
Thanks for the great questions! Here's how we're tackling these:

1. Context growth management:

We avoid full context rewrites entirely because they cause context collapse, where the LLM compresses away important details. Instead, we use delta updates as the foundation and are exploring:

- Semantic de-duplication to remove redundancy

- Keeping deltas as the source of truth with optional summarization layers on top

- Pre-filtering the playbook to feed the model a more focused version, with tooling to let it explore further when needed

Delta updates remain our core principle, but we're actively working on preventing context bloat as playbooks scale.
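
As a rough illustration of delta updates with de-duplication and pre-filtering, here is a small sketch. The function names are hypothetical, and two simplifications are assumed: `difflib.SequenceMatcher` stands in for real semantic similarity (which would more likely use embeddings), and word overlap stands in for a real relevance filter.

```python
from difflib import SequenceMatcher


def is_near_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Crude textual similarity; an assumption standing in for semantic matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def apply_delta(playbook: list[str], delta: list[str]) -> list[str]:
    """Append new entries without rewriting existing ones; skip near-duplicates."""
    merged = list(playbook)
    for entry in delta:
        if not any(is_near_duplicate(entry, existing) for existing in merged):
            merged.append(entry)
    return merged


def prefilter(playbook: list[str], task: str, top_k: int = 2) -> list[str]:
    """Keep the top_k entries that share the most words with the task."""
    task_words = set(task.lower().split())
    ranked = sorted(
        playbook,
        key=lambda e: len(task_words & set(e.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


playbook = ["Run migrations before running tests."]
playbook = apply_delta(
    playbook,
    [
        "Run migrations before running tests!",  # near-duplicate, skipped
        "Pin dependency versions in requirements.txt.",
    ],
)
focused = prefilter(playbook, "pin dependency versions", top_k=1)
```

Existing entries are never rewritten here, so nothing already learned can be compressed away. The playbook only grows by entries that are not near-duplicates, and the model is handed a filtered view rather than the whole list.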

2. Role separation:

Our library lets you select different models for each role, with prompts specifically tailored to each function. So far we've mostly used the same model for all three roles, but we're actively exploring model mixing as a promising direction.
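
To make the idea of per-role model selection concrete, here is a hypothetical configuration sketch. It is not the library's real interface: `RoleConfig`, `build_roles`, the prompt texts, and the model names `model-a` and `model-b` are all placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleConfig:
    model: str          # model identifier (placeholder values below)
    system_prompt: str  # prompt tailored to the role


# Illustrative prompts only; real prompts would be far more detailed.
PROMPTS = {
    "generator": "Solve the task using the playbook.",
    "reflector": "Explain what worked and what failed.",
    "curator": "Turn the reflection into playbook updates.",
}


def build_roles(
    default_model: str, overrides: Optional[dict[str, str]] = None
) -> dict[str, RoleConfig]:
    """Use one model for every role unless a role is explicitly overridden."""
    overrides = overrides or {}
    return {
        role: RoleConfig(model=overrides.get(role, default_model), system_prompt=prompt)
        for role, prompt in PROMPTS.items()
    }


# Same model everywhere, except a different one for reflection.
roles = build_roles("model-a", {"reflector": "model-b"})
```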

3. Success signals:

The system shows strong self-assessment capabilities using execution feedback (code pass/fail, API responses, and model interactions with the environment). However, you're right that ambiguous domains are trickier; this is still an open challenge for us. Our vision is to pre-seed domain knowledge through curated playbooks or training samples, then let models self-explore and discover their own success patterns over time.
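
For the unambiguous case (code pass/fail), a success signal could be derived roughly as sketched below. The names `Feedback` and `run_with_tests` are hypothetical, and the sketch assumes trusted code: it uses `exec` in-process for brevity, whereas a real system would run candidates in a sandbox.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    success: bool
    detail: str  # empty on success, otherwise the exception type and message


def run_with_tests(candidate_code: str, test_code: str) -> Feedback:
    """Run candidate code, then its tests, in a fresh namespace.

    WARNING: exec on untrusted code is unsafe; this is an illustration only.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)
        exec(test_code, namespace)
    except Exception as exc:
        # Any exception, including a failed assertion, counts as failure.
        return Feedback(False, f"{type(exc).__name__}: {exc}")
    return Feedback(True, "")


good = run_with_tests("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")
bad = run_with_tests("def add(a, b):\n    return a - b", "assert add(2, 3) == 5")
```

The `detail` string is what a reflection step could read to work out why an attempt failed. Ambiguous domains have no equivalent of the assertion, which is why they remain the harder case.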

What I'm curious about:

- What feedback signals work for your Django agent?

- How do you handle planner-executor coordination overhead?

- Have you hit similar brevity bias issues?

Would love to continue this conversation on Discord if you're interested: https://discord.com/invite/mqCqH7sTyK

jimmySixDOF•3mo ago
This kind of DSPy-GEPA self-improvement loop keeps popping up and adding a few points, but the cost (API and wall clock) also means you use it where a repeatable task/prompt/context needs optimizing and you can afford to find better templates.
kayba•3mo ago
You're right that cost and latency are important considerations. However, the research shows this isn't just about finding better templates; it's about enabling agentic systems to learn and improve from their previous attempts and failures.

We believe in-context learning is one of the missing pieces to make agentic systems feasible in production. The key is that systems can adapt without expensive fine-tuning or retraining. The paper shows *86.9% lower adaptation latency* and significant reductions in rollout costs compared to existing methods, making this approach more practical than previous optimization techniques.

The real value is in systems that progressively get better at their tasks through experience, not just one-time prompt optimization.

If you want to continue this conversation just hit me up on Discord: https://discord.com/invite/mqCqH7sTyK

jimmySixDOF•3mo ago
I did look into DataRobot's Syftr, which points at the same problem but is a lot heavier. I definitely like that with your approach it is at least easy to get a basic version up and start checking the results right away!