frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
221•isitcontent•13h ago•25 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
323•vecti•15h ago•142 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
275•eljojo•15h ago•161 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•48m ago•1 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
70•phreda4•12h ago•14 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
90•antves•1d ago•66 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
16•denuoweb•1d ago•2 comments

Show HN: Compile-Time Vibe Coding

https://github.com/Michael-JB/vibecode
10•michaelchicory•2h ago•1 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
47•nwparker•1d ago•11 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
150•bsgeraci•1d ago•63 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
17•NathanFlurry•21h ago•7 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
8•keepamovin•3h ago•2 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•5h ago•0 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•18h ago•7 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•6h ago•4 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
2•rs545837•7h ago•1 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
172•vkazanov•2d ago•49 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
25•dchu17•17h ago•12 comments

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•12h ago•1 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•9h ago•1 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: Gohpts tproxy with arp spoofing and sniffing got a new update

https://github.com/shadowy-pycoder/go-http-proxy-to-socks
2•shadowy-pycoder•9h ago•0 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•10h ago•0 comments

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
11•KevinChasse•18h ago•16 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•10h ago•0 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
9•sawyerjhood•18h ago•0 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•11h ago•0 comments

Show HN: Chiptune Tracker

https://chiptunes.netlify.app
3•iamdan•12h ago•1 comments

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
567•deofoo•5d ago•166 comments
Open in hackernews

Show HN: Arch-Router – 1.5B model for LLM routing by preferences, not benchmarks

66•adilhafeez•7mo ago
Hi HN — we're the team behind Arch (https://github.com/katanemo/archgw), an open-source proxy for LLMs written in Rust. Today we're releasing Arch-Router (https://huggingface.co/katanemo/Arch-Router-1.5B), a 1.5B router model for preference-based routing, now integrated into the proxy. As teams integrate multiple LLMs - each with different strengths, styles, or cost/latency profiles — routing the right prompt to the right model becomes a critical part of the application design. But it's still an open problem. Most routing systems fall into two camps:

- Embedding-based routers use intent classifiers — label a prompt as “support,” “SQL,” or “math,” then route to a matching model. This works for simple tasks but breaks down in real conversations. Users shift topics mid-conversation, task boundaries blur, and product changes require retraining classifiers.

- Performance-based routers pick models based on benchmarks like MMLU or MT-Bench, or based on latency or cost curves. But benchmarks often miss what matters in production: domain-specific quality or subjective preferences like “Will legal accept this clause?”

Arch-Router takes a different approach: route by preferences written in plain language. You write rules like “contract clauses → GPT-4o” or “quick travel tips → Gemini Flash.” The router maps the prompt (and conversation context) to those rules using a lightweight 1.5B autoregressive model. No retraining, no fragile if/else chains. We built this with input from teams at Twilio and Atlassian. It handles intent drift, supports multi-turn conversations, and lets you swap in or out models with a one-line change to the routing policy. Full details are in our paper (https://arxiv.org/abs/2506.16655), but here's a snapshot:

Specs:

- 1.5B params — runs on a single GPU (or CPU for testing)

- No retraining needed — point it at any mix of LLMs

- Cost and latency aware — route heavy tasks to expensive models, light tasks to faster/cheaper ones

- Outperforms larger closed models on our conversational routing benchmarks (details in the paper)

Links:

- Arch Proxy (open source): https://github.com/katanemo/archgw

- Model + code: https://huggingface.co/katanemo/Arch-Router-1.5B

- Paper: https://arxiv.org/abs/2506.16655

Comments

sparacha•7mo ago
Hi HN! I am one of the co-authors of the paper. If there are any questions about our approach, I would love to answer them.
tmaly•7mo ago
do you think it would be possible to quantize this model and still get good results?
sparacha•7mo ago
yes - we have already published a quantized version here: https://huggingface.co/katanemo/Arch-Router-1.5B.gguf. The performance difference with a quant version is negligible. I'll run another analysis and update the thread shortly
sparacha•7mo ago
Overall performance degrades from 93.17 -> 92.99 with a quantized version
jedisct1•7mo ago
I tried to use it to rate the difficulty level of coding tasks (for InferSwitch, an LLM router), but it performed far worse than Qwen2.5-Coder-7B (but sure, 1.5B vs 7B)
sparacha•7mo ago
Can you share more about your evaluation setup? I would love to see the specific usage pattern as we have tested our model against smaller LLMs and foundational models and our results show things differently. Of course, routing policies should follow best practices here: https://docs.archgw.com/guides/llm_router.html

Nonetheless, super curious to learn more and see what we may be able to improve. This is technically not a classifier model - its a usage prediction model (feels like a classifier, but not quite in terms of intended usage)

cotran2•7mo ago
According to the post, the model is fine-tuned for routing to different tasks/domains. Classifying difficulty level is probably not the intended use case.
jgant13•7mo ago
Solid. Can you show us when to use this vs. say OpenRouter? The performance seems strong for sure. TIA.
sparacha•7mo ago
Arch is developer friendly, but designed for enterprise-grade customers in mind. The core contributors of Envoy redesigned the proxy substrate to handle prompts - offering something that is battle tested in terms of resiliency, speed, and deployments. Second, OpenRouter offers choice of models, but dynamically routing to LLMs based on user-defined usage policies is uniquely available in Arch. Hope that helps
_nh_•7mo ago
How do you compare with RouteLLM?
sparacha•7mo ago
RouteLLM is essentially a benchmark-driven approach. Their framework chooses between a weak and a strong model and helps developers optimize for a metric called APGR (Average Performance Gap Recovered) — a measure of how much of the stronger model’s performance can be recovered when routing some queries to the weaker, cheaper model. However, their routing models are trained to maximize performance on public benchmarks like MMLU, BBH, or MT-Bench. These benchmarks may not capture subjective, domain-specific quality signals that surface in practice.

Arch-Router takes a different approach. Instead of focusing benchmark scores, we lets developers define routing policies in plain language based on their preferences — like “contract analysis → GPT-4o” or “lightweight brainstorming → Gemini Flash.” Our 1.5B model learns to map prompts (along with conversational context) to these policies, enabling routing decisions that align with real-world expectations, not abstract leaderboards. Also our approach doesn't require router model retraining when new LLMs are swapped in or when preferences change.

Hope this helps.

cotran2•7mo ago
There is a case study comparing with RouteLLM in the appendix.
pseudosavant•7mo ago
Not that LLMs are terribly latency sensitive (you wait on a lot of tokens), but what kind of latency impact does this have on requests that go through the proxy?
cotran2•7mo ago
The model is compact 1.5B, most GPUs can serve it locally and has <100ms e2e latency. For L40s, its 50ms.
adilhafeez•7mo ago
Short answer is latency impact is very minimal.

We use envoy as request handler which forwards request to local service written in rust. Envoy is proven to be high performance, low latency and highly efficient on request handling. If I have to put a number it would be in single digit ms per request. I will have more detailed benchmark in the coming days.