The real alpha here is Parallel Consensus. Running 5 Llama-3 instances via vLLM to critique each other at <200ms TTFT (Time To First Token) beats a single, slow GPT-4 wrapper every time.
Error correction belongs in the orchestration layer, not the model weights. Is the 'One Giant Model' era finally over for agents?
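A minimal sketch of what that looks like in the orchestration layer, assuming a local vLLM install; the model name, prompt, and naive majority vote are all illustrative. This is the simple first-round variant (independent samples plus a vote); a true critique round would feed each draft back in a second generate call.

```python
# A minimal sketch of orchestration-layer consensus, assuming a local
# vLLM install. Model name, prompt, and the naive vote are illustrative.
from collections import Counter

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

prompt = (
    "You are a careful reviewer. Answer with one word, yes or no.\n\n"
    "Is 7919 prime?"
)

# One engine, 5 parallel samples: vLLM batches them together, so TTFT
# stays close to the single-request case instead of scaling 5x.
params = SamplingParams(n=5, temperature=0.8, max_tokens=8)
result = llm.generate([prompt], params)[0]

# Error correction lives here, not in the weights: majority vote wins.
answers = [out.text.strip().lower() for out in result.outputs]
winner, votes = Counter(answers).most_common(1)[0]
print(f"consensus ({votes}/5): {winner}")
```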
The catch is VRAM. You can't run parallel swarms efficiently without PagedAttention. We rely on vLLM's prefix caching to share the KV cache for the common system prompt; otherwise, spinning up 5 agents for a consensus vote would instantly OOM the GPU.
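For reference, that sharing is just an engine flag; a sketch, where gpu_memory_utilization is a per-card tuning assumption, not a recommendation:

```python
# Prefix caching lets the 5 requests share KV blocks for the identical
# system prompt instead of materializing 5 copies in VRAM.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    enable_prefix_caching=True,   # dedupe the shared system-prompt prefix
    gpu_memory_utilization=0.90,  # leave headroom so the swarm won't OOM
)
```

The same switch exists on the OpenAI-compatible server as `--enable-prefix-caching`.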