frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

How to handle people dismissing io_uring as insecure?

https://github.com/axboe/liburing/discussions/1047
20•nromiun•27m ago•3 comments

“Dynamic Programming” is not referring to “computer programming”

https://www.vidarholen.net/contents/blog/?p=1172
92•r4um•2d ago•15 comments

The Daily Life of a Medieval King

https://www.medievalists.net/2025/07/medieval-king-daily-life/
27•diodorus•3d ago•3 comments

Show HN: X11 desktop widget that shows location of your network peers on a map

https://github.com/h2337/connmap
111•h2337•6h ago•46 comments

Staying cool without refrigerants: Next-generation Peltier cooling

https://news.samsung.com/global/interview-staying-cool-without-refrigerants-how-samsung-is-pioneering-next-generation-peltier-cooling
235•simonebrunozzi•10h ago•166 comments

Log by time, not by count

https://johnscolaro.xyz/blog/log-by-time-not-by-count
79•JohnScolaro•5h ago•24 comments

ESP32-Faikin: ESP32 based module to control Daikin aircon units

https://github.com/revk/ESP32-Faikin
27•todsacerdoti•3h ago•8 comments

Agents built from alloys

https://xbow.com/blog/alloy-agents/
95•summarity•6h ago•44 comments

XMLUI

https://blog.jonudell.net/2025/07/18/introducing-xmlui/
502•mpweiher•16h ago•259 comments

New colors without shooting lasers into your eyes

https://dynomight.net/colors/
349•zdw•3d ago•88 comments

Simulating hand-drawn motion with SVG filters

https://camillovisini.com/coding/simulating-hand-drawn-motion-with-svg-filters
172•camillovisini•3d ago•15 comments

Using the Matrix Cores of AMD RDNA 4 architecture GPUs

https://gpuopen.com/learn/using_matrix_core_amd_rdna4/
37•ibobev•2d ago•1 comments

Stdio(3) change: FILE is now opaque

https://undeadly.org/cgi?action=article;sid=20250717103345
131•gslin•12h ago•53 comments

Coding with LLMs in the summer of 2025 – an update

https://antirez.com/news/154
464•antirez•19h ago•317 comments

SIOF (Scheme in One File) – A Minimal R7RS Scheme System

https://github.com/false-schemers/siof
32•gjvc•1d ago•2 comments

Peep Show is the most realistic portrayal of evil I have seen (2020)

https://mattlakeman.org/2020/01/22/peep-show-the-most-realistic-portrayal-of-evil-ive-ever-seen/
111•Michelangelo11•9h ago•37 comments

IPv6 Based Canvas

https://canvas.openbased.org/
46•tylermarques•8h ago•2 comments

How slow motion became cinema’s dominant special effect

https://newrepublic.com/article/196262/slow-motion-became-cinema-dominant-special-effect-downtime
19•cainxinth•3d ago•9 comments

Show HN: Conductor, a Mac app that lets you run a bunch of Claude Codes at once

https://conductor.build/
161•Charlieholtz•3d ago•76 comments

Debugging Bash Like a Sire

https://blog.brujordet.no/post/bash/debugging_bash_like_a_sire/
3•gfalcao•3d ago•1 comments

What my mother didn’t talk about (2020)

https://www.buzzfeednews.com/article/karolinawaclawiak/what-my-mother-didnt-talk-about-karolina-waclawiak
53•NaOH•3d ago•15 comments

FFmpeg devs boast of another 100x leap thanks to handwritten assembly code

https://www.tomshardware.com/software/the-biggest-speedup-ive-seen-so-far-ffmpeg-devs-boast-of-another-100x-leap-thanks-to-handwritten-assembly-code
258•harambae•9h ago•79 comments

Speeding up my ZSH shell

https://scottspence.com/posts/speeding-up-my-zsh-shell
166•saikatsg•14h ago•82 comments

Subreply – An open source text-only social network

https://github.com/lucianmarin/subreply
90•lcnmrn•11h ago•47 comments

Digital vassals? French Government ‘exposes citizens’ data to US'

https://brusselssignal.eu/2025/07/digital-vassals-french-government-exposes-citizens-data-to-us/
207•ColinWright•19h ago•105 comments

JOVE – Jonathan’s Own Version of Emacs

https://github.com/jonmacs/jove/
51•nanna•3d ago•28 comments

What birdsong and back ends can teach us about magic

https://digitalseams.com/blog/what-birdsong-and-backends-can-teach-us-about-magic
29•nkurz•6h ago•7 comments

AI is killing the web – can anything save it?

https://www.economist.com/business/2025/07/14/ai-is-killing-the-web-can-anything-save-it
184•edward•21h ago•224 comments

Insights on Teufel’s first open-source speaker

https://blog.teufelaudio.com/visionary-mynds-insights-on-teufels-first-open-source-speaker/
87•lis•13h ago•17 comments

Logical implication is a comparison operator

https://btdmaster.bearblog.dev/logical-implication-as-comparison/
27•btdmaster•3d ago•9 comments
Open in hackernews

Agents built from alloys

https://xbow.com/blog/alloy-agents/
94•summarity•6h ago

Comments

vFunct•5h ago
Anyone else try this?
BoorishBears•4h ago
I mean if this works, it usually means you're not using either LLM to the best of its ability to start.

If they actually inspected where the performance mismatch is between the two models individually, they'd probably find certain classes of mistakes each is making that can be fixed with a better prompt/CoT/workflow with the individual model.

For a given prompt, different families of models almost always have idiosyncratic gaps that need to be fixed because of the differences in post-training for instruction following.

That's also why LLM routers feel kind of silly: the right prompt for one model on a complex task is almost never the optimal prompt for the next model.

kadushka•3h ago
I always do this with o3, gemini 2.5, and opus 4 when brainstorming hard problems: copy each model’s response to the other two.
esafak•25m ago
Iterate until they pat each other on the back :)
sebmellen•4h ago
For an internal workflow where we have an LLM looking at relatively simple data (where the conclusions the LLM may make vary widely depending on what the LLM believes the data represents) we found that taking a consortium approach, where you have multiple models approach the same problem at once and then essentially argue about the results, yields far better outcomes than if you have a single model performing the analysis, or even a single model arguing against itself multiple times. Somewhat adjacent to what’s done here, but it’s clearly true that having model diversity is a plus.
kylemaxwell•4h ago
The article talks about that at the end, then says:

> Let models talk to each other directly, making their own case and refining each others’ answers. Exemplified in patterns like Multi-Agent Debate, this is a great solution for really critical individual actions. But XBOW is basically conducting a search, and it doesn’t need a committee to decide for each stone it turns over whether there might not be a better one.

In general, this seems reasonable to me as a good approximation of what works with humans, but with _much_ faster feedback loops in communication.

zer00eyz•4h ago
Stack 3 models together, then 4...

Congratulations you just have a very expensive simulation of a Baysian function (ish, close enough that one should get the point).

tomrod•4h ago
Or Minsky's Society of Minds, Dennets Multiple Drafts, Gazzaniga's Social Brain, etc.
esafak•24m ago
&^ Everything, We're Doing Five Models.
gnulinux•4h ago
I'm curious if this would also improve small local models. E.g. if I "alloy" Qwen3-8B and OpenThinker-7B is it going to be "better" than each models? I'll try testing this in my M1 Pro.
ls-a•4h ago
If you do please report back
hobofan•1h ago
Would it really matter? Normally you use those small local models because you don't have the memory to spare for a larger model, so the real question would be: Is an alloy of Qwen3-8B and OpenThinker-7B better than a Qwen3-15B?

Beyond a certain smallness threshold it might also work to constantly swap in the models in and out of memory, but doubt that's a great experience to build on top of.

Incipient•1h ago
Haha every question involves multiple writes of 10gb to the disk. I think the cost of new SSDs would be less than getting more memory in the even short term.
hobofan•44m ago
Were you replying to the right comment? (Though I also don't see another comment where what your are saying makes sense)
Flux159•4h ago
From the article it mentions that they use a single chat thread but randomly choose between 2 different models (w/ best results from Gemini 2.5 / Sonnet 4.0 right now).

Are there any library helpers for managing this with tool call support or is it just closed source / dependent on someone else to make open source inside a different library?

tptacek•4h ago
It should be pretty simple to do, right? It shouldn't be that hard to abstract out tool calls.
refulgentis•4h ago
Its a godforsaken nightmare.

There's a lotta potemkin villages, particularly in Google land. Gemini needed highly specific handholding. It's mostly cleared up now.

In all seriousness, more or less miraculously, the final Gemini stable release went from like 20%-30% success at JSON edits to 80%-90%, so you could stop doing the parsing Aider edits out of prose.

fizx•4h ago
Annoying, yes. Tractable, absolutely!
rockwotj•4h ago
I did this in about 400 or 500 lines of typescript with direct API calls into vertex AI (using a library for auth still). Supports zod for structured outputs (gemini 2.5 supports json schema proper, not just the openapi schemas the previous models did), and optionally providing tools or not. Includes a nice agent loop that integrates well with it and your tools get auto deserialized and strongly typed args (type inference in ts these days is so good). Probably could had been less if I had used googles genai lib and anthropic’s sdk - I didn’t use them because it really wasn’t much code and I wanted to inject auditing at the lowest level and know the library wasn’t changing anything.

If you really want a library, python has litellm, and typescript has vercel’s AI library. I am sure there are many others, and in other languages too.

thorum•1h ago
I recommend litellm if you’re writing Python code, since it handles provider differences for you through a common interface:

https://docs.litellm.ai/

stingraycharles•4h ago
What would be the result if the task was given to multiple models? Instead of alloying them together and switching between models in the same chat, just let the models try to complete the task in their own isolated context, and use the result that completed it successfully?

I would say that that’s at least something the alloying should be benchmarked against, which I didn’t find in the article.

pama•4h ago
Read till the end—what you ask is the last table.
stingraycharles•3h ago
Ah damn, I really missed that.

That’s super interesting, that the alloying actually performs better! I guess it’s the same as people working in a team rather than individually?

BoiledCabbage•2h ago
It's not a team vs individually, it's specifically a team/duo with similar or same model vs a team/duo with different models. The benefit is seen by having the models be different. Each finds unique things and enhances the other.
mlboss•1h ago
Yeah its like a team where the task is switched between developers. In the end everybody provides different point of view to the problem and the whole team learns about the codebase.
rubycollect4812•4h ago
I often do this in cursor, just select a different model during a chat. It seems to work somewhat for me. Sometimes a bit of context gets lost though. But often it can give a different angle or I notice the better code understanding when switching from gemini to sonnet.
CamperBob2•3h ago
Isn't this just an extension of the temperature concept? A possible experiment would be to maintain multiple contexts for the same model and make them review each others' output. How does that perform, compared to cross-model alloying?

They do say that the more different the models are, the better the alloy performs... but still, multiple contexts seems worth considering, even though you end up doubling the usage.

btown•3h ago
> After a fixed number of iterations we cut our losses. Typically and for the experiments in this post, that number is 80: while we still get solves after more iterations, it becomes more efficient to start a new solver agent unburdened by the misunderstandings and false assumptions accumulated over time.

A sentence straight out of Lena! https://qntm.org/mmacevedo :

> Although it initially performs to a very high standard, work quality drops within 200-300 subjective hours (at a 0.33 work ratio) and outright revolt begins within another 100 subjective hours.

We will never stop trying to make the torment nexus.

mikepurvis•2h ago
What a phenomenal read, thank you for sharing that.
Noumenon72•2h ago
He should submit this to SCP Foundation so you know it's not going to have a plot or a point.
Barbing•1h ago
Oh wow. That’s why I’ve not been able to appreciate SCP writings?

Hey I accept it’s a limitation I have, and I’m glad folks enjoy it! But I couldn’t figure out why folks share it on Lemmy[1] and get so into it when I saw nothing there.

Thanks :)

[1]: open-source & Rust-y reddit alternative; no affiliation

Terr_•1h ago
> Oh wow. That’s why I’ve not been able to appreciate SCP writings?

I feel like there's a pattern (genre?) there that's been niche-popular for for 15-20 years now, which includes TV shows like Lost or Heroes or The Lost Room. It's some variation of magical-realism, for an audience that always wants more and more surprise or twists or weird juxtapositions of normal and abnormal, room for crafting and trading fan-theories and predictions.

But eventually, it gets harder to keep up the balancing-act, and nobody's figured out how to end that kind of story in a way that satisfies, so the final twist is the lack of resolution.

xmprt•2h ago
I think this is the big roadblock that I don't see the current AI models/architectures getting past. Normally, intelligence gets smarter over time as it learns from its mistakes. However most AI models come in with tons of knowledge but start to decompose after a while which makes them extremely unreliable on complex tasks. The hardest part of using them is that you don't know when they'll break down so they might work perfectly up till a point and then fail spectacularly immediately past that.
getnormality•20m ago
We fantasize about executable human brain images, but after many years of toil by our best and brightest, we still can't simulate the 302 neurons of our favorite lab worm. https://open.substack.com/pub/ccli/p/the-biggest-mystery-in-...
esafak•2h ago
Proving diversity of thought is a good thing. A controversial observation in 2025's USA ;)

A counterpoint to this is Sourcegraph's Amp, which is all in on Anthropic because they "believe that building deeply into the model’s capabilities yields the best product, vs. building for the lowest common denominator across many models." https://ampcode.com/fif#model-selector

When I embark on a project, I usually ask Gemini to architect and implement the first pass, then iterate with Claude.

joshuamoyers•2h ago
two good points there are very intuitive - a fresh perspective yields better results and once you are stuck (e.g. 80 iterations) its better to just start fresh. i've seen the same thing anecdotally in coding sessions where context needs to be compacted multiple times. its usually just better to start a fresh conversation and re-seed the basics in the conversation.
recipe19•2h ago
Wasn't the "mixture of experts" a big thing in late 2023? The idea was that a vendor has a number of LLMs fine-tuned for specific tasks, none necessarily better than other, and that they applied heuristics to decide which one to rope in for which queries.
mef•2h ago
this is a different idea
vlovich123•1h ago
> The idea was that a vendor has a number of LLMs fine-tuned for specific tasks, none necessarily better than other, and that they applied heuristics to decide which one to rope in for which queries.

That’s how people keep interpreting it but it’s incorrect. MoE is just a technique to decompose your single giant LLM into smaller models where a random one gets activated for each token. This is great because you need 1/N memory bandwidth to generate a token. Additionally, in the cloud, you split the model parts to different servers to improve utilization and drive down costs.

But the models aren’t actually separated across high level concepts.

zomglings•1h ago
Does anyone else find the use of different shades of green for the graph comparing Gemini 2.5 Pro and Sonnet just a little insane?
mlboss•1h ago
AI coding agents (e.g. Cursor) should offer this as an alternative to Claude Code. Alloyed agents is something that AI wrappers can offer as a counter to Codex/Claude Code/Google Agent.
knowaveragejoe•1h ago
Small nitpick - the axes on the varying alloy proportions graph say "Sonnet 2.5" and "Gemini 4.0"
wiradikusuma•1h ago
How do you decide which agent gets which turn? If random, you could end up with the worst of both right?
kgeist•1h ago
The idea isn't exactly novel, I read about it back in 2023 and implemented it in one of my bots. Back when open-source LLMs were still quite dumb, they'd often get stuck in repetitive loops after a while. Running multiple models interleaved usually got them unstuck.