frontpage.

Housing constraints: why Silicon Valley hasn't done more for most Americans

https://www.slowboring.com/p/why-silicon-valley-hasnt-done-more
1•firasd•46s ago•0 comments

CIAM in the Sky

https://ciamweekly.substack.com/p/ciam-in-the-sky
1•mooreds•1m ago•0 comments

Javier Milei Wants to Rewire the Argentine Mind

https://www.nytimes.com/2026/04/20/world/americas/argentina-president-milei-inflation-economic-re...
1•mitchbob•2m ago•0 comments

Show HN: Gyrus: Open-Source AI Agents for Snowflake, SQL and Postgres

https://github.com/orgs/Gyrus-Dev/repositories
1•MalviyaPriyank•3m ago•0 comments

App host Vercel says it was hacked and customer data stolen

https://techcrunch.com/2026/04/20/app-host-vercel-confirms-security-incident-says-customer-data-w...
1•speckx•5m ago•0 comments

Show HN: Tmux-bar – One-tap switching between windows in current tmux session

https://github.com/daxliar/tmux-bar
1•zonovar•6m ago•0 comments

Show HN: Themeable HN

https://github.com/insin/comments-owl-for-hacker-news/releases/tag/v3.6.1
1•insin•6m ago•0 comments

The Vibe Code: 103,000 AI-generated repos, only 1% production ready

https://useastro.com/vibe-code-report/
1•nishikawa7863•7m ago•0 comments

Tell HN: Codex/Claude Code one-off credit purchases are a money sink

1•mavsman•8m ago•0 comments

What I Learned About Billionaires at Jeff Bezos's Private Retreat

https://www.theatlantic.com/magazine/2026/05/billionaire-consequence-free-reality/686588/
2•robtherobber•9m ago•0 comments

Germany's Merz says industrial AI needs less stringent EU regulation

https://www.reuters.com/business/germanys-merz-says-industrial-ai-needs-less-stringent-eu-regulat...
1•ulrischa•10m ago•0 comments

Are Strings Still Our Best Hope for a Theory of Everything?

https://www.quantamagazine.org/are-strings-still-our-best-hope-for-a-theory-of-everything-20260323/
1•digital55•10m ago•0 comments

Claude helped build a wetlab+sequence my DNA at home, with 0 lab experience

https://vibe-genomics.replit.app/
1•banana-bae•11m ago•1 comment

Effectful Recursion Schemes

https://effekt-lang.org/blog/recursion-schemes/
1•marvinborner•12m ago•0 comments

Did Artemis II mission do lunar science or go to the Moon for humanity?

https://jatan.space/moon-monday-issue-271/
1•JPLeRouzic•13m ago•0 comments

Physical Media Is Pretty Cool

https://michaelenger.com/blog/physical-media-cool/
1•speckx•13m ago•0 comments

Show HN: Mailto.Bot – Email API for AI agents with native MCP support

https://mailto.bot
1•jerryluk•13m ago•0 comments

Engineers Kick-Started the Scientific Method

https://spectrum.ieee.org/francis-bacon-scientific-method
1•Brajeshwar•16m ago•0 comments

Can LLMs Flip Coins in Their Heads?

https://pub.sakana.ai/ssot/
1•hardmaru•16m ago•0 comments

Itanium: Intel's Great Successor [video]

https://www.youtube.com/watch?v=-K-IfiDmp_w
1•vt240•18m ago•0 comments

The Abstraction Fallacy: Why AI Can Simulate but Not Instantiate Consciousness

https://deepmind.google/research/publications/231971/
2•LopRabbit•18m ago•0 comments

Students are speeding through their online degrees in weeks, alarming educators

https://www.washingtonpost.com/education/2026/04/19/accelerated-college-degree-hacking/
1•delichon•18m ago•0 comments

Deezer says 44% of songs uploaded to its platform daily are AI-generated

https://techcrunch.com/2026/04/20/deezer-says-44-of-songs-uploaded-to-its-platform-daily-are-ai-g...
1•FiddlerClamp•19m ago•0 comments

Known modeling errors keep the federal expansion machine running

https://www.strongtowns.org/journal/2026-4-20-the-inflated-numbers-that-unlock-billions
1•zino3000•22m ago•0 comments

So What If They Have My Data?

https://cardcatalogforlife.substack.com/p/so-what-if-they-have-my-data
2•speckx•23m ago•0 comments

Kimi K2.6: Advancing Open-Source Coding

https://twitter.com/Kimi_Moonshot/status/2046249571882500354
10•nekofneko•24m ago•1 comment

Licensing Best Practices for the Sharing of Scientific Data

https://creativecommons.org/2026/04/20/licensing-best-practices-for-the-sharing-of-scientific-data/
2•Tomte•25m ago•0 comments

The printing press for biological data (Sterling Hooten)

https://www.owlposting.com/p/the-printing-press-for-biological
2•crescit_eundo•25m ago•0 comments

MoA-X: Mixture of Agents Orchestration Framework

https://github.com/drivelineresearch/moa-x
2•icelancer•27m ago•0 comments

Top Gun 3 Is Happening: The Need for Speed Lives On

https://avgeekery.com/top-gun-3-is-happening/
1•freediver•27m ago•0 comments

Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving

https://qwen.ai/blog?id=qwen3.6-max-preview
140•mfiguiere•1h ago

Comments

jjice•1h ago
With them comparing to Opus 4.5, I find it hard to take some of these benchmarks in good faith. Opus 4.7 is new, so I don't expect that, but Opus 4.6 has been out for quite some time.
Someone1234•1h ago
If money is no object, then nothing else is worth considering if it isn't Codex 5.4/Opus 4.7/SOTA. But for many, if not most, people, value vs. relative quality are huge levers.

Even many people on a Claude subscription aren't choosing, or able to choose, Opus 4.7 because of those cost/usage pressures. They often use Sonnet or an older Opus because of the value vs. quality curve.

wahnfrieden•1h ago
Codex subscription is very generous at pro tiers
dd8601fn•48m ago
Also us weirdos with local model uses. But your point stands.
seplite•42m ago
Unfortunately, like with the release of Qwen3.6-Plus, this model also isn’t released for local use. From the linked article: “Qwen3.6-Max-Preview is the hosted proprietary model available via Alibaba Cloud Model Studio”
zozbot234•41m ago
The Max series was never available for local use, though. So this is expected.
CamperBob2•47m ago
Cost may or may not be a factor in my choice of model, but knowing the capabilities and knowing they will remain consistent, reliable, and available over time is always a dominant consideration. Lately, Anthropic in particular has not been great at that.
hirako2000•1h ago
You compare with what's most comparable.

In any case, a benchmark provided by the vendor is always biased: they pick the benchmarks where their model fares well and omit the others.

Independent benchmarks are the go-to.

alex_young•1h ago
"Quite some time" is a little over 2 months. I understand this is actually true right now, but it's still a bit hard to accept.
oidar•1h ago
Opus 4.6 performance has been so wildly inconsistent over the past couple of months, why waste the tokens?
bluegatty•39m ago
I think it's only been like 10 weeks. I mean, that's forever in AI time, but not a long time in normie time.
vidarh•34m ago
When Sonnet 4.6 was released, I switched my default from Opus to Sonnet because it was about on par with Opus 4.5. While 4.6 and 4.7 are "better", the leap is too small for most tasks for me to need it, so reducing cost is now a valid reason to stay at that level.

If even cheaper models start reaching that level (GLM 5.1 is also close enough that I'm using it a lot), that's a big deal, and a totally valid reason to compare against Opus 4.5.

jasonjmcghee•13m ago
Wow I couldn't disagree more.

For me, Opus 4.5 and 4.6 feel so different compared to Sonnet.

Maybe I'm lazy or something, but in my experience Sonnet is much worse at correctly inferring intent if I've left any ambiguity.

That effect is super compounding.

trvz•1h ago
The fun thing is, you can be aware of the entire range of Qwen models available for local running, yet know nothing about their cloud models.

I knew of all the 3.5s and the one 3.6, but only now heard about the Plus.

Alifatisk•34m ago
Their Plus series has existed since Qwen Chat became available, as far as I remember. I can at least remember trying out their Plus model early last year.
Oras•1h ago
I find it odd that none of OpenAI's models were used in the comparison, but Z.ai's GLM 5.1 was. Is GLM 5.1 really that good? It is crushing Opus 4.5 in these benchmarks; if that were true, I would have expected to read many articles on HN about people leaving CC and Codex to use it.
esafak•1h ago
I use it and think its intelligence compares favorably with the OpenAI and Anthropic workhorses. Its biggest weakness is its speed.
kardianos•1h ago
Yes. GLM 5.1 is that good. I don't think it is as good as Claude was in January or February of this year, but it is similar to how Claude runs now, perhaps better, because its performance feels more consistent.
__blockcipher__•1h ago
Yeah GLM’s great for coding, code review, and tool use. Not amazing at other domains.
throwaw12•1h ago
Maybe they decided OpenAI has a different market, hence comparing only with companies focusing on dev tooling: Claude, GLM.
edwinjm•57m ago
Haven’t you heard about Codex?
throwaw12•49m ago
It's an SKU from OpenAI's perspective; the broader goal and vision is (was) different. Look at Claude and GLM: both were 95% committed to dev tooling, with the best coding models and coding harnesses; even their Cowork is built on top of Claude Code.
zozbot234•35m ago
I'm not sure how this makes sense when Claude models aren't even coding-specific: Haiku, Sonnet, and Opus are the exact same models you'd use for chat or (with the recent Mythos) bleeding-edge research.
throwaw12•21m ago
Anthropic's models and training data are optimized for coding use cases; this is the difference.

OpenAI, on the other hand, has separate models optimized for coding (GPT-x-codex); Anthropic doesn't have this distinction.

ac29•1h ago
GLM 5.1 is pretty good, probably the best non-US agentic coding model currently available. But both GLM 5.0 and 5.1 have had issues with availability and performance that make them frustrating to use. Recently GLM 5.1 was also outputting garbage thinking traces for me, but that appears to be fixed now.
cmrdporcupine•55m ago
Use them via DeepInfra instead of z.ai. No reliability issues.

https://deepinfra.com/zai-org/GLM-5.1

Looks like fp4 quantization now, though? Last week it was showing fp8. Hm.
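(For anyone wanting to try the DeepInfra route mentioned above: DeepInfra exposes an OpenAI-compatible chat completions API. A minimal sketch of building such a request with only the standard library; the endpoint path and auth header are assumptions to check against DeepInfra's docs, and the model id is taken from the link above.)

```python
import json
import urllib.request

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request for GLM 5.1
    against DeepInfra's OpenAI-compatible endpoint."""
    payload = {
        "model": "zai-org/GLM-5.1",  # model id from the link above
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.deepinfra.com/v1/openai/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("YOUR_API_KEY", "Summarize MIS in one sentence.")
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` requires a real API key; the sketch only assembles the payload.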

wolttam•46m ago
Deepinfra's implementation of it is not correct. Thinking is not preserved, and they're not responding to my submitted issue about it.

I also regularly see DeepInfra slow to an absolute crawl; I've actually gotten more consistent performance from Z.ai.

I really liked Deepinfra but something doesn't seem right over there at the moment.

cmrdporcupine•5m ago
Damn. Yeah, that sucks. I did play with it earlier again and it did seem to slow down.

It's frankly a bummer that there doesn't seem to be a better serving option for GLM 5.1 than Z.ai, which seems to have reliability and cost issues.

pros•1h ago
I've been using GLM 5.1 for the last two weeks as a cheaper alternative to Sonnet, and it's great, probably somewhere between Sonnet and Opus. It's pretty slow though.
c0n5pir4cy•1h ago
I've been using it through OpenCode Go and it does seem decent in my limited experience. I haven't yet done anything I could directly compare to Opus, though.

I did give it one more complex task, and I was quite impressed. I had a local setup with Tiltdev, K3S, and a pnpm monorepo that was failing to run the web application dev server; after inspecting the containers, GLM correctly figured out that it was a container image build cache issue and corrected the Tiltfile and build setup.

coder68•43m ago
Actually, it's appreciated that Qwen is comparing to a peer. I and several engineers I know are trying GLM. It's legit. Definitely not the same as Codex or Opus, but cheaper and "good enough". I basically ask GLM to solve a problem, walk away for 10-15 minutes, and the problem is solved.
Oras•36m ago
Cheaper is quite subjective. I just went to their pricing page [0], and the cost savings relative to performance don't sell it well (again, personal opinion).

CC has limited capacity for Opus, but it's fairly good for Sonnet. With Codex, I've never had issues with hitting my limits, and I'm only a Pro user.

[0] https://z.ai/subscribe

Alifatisk•41m ago
GLM-5 is good, like really good, especially if you take pricing into consideration. I paid $7 for 3 months, and I get more usage than with CC.

They have difficulty supplying their users with capacity, but in an email they pointed out that they are aware of it. During peak hours I experience degraded performance, but I am on their lowest subscription tier, so I understand if my demand is not prioritized during those hours.

ekuck•18m ago
Where are you getting 3 months for $7?
cleaning•40m ago
Most HN commenters seem to be a step behind the latest developments, and sometimes miss them entirely (Kimi K2.5 is one example). Not surprising, as most people don't want to put in the effort to sift through the bullshit on Twitter to figure out the latest opinions. Many people here will still prefer the output of Opus 4.5/4.6/4.7; nowadays this mostly comes down to the aesthetic choices Anthropic has made.
Oras•32m ago
Not just aesthetics, though. From time to time I implement the same feature with CC and Codex just to compare results, and I have yet to find Codex making better decisions or matching the completeness of the feature.

For more complicated stuff, like queries or data comparison, Codex always seems behind for me.

vidarh•25m ago
GLM 5.1 is the first model I've found good enough to spring for a subscription for other than Claude and Codex.

It's not crushing Opus 4.5 in real-life use for me, but it's close enough to be nearly interchangeable with Sonnet for a lot of tasks, though some of the "savings" are eaten up by it seemingly using more tokens for tasks of similar complexity (I don't have enough data yet, but I've pushed ~500m tokens through it so far).

0xbadcafebee•49m ago
Everybody's out here chasing SOTA, meanwhile I'm getting all my coding done with MiniMax M2.5 in multiple parallel sessions for $10/month and never running into limits.
Aurornis•31m ago
For serious work, the difference between spending $10/month and $100/month is not even worth considering for most professional developers. There are exceptions, like students and people in very low-income countries, but I'm always confused by developers in careers where six-figure salaries are normal going cheap on tools.

I find even the SOTA models to be far away from trustworthy for anything beyond throwaway tasks. Supervising a less-than-SOTA model to save $10 to $100 per month is not attractive to me in the least.

I have been experimenting with self hosted models for smaller throwaway tasks a lot. It’s fun, but I’m not going to waste my time with it for the real work.

zozbot234•24m ago
You need to supervise the model anyway, because you want that code to be long-term maintainable and defect-free, and AI is nowhere near strong enough to guarantee that anytime soon. Using the latest Opus for literally everything is just a huge waste of effort.
dandaka•19m ago
Waste of effort... of Opus? If "Opus effort" is cheaper than the dev hours spent managing a dumber but more cost-effective model yourself, what is the point?
cyanydeez•5m ago
Rich people don't concern themselves with the cost of tokens.
ninjahawk1•41m ago
The way to develop in this space seems to be to give away free stuff, get your name out there, then make everything proprietary. I hope they still continue releasing open weights. The day no one releases open weights is a sad day for humanity. Normal people won’t own their own compute if that ever happens.
visarga•31m ago
I think it is in the interest of chip makers to make sure we all get local models.
zozbot234•29m ago
Definitely. Many big hardware firms are directly supporting HuggingFace for this very reason.
ninjahawk1•25m ago
True, chip companies have the opposite mindset; Nvidia is releasing their own open-weight models, I believe.
qalmakka•5m ago
I think they're in a win-win situation. Big AI companies would love to see local computing die in favour of the cloud, because they are well aware that the moment an open model appears that can run on non-ludicrous consumer hardware, they're screwed. In this situation Nvidia, AMD, and the like would be the only ones profiting from it, though I'm not convinced they'd prefer going back to fighting for B2C while B2B is so much simpler for them.
testbjjl•26m ago
Any reason for them to do this other than altruism? I don’t think this can be regulated.
Rohansi•13m ago
Bake ads into them.
baq•23m ago
Always has been; it's literally SaaS. The slight difference is that the lowest-tier subscriptions at the frontier labs are basically free trials nowadays, too.
CamperBob2•14m ago
I'm a little more optimistic than that. I suspect that the open-weight models we already have are going to be enough to support incremental development of new ones, using reasonably-accessible levels of compute.

The idea that every new foundation model needs to be pretrained from scratch, using warehouses of GPUs to crunch the same 50 terabytes of data from the same original dumps of Common Crawl and various Russian pirate sites, is hard to justify on an intuitive basis. I think the hard work has already been done. We just don't know how to leverage it properly yet.

WarmWash•1m ago
The Chinese state wants the world using their models.

People think that Chinese AI labs are just super cool bros that love sharing for free.

They don't understand that it's just a state-sponsored venture meant to further entrench China in global supply chains and logistics. China's VCs are Chinese banks and a sprinkle of "private" money, private in quotes because technically it still belongs to the state anyway.

China doesn't have companies and government like the US. It just has government, with a thin veil of "company" that readily fools Westerners.

atilimcetin•37m ago
Nowadays I'm working on a realtime path tracer, where you need a proper understanding of microfacet reflection models, PDFs, (multiple) importance sampling, ReSTIR, etc. That is to say, mine is a pretty specific use case.

I've been using Claude, Gemini, GLM, and Qwen to double-check my math and my code, and to get practical information to make my path tracer more efficient. Claude and Gemini failed me a couple of times with wrong, misleading, and unnecessary information, but Qwen always gave me proper, practical, correct information. I've almost stopped using Claude and Gemini so as not to waste my time anymore.

Claude Code may shine at developing web applications, backends, and simple games, but it's definitely not for me. And that's the story of my specific use case.
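(For readers unfamiliar with the multiple importance sampling mentioned above: the standard balance heuristic weights samples from several sampling strategies by their relative densities so the combined estimator stays unbiased. A minimal sketch of that textbook formula, not code from the commenter's path tracer; the example densities are made up.)

```python
def balance_heuristic(pdfs, counts, i):
    """Balance-heuristic weight for a sample drawn from strategy i.

    pdfs[j]   -- density of strategy j evaluated at the sample point
    counts[j] -- number of samples taken with strategy j
    """
    denom = sum(n * p for n, p in zip(counts, pdfs))
    return counts[i] * pdfs[i] / denom

# Example: combining BSDF sampling and light sampling at one shading point.
pdfs = [0.8, 0.2]   # hypothetical densities of the two strategies here
counts = [1, 1]     # one sample from each strategy
weights = [balance_heuristic(pdfs, counts, i) for i in range(2)]
print(weights)      # the weights sum to 1 across strategies
```

Because the weights always sum to 1 over the strategies, each contribution is counted exactly once in expectation, which is why MIS avoids the variance spikes of either strategy alone.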

zozbot234•32m ago
What size of Qwen is that, though? The largest sizes are admittedly difficult to run locally (though this is an issue of current capability wrt. inference engines, not just raw hardware).
atilimcetin•31m ago
I'm directly using https://chat.qwen.ai (Qwen3.6-Plus) and planning to switch to Qwen Code with subscription.
jansan•19m ago
How "social" does Qwen feel? The way I use LLMs for coding has actually made this the most important aspect for me by now. Claude 4.6 felt like a nice, knowledgeable coworker who shared his thinking while solving problems. Claude 4.7 is the difficult, antisocial guy who jumps ahead instead of actually answering your questions and doesn't like to talk to people in general. How are Qwen's social skills?
zozbot234•16m ago
Qwen feels like a wise Chinese philosopher: it talks in very short, elegant sentences but does very solid work.
jasonjmcghee•16m ago
You may be interested in "radiance cascades"
wg0•10m ago
I have seen someone say similar things while writing some OpenGL code (some raytracing, etc.): these models have very little understanding and aren't good at anything beyond basic CRUD web apps.

In my own experience, even with a web app of medium scale (think an Odoo kind of ERP), they are next to useless at understanding and modeling the domain correctly, even with very detailed written specs fed in (a whole directory with index.md, subsections, and more detailed sections/chapters in separate markdown files with pointers in index.md). And I am not talking about open-weight models here; I am talking SOTA Claude Opus 4.6, Gemini 3.1 Pro, etc.

But that narrative isn't popular. I see parallels here with the crypto and NFT era. That was surely the future, and at least my firm pays me in crypto, whereas NFTs are used for rewarding bonuses.

wg0•6m ago
Someone already said it better here [0].

[0]. https://news.ycombinator.com/item?id=47817982

DeathArrow•22m ago
I have been trying for a week to subscribe to the Alibaba Coding Plan (to use Qwen 3.6 Plus), but it's always out of stock. They say they restock at 00:00 each day, but they don't allow new subscriptions; they just say they will restock the next day. :)