Running local models is good now

https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/
151•jfb•1h ago

Comments

_doctor_love•40m ago
"Just get a 64GB Mac with 1TB of storage!"

LOL - some of us have a budget

tjwebbnorfolk•34m ago
AI and budgets don't mix well at the moment
techscruggs•33m ago
He is using a 2022 M2, which you can get for about $2k used. That is beyond reasonable.
psychoslave•20m ago
Global Affordability Estimate:

Top 10% of global earners (~800M people) can afford a $2,000 device without major financial strain.

Top 25% (~2B people) could afford it with some budget adjustments.

Bottom 50% (~4B people) would find it prohibitively expensive.

So for a SV top income, maybe that might look more like the weekly pet brushing budget, but for most people out there this is not that much of a no-brainer.

richwater•13m ago
Yes, because the bottom 50%, mostly impoverished or near impoverished folks were spending money on Claude Code subscriptions instead /s
disgruntledphd2•7m ago
The maths changes if you're working for yourself. Because I live in Europe, I've ended up working as a contractor due to the lack of a legal entity in my country. While that mostly sucked for a bunch of reasons, I was able to get a 64GB Mac M2 a few years back with approximately a 52% discount, which was kinda nice.
Shekelphile•14m ago
She
themythfable•33m ago
Yeah, I never had a computer that cost north of $800 until recently. While that is far from the typical HN user's budget, my bet is that it is much closer to average.

Besides those with effectively unlimited budgets for their personal compute, local models are still a long ways off.

Though, that shouldn't be conflated with the value of open-source models, which can be used by cloud providers to significantly reduce cost of intelligence.

embedding-shape•30m ago
> Yeah, I never had a computer that cost north of $800 until recently. While that is far from the typical HN user's budget, my bet is that it is much closer to average.

There are segments, everything from "Average person in world" to "Average creative professional using computers for work", with a wide range of costs for the hardware. HN probably skews towards the latter rather than the former, probably sitting with enterprise hardware next to them basically for fun, hard to make wider conclusions from what people here have or not.

sublinear•10m ago
If we define "typical" as the median HN budget, it's probably about the same as yours. Maybe the answer would have been different 10 or 20 years ago, but the era of truly needing a big budget PC has been over for a while.

It's just for gaming and AI now. Maybe not even gaming as much anymore.

Consider the perspective of someone who has a practically unlimited budget for PCs, doesn't game much anymore, and doesn't need AI to do their job. It's just part of getting older, and there are plenty of people in their late 30s and older on here.

p-e-w•31m ago
No need. You can run the Gemma 4 and Qwen3.5 MoE models with as little as 12 GB of VRAM at 30-40 tps (Q4/Q5), and they both blow GPT-4o and DeepSeek R1 out of the water.
swatcoder•30m ago
Sure, but it's also not really out of scale with the cost of a shop tool in other trades.

If you're a professional that's confident in a positive return on the investment (optimal or not), or just a hobbyist with the luxury budget for a "shop" that cost is well within norms.

That's not everybody, of course, but it's not some inconceivable fantasy. A lot of people in the tech community here on HN, specifically, end up with pretty high discretionary budgets that they pour into stuff like this.

amalcon•28m ago
A Strix Halo with similar RAM is considerably cheaper. Still not cheap, mind, but performance is OK (not great) and it will run more or less the same models.
AbsurdCensor•17m ago
At least for me, it's been pretty great, but I bought my system when it was $1800, now looks like the same system is $2700 and out of stock. I still haven't quite been able to run 120B parameter models under Windows, but for Qwen Coder 30B, it works pretty darn well for my at home needs.
amalcon•13m ago
Yeah, they have gone up a lot since I bought mine too. I did get Qwen3.5-122b running on all-GPU (on a 128GB machine) under a minimal Arch Linux setup (I do my GUI work on a much cheaper box). It worked, but Qwen3.6-35b is performing almost as well and a lot faster.

Still cheaper than a new Mac. Maybe not cheaper than a used one.

anarticle•15m ago
Pros buy their own tools. This is why working for yourself is better than working for a corpo, you get to choose your weapon.
anax32•36m ago
I've just made a milestone on my project, moving away from AWS (budget) to self-hosted and the local models are so much faster than in the past. Beyond LLMs, having embeddings, image, video, audio gen available is crazy.

Running locally is the bar; it's hard to make these things a service which scales.

richbradshaw•36m ago
I’m keen to understand speed here. If I bought a Mac Studio with 96GB, what can I realistically run, how does it compare to fable/opus etc., and how fast is it?

Currently maxing out two Claude code accounts every x hours when working on large code migrations or setting up new iOS apps etc - most of time it’s fine but occasionally it’s mega frustrating!

simonw•16m ago
I strongly recommend trying LM Studio - it's the lowest friction way to try out models, you can browse https://lmstudio.ai/models and click "Get" and then "Run in LM Studio" to download and run a model.

With 96GB I'd start with the Gemma 4 and Qwen 3.6 models. Any of those should work fine.

AbsurdCensor•8m ago
I think currently you can only get the M3 Ultra Studio with 96GB, and for coding tasks, say you run Qwen Coder on it (which doesn't need that much RAM), it's not the fastest, something like 30-40 tok/sec. Probably better with a MacBook Pro with the M5 chip. There is a website for comparing different configurations and models: https://llmcheck.net/benchmarks
rmunn•34m ago
This is the kind of thing that Anthropic et al should be worried about. As it becomes easier and easier to run local models, the ceiling of what they'll be able to charge will get lower and lower. Not that nobody will be willing to pay $$$$$ per month, but a lot of people are going to multiply the per-month charge by 12 or 24 and say "Could I set up a local model for less than that, and have it pay for itself within a year or two?" And if a significant portion of customers decide to buy instead of rent, the companies whose business model is entirely centered around renting will suddenly find themselves hurting for customers.
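
The "multiply by 12 or 24" comparison above can be sketched as a payback calculation. This is a minimal illustration, not a cost model: the function name and the optional power-cost parameter are assumptions, and the figures ($2,000 hardware, $100 and $200 per month) are borrowed from other comments in this thread.

```python
def payback_months(hardware_cost: float, monthly_subscription: float,
                   monthly_power_cost: float = 0.0) -> float:
    """Months until a one-off hardware purchase becomes cheaper than renting."""
    monthly_saving = monthly_subscription - monthly_power_cost
    if monthly_saving <= 0:
        # Running locally costs as much as the subscription, so it never pays off.
        return float("inf")
    return hardware_cost / monthly_saving


# Illustrative figures only: a ~$2,000 machine versus two subscription tiers.
for subscription in (100, 200):
    months = payback_months(2000, subscription)
    print(f"${subscription}/mo subscription: pays back in {months:.0f} months")
```

The calculation ignores resale value and any quality gap between the local and hosted model, both of which are debated further down the thread.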
themaninthedark•32m ago
Maybe that is why they are buying up as much hardware as they can? If their service is the only game in town.
otterdude•30m ago
Data center providers are buying hardware, not Anthropic. Certainly related, but a lot of the hardware purchased is just sitting in a warehouse waiting for a data center to get built.
indoordin0saur•22m ago
I'm curious when coding-heavy companies will start running their own on-prem AI clusters. Has anyone had the idea to sell something like a 4-GPU machine that an engineering team could throw in a closet somewhere and run whatever they want on it? I imagine this won't appeal to everybody, but with the trust issues the hyperscalers have developed by hoovering up people's data and using it to train their models, I imagine some will find value in a machine and model they have transparent control over, including the option to walk over and unplug the thing.
sathackr
embedding-shape•33m ago
Show us the resulting code of using them! :) I want to use local models, I have the hardware for it, but while trying them out as replacements for GPT 5.5 xhigh or Opus or other SOTA models, they aren't quite ready to replace them yet, sadly. The quality and the bumps they encounter just slow down the workflow so much, even screwing up tool call syntax sometimes.

But, for smaller more well-defined workflows, or as straight "edit this part to be like this exact" edits, they seem more than enough. Still waiting for them to become mature enough to be able to replace what we have as SOTA today, I'd say it's ready to be switched over then.

Speaking of local models, DiffusionGemma (and diffusion models in general) should not be slept on for local usage! Usually the problem locally is that the LLMs aren't efficiently making use of your hardware, unless you start batching requests and run many at the same time, but that requires different approaches in general. Instead, diffusion models work much faster for individual prompts, and not by a small margin either.

Today I finally finished porting diffusiongemma-26B-A4B-it support from Transformers into Candle, and together with some optimizations I now have it basically flying with ~450 tok/s (~19 it/s) in Candle during inference, instead of ~180 tok/s (~11 it/s) from HF's Transformers library. Even using vLLM with similar sized LLMs, I don't think I've ever gotten past the ~250 tok/s threshold for single prompts, exciting stuff for local models :)

zozbot234•11m ago
> Instead, diffusion models work much faster for individual prompts, and not by a small margin either.

Diffusion models can't really be trained beyond low-to-mid size and have lower quality over an equally sized, plain one-token-at-a-time model.

cube00•32m ago
The challenge I have is getting a large enough context window so tool calls work reliably, the local models easily slip into hallucinated JSON tool responses and won't trigger the tools as a result.
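
One harness-side mitigation for malformed tool calls is to validate the model's JSON before dispatching it, and to re-prompt when validation fails. This is a minimal sketch and not something the commenter describes: the `TOOLS` registry, the tool names, and the `name`/`arguments` call shape are all hypothetical.

```python
import json

# Hypothetical tool registry: tool name -> set of required argument names.
TOOLS = {"read_file": {"path"}, "run_tests": {"target"}}


def parse_tool_call(raw: str):
    """Return (name, args) for a well-formed call to a known tool, else None."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(call, dict):
        return None
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        return None  # unknown or hallucinated tool name
    if not isinstance(args, dict) or not TOOLS[name].issubset(args):
        return None  # required arguments are missing
    return name, args


print(parse_tool_call('{"name": "read_file", "arguments": {"path": "main.py"}}'))
print(parse_tool_call('{"name": "delete_repo", "arguments": {}}'))
```

A harness can treat a `None` result as a signal to send the error back to the model and ask again, rather than silently dropping the tool call.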
hypfer•31m ago
After having been a happy user of Qwen3.6-27B for a few weeks, due to being away from the hardware, I'm currently forced to use Claude Sonnet 4.6

It is such a downgrade. I don't understand how that's even possible. The thing has so many strongly-held opinions I did not ever ask it for, talking just way too much and generally feeling somehow dumber.

Of course, being significantly larger, it will encode more knowledge, but that doesn't help me when I hate talking to it. And all that on top of the fact that talking with it costs real money.

I wonder what it might be that makes me hate it so much. Maybe because it doesn't see itself as a tool but almost an equal? As if its opinions would have weight.

Qwen too can act like an overeager intern, but if you tell it that it is an idiot, it will drop that ego. Not so much with Claude. In my experience, anyway.

Anyway, point is: full ack on that headline.

kitd•25m ago
Funny that coding agents have personalities, including "that colleague" you want to avoid even if you know they're probably quite good at what they do!
MostlyStable•24m ago
Curious if you have tried custom instructions. I was never quite as unhappy with Claude's voice as you appear to be, but there were several things I didn't like. A custom prompt fixed almost all of them.
clickety_clack•19m ago
I think it would be very hard to convince someone to pay $100/mo to go back to Claude if they have a local model up and running, particularly now that model improvement has basically been stalled for the last 6 months. It’s so easy to set it up for yourself now too with things like LM Studio. That said, there will always be unsophisticated users who can’t figure it out, so there will always be someone there to pay.
wxw•29m ago
> “if we are constrained by performance and price, what architectural tradeoffs do we need to make?” a question that so far has not really been asked in the mad token gold rush.

To be fair, I think the labs are also interested in this (e.g OpenAI parameter golf). But the incentives are tricky. When the subsidies and tokenmaxxing era ends, local models will be essential.

cautiouscat•24m ago
> I have no concrete scientific evidence of this - my own personal vibe metric of “is a model good enough” is, “do I have to double-check it against an API model”, and GPT-OSS was the first one where I started doing that a lot less often.

The good old butt dyno!

I’ve been eyeing local models more and more with Anthropic squeezing more and more on the subscriptions. A few comments on HN had me waiting until they improved more but this article makes me wonder if I should reconsider that.

I’ve been doing some pretty niche development using a game and a script extender for said game. If these models can handle that, I’d feel good about switching.

xienze•21m ago
The big caveat here is that these local models require you to invest some time tweaking your harness, AGENTS.md, and skills in order to get things roughly to the level you'd expect. But something like Qwen3.6-27B with web search capabilities and a good set of skills really is impressive! Especially considering that you can go wild and not worry about token costs.

The other thing that people tend to gloss over is that you really do need to spend some $$$ on decent hardware. Yeah, you CAN run some 4-bit quant with heavily quantized cache on your 16GB card, but it's not going to be a great experience (I think this is where a lot of the "if you think it's gonna be any good, you're going to be disappointed" stuff comes from). Yes it's a lot of $$$ upfront but it's very much unknown when hardware prices are going to come back to reality. There's a lot of hopes and dreams that any minute now an H100 will be worth pennies because "that's how it's always been" w.r.t. computer hardware, but we are living in interesting times. So you can't just make the tired old assumptions that a Claude subscription over three years time will work out to be dramatically less than the value of some card three years from now. We STILL have basically anything with >=24GB VRAM appreciating in value, which is absolutely wild. What I'm saying is, the depreciation curve may very well be a lot less dramatic and fast than it used to be, going forward.

sosodev•19m ago
I think this is overselling their capabilities. I've used Gemma 4 and Qwen 3.6 quite a bit on my Strix Halo home server. They're great models and the dense variants are significantly better, but they're still very far behind the frontier. If you boot up Gemma 4 MoE and OpenCode/Pi and expect it to perform anything like Claude Code or Codex, you're going to be very disappointed.
chrismarlow9•18m ago
You can use a frontier model to create a plan that's specific enough for a very small local model to execute. The more specific you are and the more you compartmentalize tasks, the "dumber" the local model can be.

Edit: Obviously you'll be using more tokens, but this is the trade-off for running a smaller model and running locally. Similar to a time-memory trade-off, but in token economics. Sorry, I need more coffee
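
The plan-then-execute split described above can be sketched with two interchangeable model callables. Everything here is an assumption made for illustration: the `Model` type, the prompt wording, and the stub models stand in for whatever frontier API and local runtime are actually used.

```python
from typing import Callable, List

# A "model" is any function from prompt text to response text (hypothetical interface).
Model = Callable[[str], str]


def plan_then_execute(task: str, planner: Model, executor: Model) -> List[str]:
    """Ask the planner for small steps, then run each step on the executor."""
    plan = planner(f"Break this task into small, independent steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    # Each step is sent on its own so the smaller model needs no wider context.
    return [executor(f"Do exactly this step and nothing else:\n{step}") for step in steps]


# Stub models so the sketch runs without any API or local runtime.
def fake_planner(prompt: str) -> str:
    return "1. Add a field\n2. Update the tests\n"


def fake_executor(prompt: str) -> str:
    return "done: " + prompt.splitlines()[-1]


print(plan_then_execute("Add a field to the User model", fake_planner, fake_executor))
```

The extra token cost mentioned in the edit shows up here as one executor call per step, each carrying its own instructions.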

simonw•18m ago
I think gemma-4-26b-a4b and Qwen3.6-35B-A3B show that there's something very interesting about a local model that does mixture-of-experts (which helps a lot with performance) and has in the order of 30 billion parameters.

These models are very capable, and use around 20-30GB of RAM while they are running.

Provided you have 64GB of RAM that leaves space for running other applications at the same time.
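
A back-of-the-envelope check on those RAM figures: weight memory is roughly parameter count times bits per weight. This sketch covers weights only and assumes a uniform bit width, which real quantization formats only approximate; KV cache and runtime overhead come on top.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


# Parameter counts of the order discussed above, at a few common bit widths.
for params in (26, 35):
    for bits in (4, 8, 16):
        print(f"{params}B at {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

At 4 to 8 bits per weight, a model of about 30 billion parameters lands in the same range as the 20-30GB quoted above, while 16-bit weights would not leave room for much else on a 64GB machine.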

chrisweekly•6m ago
Obtaining that 64GB RAM is a meaningful obstacle for many.
stared•17m ago
I really recommend Qwen3.6 27B.

Made some tests, and its 8-bit version runs at 30 tok/s when using llama.cpp with MTP on a Macbook Max M5. I have 128 GB, but 64 GB is well enough. https://github.com/stared/benching-local-llms-on-apple-silic...

When using benchmarks, it gives more-or-less the level of SotA mid-late 2026.

wizzledonker•8m ago
Did you mean 2025?
ibizaman•16m ago
Tangential, but reading on mobile, the font sizes in the code snippets are all over the place. I actually have the same issue on my blog. Anyone know why?
aliljet•12m ago
The problem here is always the cost-benefit. For $200/mo, you're receiving subsidized best of breed access. There's no model competing for that price anywhere. If a 27B param model is what you choose, show me your hardware! I would love to be wrong...
0xc0c0c0•11m ago
I have used local models (around 128 GB) and the big proprietary models, and while I do want local models to win, it's important we keep the expectations of local models realistic. There are many blog posts about how local models today can fully replace some of the proprietary models, and in some cases it's true for the much smaller proprietary models, but they are very clearly much further behind the larger models.

You can be far more ambiguous with your tasks with the larger proprietary models as opposed to the local models. You can achieve similar results with local models, but you need to be much more detailed in your prompt.

One of the biggest things about running these local models is that the harness matters almost just as much as the model too. Codex is optimized for GPT models, CC is optimized for Claude, Cursor has a great harness that works very well across these providers. It took me a couple of iterations of the different harnesses to find one that would work well with the smaller Qwen models to do local coding.

wasimxyz•9m ago
https://canirun.ai
anubhav200•9m ago
I have been using Qwen- and GLM-based models for the last 2 years, and ended up buying multiple machines for them. Overall I feel 24GB of VRAM is a must-have to get performance (speed-wise) that matches hosted solutions. I have 2 machines, a 12GB VRAM one and a 24GB one. On 12GB of VRAM I get around 50 tps generation and 500 tps prompt processing, and on the 24GB one I get 180 tps generation and 3500 tps prompt processing. I have different configs for different scenarios, and I also use llama cpp manager to manage all my configs (https://github.com/anubhavgupta/llama-cpp-manager)
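
To see what those throughput numbers mean for a single request, total latency is roughly prompt tokens divided by prompt-processing speed plus output tokens divided by generation speed. A minimal sketch using the speeds quoted above; the request size (7,000 prompt tokens, 900 output tokens) is a made-up example.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, generation_tps: float) -> float:
    """Rough end-to-end time: prompt processing followed by token generation."""
    return prompt_tokens / prefill_tps + output_tokens / generation_tps


# Speeds from the comment above; the request size is hypothetical.
machines = {"12GB VRAM": (500, 50), "24GB VRAM": (3500, 180)}
for label, (prefill_tps, generation_tps) in machines.items():
    seconds = request_seconds(7000, 900, prefill_tps, generation_tps)
    print(f"{label}: ~{seconds:.0f} s")
```

For agentic coding, where prompts are long and repeated, the prompt-processing gap matters as much as the generation gap.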
sathackr•20m ago
The opposite of that has been happening for 20 years now with cloud compute.

It won't happen with AI models either.

It's almost ingrained in the American business model now. Outsource everything. Nobody wants to manage a room full of servers when they can spend 2-3x as much and outsource that headache along with the responsibility for it.

Same will happen with AI. Whether that means paying Anthropic that premium or paying AWS.

I'm in a relatively small business, we recently had an outage related to our local infrastructure.

I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore, even though our total internal downtime over the last 5 years is significantly less than even a single one of the larger recent AWS outages.

Everyone wants to shuck the chore and the responsibility.

derfurth•8m ago
That's an interesting take, however there is no ongoing maintenance related to local models, maybe the only effort is giving more capable machines to the workforce; but yeah I can see how it might feel like a barrier.
cheema33•5m ago
> I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore, even though our total internal downtime over the last 5 years is significantly less than even a single one of the larger recent AWS outages.

Same here. My job as a software dev does not require me to self-host services we need and use. Quite the opposite. But, I am reluctant to hand over all control to AWS or equivalent for several reasons that I will get into here.

I have found that Infrastructure as Code (IaC) and modern tools like opentofu, ansible, combined with frontier AI models and harnesses gives you superpowers in this space. Almost all of our self-hosted services are fully managed by these tools. e.g. We perform backups and test them more often now than we ever did before. Entirely because it is so much easier to do all of that now.

icoder•19m ago
What I don't understand is that on one hand we read 'what they charge is much less than it costs them' and on the other hand this thread seems to suggest that 'what they charge is more than it would cost me'.
esailija•13m ago
Bigger models that Anthropic wants to sell cost disproportionately more (e.g. 100% more cost for a 5% performance improvement) than the small models you would use locally.
Retric•6m ago
One difference is you can pick a local model well suited for what you want which can be dramatically cheaper to run. Big AI companies aren’t simply giving you a dropdown of open models to choose from, they are building giant monoliths to do everything.

Also staff, buildings, training, etc cost money. People tend to ignore some of those costs as they’re paying for the same living space either way etc.

wuliwong•16m ago
These local models can do some of the work the non-frontier models can do, but for me that's not worth much. If I am just using Sonnet 4.6, I can pretty much work all day on the $20/month plan. And Sonnet is still a way more powerful model than one you could self-host on an M2 Mac.

If things change to token usage billing for everyone, maybe I'll be singing a different tune but on a subscription, I don't think it makes sense financially.

Fun? Yes. Financially sound? No.

chrisweekly•10m ago
Not everyone has the right hardware.
Scoundreller•5m ago
The third category are the occasional users that won’t have the hardware and won’t stomach a monthly fee for “unlimited” but are happy to pay-per-use.

I’d think the volume for that category would be low but LLMs aren’t just for coding.

StevenWaterman•23m ago
Yep, I daily drive Qwen3.6-27B (including for work), have done pretty much since it came out. IMO it's the only (small-ish, local) model worth using, if you can run it. It might not be as good as Opus at "add X large feature" but I don't want that in a model. I want to do the thinking while it does the typing. And Qwen 3.6 27B is perfectly good at that (while in my experience models like the 35A3B and gemma are significant downgrades)

Plus, I never have to worry about rate limits, quotas, or sitting in a queue during peak time. And I can always see its full thoughts, don't have to worry about where my data is getting sent, and know it can't get secretly nerfed.

Running on 2x 3090, 500-1000tok/s prefill and 60tok/s output at Q6_K_XL with MTP on llama.cpp, 220k tokens context window (starts to get a bit dumb above 160k ish), no KV quantization

giancarlostoro•16m ago
> (starts to get a bit dumb above 160k ish)

If open models can ever hold roughly 600k token windows, I'll be really excited, I found that around 300 ~ 400k of Claude reading through your codebase results in better outputs. I also have Claude read official docs instead of "guessing" as to how to do something.

StevenWaterman•13m ago
I think we'll get there. Right now it works for me, because I'm naturally pretty verbose in my prompts, and know the codebase well, so I know what it needs to look at. Plus subagents for anything exploratory.

I think deepseek v4 pro has 1m context and does pretty well up to around 600k. But if you have the hardware to run that locally, you already know

Even then if there's a smaller model with 1M context, you'll need a ton of RAM to actually run it at full 1M. I guess that's why you don't see it too much. Anyone that could run Qwen 3.6 27B with 1m context would be better off running a much bigger model with smaller context instead, in the same amount of VRAM.

In terms of optimizing further, huge context + KV quantization sounds like a terrible idea, but there's some decent innovation in sparse attention, KV cache rotation allowing Q8 to perform nearly as well as full 16-bit precision, plus some ideas around offloading KV cache to system RAM (but I'm skeptical)
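
The RAM cost of long context can be approximated for a standard attention cache. This is a rough sketch under stated assumptions: the `layers`, `kv_heads`, and `head_dim` values are hypothetical and are not the configuration of any model named in this thread, and models using sparse or hybrid attention will deviate from this formula.

```python
def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_value: float = 2.0) -> float:
    """Approximate KV cache size in GB (10^9 bytes) for standard attention."""
    # Factor of 2: one key vector and one value vector per token, layer and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens / 1e9


# Hypothetical dimensions, chosen only for illustration.
ARCH = dict(layers=16, kv_heads=4, head_dim=256)

for tokens in (220_000, 1_000_000):
    full = kv_cache_gb(tokens, **ARCH)                      # 16-bit cache
    q8 = kv_cache_gb(tokens, **ARCH, bytes_per_value=1.0)   # 8-bit cache, scale overhead ignored
    print(f"{tokens:>9} tokens: {full:.1f} GB at 16-bit, {q8:.1f} GB at 8-bit")
```

Cache size grows linearly with context length, which is why a 1M-token window competes directly with model size for the same VRAM.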

indoordin0saur•8m ago
> And I can always see its full thoughts, don't have to worry about where my data is getting sent, and know it can't get secretly nerfed.

For this reason I wonder if local models are a potential business opportunity. Provide the service to engineering teams to give them a pre-built and setup GPU rig they can run in a closet. No need to worry about all the things you mentioned and clients can rest-assured their data isn't disappearing into a sketchy data center. There might be regulatory reasons that make on-prem setups appealing as well.

amoshebb•6m ago
This is, as far as I know, the business model of companies like Mistral and Cohere
derethanhausen•23m ago
I would not generalize based on experiences with Sonnet. The flagship models (Opus being the Claude equivalent) are dramatically better.
hypfer•20m ago
Opus in my experience is equally unpleasant "character"-wise, but at least it actually gets stuff done more often, so it's at least slightly more earned at that. It's still a neurotic cargo-culting dogmatic idiot, but one that at least sometimes does produce deliverables instead of only bottom-tier HN-esque opinions.

Hmm. I think I might just fundamentally disagree with Anthropic about the idea of what a "tool" should be.

giancarlostoro•18m ago
There's a model on Huggingface where someone takes Qwen and makes it think Opus style, and that one seems to be decent, not sure if they have the 27B variant in that style. I do wonder if you can tweak your system prompt to force Qwen to behave better?
StevenWaterman•10m ago
You read the OP backwards, they said Sonnet is a downgrade from Qwen, and prefer Qwen's tone
whythismatters•9m ago
Yes, Qwopus :) I've been pleasantly surprised by its quality
indoordin0saur•14m ago
Very curious what hardware you're running this on!
hypfer•10m ago
The same 24GB VRAM RTX 4090 I bought to play Cyberpunk 2077 with.

Works perfectly fine in llama.cpp throwing 70+t/s at me with 128k q8 K/V context when using the IQ4_NL quant + MTP at q4 MTP K/V.

Also leaving this here because you might find it useful: https://hypfer.github.io/will-it-fit-llama-cpp/

chrisweekly•11m ago
Why Sonnet 4.6 not Opus?
swatcoder•11m ago
Using the first-party Claude Code SKILLS as a signal for what Anthropic is training Claude itself to be good at generally, my sense is that they're crafting a tool that's very good at reproducing cargo cult practices that are only suited for certain use cases but that then become the default which many naive users will misapply and many savvy users have to swim upstream against. Yet those practices are suitable for some projects and so some can end up with genuinely impressive examples to showcase.

Ultimately, the whole concept of a singular right way to do things in our craft is absurd, and the popular/cargo-cult "best practices" of the 2020's era were already pretty questionable before they started getting burnt into everybody's favorite $1T helper as its default preference.

Other models may struggle to achieve parity on some of Claude's best-fit showcase examples and some benchmarks, but their presumably weaker training may prove advantageous as a more agile starting point, provided their general capabilities prove strong enough to write sound code.

radium3d•10m ago
If you think about it, they're splitting the power across millions of users. Essentially, these AI companies have YOUR hardware that YOU are paying (them) for in a cabinet at some data center.

That said, it does make it possible to train the models having them in the same data center. Having them distributed to everyone would slow down training considerably.
