
macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt

https://developer.apple.com/documentation/macos-release-notes/macos-26_2-release-notes#RDMA-over-...
166•guiand•2h ago•74 comments

GNU Unifont

https://unifoundry.com/unifont/index.html
117•remywang•2h ago•40 comments

Show HN: Tiny VM sandbox in C with apps in Rust, C and Zig

https://github.com/ringtailsoftware/uvm32
33•trj•1h ago•2 comments

Rats Play DOOM

https://ratsplaydoom.com/
117•ano-ther•3h ago•41 comments

Security issues with electronic invoices

https://invoice.secvuln.info/
63•todsacerdoti•3h ago•37 comments

Ensuring a National Policy Framework for Artificial Intelligence

https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-nati...
26•andsoitis•23h ago•47 comments

SQLite JSON at full index speed using generated columns

https://www.dbpro.app/blog/sqlite-json-virtual-columns-indexing
292•upmostly•10h ago•92 comments

Pg_ClickHouse: A Postgres extension for querying ClickHouse

https://clickhouse.com/blog/introducing-pg_clickhouse
57•spathak•2d ago•17 comments

4 billion if statements (2023)

https://andreasjhkarlsson.github.io//jekyll/update/2023/12/27/4-billion-if-statements.html
560•damethos•6d ago•158 comments

Fast Median Filter over arbitrary datatypes

https://martianlantern.github.io/2025/09/median-filter-over-arbitrary-datatypes/
10•martianlantern•6d ago•0 comments

Motion (YC W20) Is Hiring Senior Staff Front End Engineers

https://jobs.ashbyhq.com/motion/715d9646-27d4-44f6-9229-61eb0380ae39
1•ethanyu94•2h ago

String theory inspires a brilliant, baffling new math proof

https://www.quantamagazine.org/string-theory-inspires-a-brilliant-baffling-new-math-proof-20251212/
96•ArmageddonIt•7h ago•79 comments

Home Depot GitHub token exposed for a year, granted access to internal systems

https://techcrunch.com/2025/12/12/home-depot-exposed-access-to-internal-systems-for-a-year-says-r...
156•kernelrocks•5h ago•90 comments

Can I use HTTPS RRs?

https://www.netmeister.org/blog/https-caniuse.html
5•zdw•58m ago•1 comments

Async DNS

https://flak.tedunangst.com/post/async-dns
90•todsacerdoti•6h ago•28 comments

Capsudo: Rethinking Sudo with Object Capabilities

https://ariadne.space/2025/12/12/rethinking-sudo-with-object-capabilities.html
4•fanf2•1h ago•0 comments

Bit flips: How cosmic rays grounded a fleet of aircraft

https://www.bbc.com/future/article/20251201-how-cosmic-rays-grounded-thousands-of-aircraft
43•signa11•4d ago•42 comments

CM0 – A new Raspberry Pi you can't buy

https://www.jeffgeerling.com/blog/2025/cm0-new-raspberry-pi-you-cant-buy
151•speckx•8h ago•37 comments

Show HN: I made a spreadsheet where formulas also update backwards

https://victorpoughon.github.io/bidicalc/
6•fouronnes3•1d ago•0 comments

Microservices should form a polytree

https://bytesauna.com/post/microservices
94•mapehe•4d ago•90 comments

Good conversations have lots of doorknobs (2022)

https://www.experimental-history.com/p/good-conversations-have-lots-of-doorknobs
39•bertwagner•4d ago•7 comments

Epic celebrates "the end of the Apple Tax" after court win in iOS payments case

https://arstechnica.com/tech-policy/2025/12/epic-celebrates-the-end-of-the-apple-tax-after-appeal...
345•nobody9999•7h ago•223 comments

Google releases its new Google Sans Flex font as open source

https://www.omgubuntu.co.uk/2025/11/google-sans-flex-font-ubuntu
163•CharlesW•5h ago•77 comments

Using secondary school maths to demystify AI

https://www.raspberrypi.org/blog/secondary-school-maths-showing-that-ai-systems-dont-think/
87•zdw•6h ago•192 comments

Freeing a Xiaomi humidifier from the cloud

https://0l.de/blog/2025/11/xiaomi-humidifier/
3•stv0g•17h ago•0 comments

I couldn't find a logging library that worked for my library, so I made one

https://hackers.pub/@hongminhee/2025/logtape-fedify-case-study
3•todsacerdoti•7h ago•0 comments

Fedora: Open-source repository for long-term digital preservation

https://fedorarepository.org/
92•cernocky•10h ago•44 comments

Building small Docker images faster

https://sgt.hootr.club/blog/docker-protips/
3•steinuil•13h ago•0 comments

From text to token: How tokenization pipelines work

https://www.paradedb.com/blog/when-tokenization-becomes-token
103•philippemnoel•1d ago•19 comments

The true story of the Windows 3.1 'Hot Dog Stand' color scheme

https://www.pcgamer.com/software/windows/windows-3-1-included-a-red-and-yellow-hot-dog-stand-colo...
102•naves•4h ago•37 comments

macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt

https://developer.apple.com/documentation/macos-release-notes/macos-26_2-release-notes#RDMA-over-Thunderbolt
163•guiand•2h ago

Comments

nodesocket•1h ago
Can we get proper HDR support first in macOS? If I enable HDR on my LG OLED monitor it looks completely washed out and blacks are grey. Windows 11 HDR works fine.
Razengan•1h ago
Really? I thought it had always been that HDR was notoriously bad on Windows, hopeless on Linux, and only really worked in a plug-and-play manner on Mac, unless your display has an incorrect profile or something.

https://www.youtube.com/shorts/sx9TUNv80RE

heavyset_go•1h ago
Works well on Linux, just toggle a checkmark in the settings.
masspro•1h ago
macOS does wash out SDR content in HDR mode, specifically on non-Apple monitors. An HDR video playing in windowed mode will look fine, but all the UI around it has black and white levels very close to grey.

Edit: to be clear, macOS itself (Cocoa elements) is all SDR content and thus washed out.

Starmina•1h ago
That's intended behavior for monitors limited in peak brightness.
nodesocket•1h ago
I don't think so. Windows 11 has an HDR calibration utility that lets you adjust brightness and HDR while keeping blacks perfectly black (especially with my OLED). When I enable HDR on macOS, whatever settings I try, including adjusting brightness and contrast on the monitor, the blacks look completely washed out and grey. HDR DOES seem to work correctly on macOS, but only if you use Mac displays.
masspro•1h ago
That's the explanation I found last time I went down this rabbit hole: they don't have physical brightness info for third-party displays, so it just can't be done any better. But I don't understand how that leads to a terrible black point. Black should be the one color every emissive colorspace agrees on.
kmeisthax•5m ago
Actually, intended behavior in general. Even on their own displays the UI looks grey when HDR is playing.

Which, personally, I find to be extremely ugly and gross and I do not understand why they thought this was a good idea.

adastra22•1h ago
Huh, so that’s why HDR looks like shit on my Mac Studio.
m-ack-toddler•1h ago
AI is arguably more important than whatever gaming gimmick you're talking about.
simonw•1h ago
I follow the MLX team on Twitter and they sometimes post about using MLX on two or more Macs joined together to run models that need more than 512GB of RAM.

A couple of examples:

Kimi K2 Thinking (1 trillion parameters): https://x.com/awnihannun/status/1986601104130646266

DeepSeek R1 (671B): https://x.com/awnihannun/status/1881915166922863045 - that one came with setup instructions in a Gist: https://gist.github.com/awni/ec071fd27940698edd14a4191855bba...

awnihannun•1h ago
For a bit more context, those posts are using pipeline parallelism: for N machines, put the first L/N layers on machine 1, the next L/N layers on machine 2, and so on. With pipeline parallelism you don't get a speedup over one machine; it just buys you the ability to use larger models than you can fit on a single machine.

The release in Tahoe 26.2 will enable us to do fast tensor parallelism in MLX. Each layer of the model is sharded across all machines. With this type of parallelism you can get close to N-times faster for N machines. The main challenge is latency since you have to do much more frequent communication.
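
A toy sketch of the difference in plain Python (not MLX's actual API; the layer counts and dimensions below are made up):

    # Pipeline parallelism: machine m holds a contiguous block of layers.
    # A token's activations flow machine 0 -> machine 1 -> ..., so a single
    # request gets no speedup; you just gain total memory for bigger models.
    def pipeline_layers(num_layers, num_machines, m):
        per = num_layers // num_machines
        start = m * per
        end = num_layers if m == num_machines - 1 else start + per
        return range(start, end)

    # Tensor parallelism: every machine holds a 1/N slice of every weight
    # matrix and computes its share of every matmul, so each layer needs a
    # collective (all-reduce / all-gather) across all machines, every token.
    def tensor_slice(dim, num_machines, m):
        per = dim // num_machines
        return slice(m * per, (m + 1) * per)

    print(list(pipeline_layers(61, 4, 3)))   # last machine gets layers 45..60
    print(tensor_slice(7168, 4, 2))          # slice(3584, 5376)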

liuliu•40m ago
But that's only for prefill, right? Or is it beneficial for decoding too? (I guess you can do the KV lookup on shards; not sure how much speed-up that gives, though.)
zackangelo•32m ago
No, you use tensor parallelism in both cases.

The way it typically works in an attention block is: smaller portions of the Q, K and V linear layers are assigned to each node and processed independently. Attention, RoPE, norms, etc. are run on the node-specific output of that. Then, when the output linear layer is applied, an "all-reduce" is computed which combines the outputs of all the nodes.

EDIT: just realized it wasn't clear -- this means that each node ends up holding a portion of the KV cache specific to its KV tensor shards. This can change based on the specific style of attention (e.g., in GQA, where there are fewer KV heads than ranks, you end up having to do some replication, etc.).
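
A toy single-process numpy sketch of that flow, with a plain sum standing in for the all-reduce and no RoPE/GQA/causal masking (all sizes made up):

    import numpy as np

    d_model, n_heads, d_head, seq, N = 64, 8, 8, 5, 4   # 2 heads per "node"
    rng = np.random.default_rng(0)
    x = rng.standard_normal((seq, d_model))
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))

    def softmax(a):
        e = np.exp(a - a.max(-1, keepdims=True))
        return e / e.sum(-1, keepdims=True)

    def attention(q, k, v):   # vanilla multi-head attention over whatever heads are present
        nh = q.shape[1] // d_head
        return np.concatenate([
            softmax(q[:, h*d_head:(h+1)*d_head] @ k[:, h*d_head:(h+1)*d_head].T / np.sqrt(d_head))
            @ v[:, h*d_head:(h+1)*d_head]
            for h in range(nh)], axis=1)

    # Each node holds a column slice of Wq/Wk/Wv (its heads) and the matching
    # row slice of Wo; its KV cache would likewise hold only those heads.
    partials = []
    for node in range(N):
        cols = slice(node * (d_model // N), (node + 1) * (d_model // N))
        out = attention(x @ Wq[:, cols], x @ Wk[:, cols], x @ Wv[:, cols])
        partials.append(out @ Wo[cols, :])   # node-local slice of the output projection
    tp = np.sum(partials, axis=0)            # the "all reduce": sum the partial outputs

    # Reference: the same attention computed on one node with the full weights.
    ref = attention(x @ Wq, x @ Wk, x @ Wv) @ Wo
    print(np.allclose(tp, ref))              # True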

liuliu•6m ago
I usually call it "head parallelism" (which is a type of tensor parallelism, but parallelized for small clusters and specific to attention). That is what you described: shard the input tensor by number of heads and send it to the respective Q, K, V shards. Each shard can do its Q/K/V projections, RoPE, QK norm, and attention entirely locally. The out projection is done in that shard too, but then an all-reduce sum among the shards is needed to get the final out projection broadcast to every participating shard, which then carries on with the rest on its own.

What I am asking, however, is whether that will speed up decoding as linearly as it does prefill.

monster_truck•7m ago
Even if it wasn't outright beneficial for decoding by itself, it would still allow you to connect a second machine running a smaller, more heavily quantized version of the model for speculative decoding, which can net you >4x without quality loss.
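
Roughly the idea, as a toy greedy sketch (the "models" here are made-up stand-ins; real implementations sample from distributions and also keep a bonus token on full acceptance):

    # Toy greedy speculative decoding. The "target" model's rule is next = last + 1
    # (mod 10); the "draft" model is a cheaper approximation that is wrong after a 6.
    def target_next(ctx):
        return (ctx[-1] + 1) % 10

    def draft_next(ctx):
        return 0 if ctx[-1] == 6 else (ctx[-1] + 1) % 10

    def speculative_step(ctx, k=4):
        guesses, tmp = [], list(ctx)
        for _ in range(k):                    # small model guesses k tokens ahead
            guesses.append(draft_next(tmp))
            tmp.append(guesses[-1])
        # The big model checks all k guesses in what would be ONE batched forward
        # pass; keep the agreeing prefix plus its first correction.
        out, tmp = [], list(ctx)
        for guess in guesses:
            want = target_next(tmp)
            out.append(want)
            tmp.append(want)
            if want != guess:
                break
        return out                            # 1..k tokens per big-model pass

    seq = [3]
    for _ in range(4):
        seq += speculative_step(seq)
    print(seq)   # several tokens emitted per verification pass when the draft is right
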
andy99•1h ago
I’m hoping this isn’t as attractive as it sounds for non-hobbyists because the performance won’t scale well to parallel workloads or even context processing, where parallelism can be better used.

Hopefully this makes it really nice for people who want to experiment with LLMs and run a local model, but means well-funded companies won't have any reason to grab them all instead of GPUs.

codazoda•51m ago
I haven't looked yet, but I might be a candidate for something like this. I'm RAM constrained and, to a lesser extent, CPU constrained. It would be nice to offload some of that. That said, I don't think I would buy a cluster of Macs for that. I'd probably buy a machine that can take a GPU.
bigyabai•11m ago
The lack of official Linux/BSD support is enough to make it DOA for any serious large-scale deployment. Until Apple figures out what they're doing on that front, you've got nothing to worry about.
pstuart•1h ago
I imagine an M5 Ultra with Thunderbolt 5 could be a decent contender for building plug-and-play AI clusters. Not cheap, but neither is Nvidia.
whimsicalism•1h ago
nvidia is absolutely cheaper per flop
FlacksonFive•1h ago
To acquire, maybe, but to power?
whimsicalism•1h ago
machine capex currently dominates power
amazingman•47m ago
Sounds like an ecosystem ripe for horizontally scaling cheaper hardware.
crote•31m ago
If I understand correctly, a big problem is that the calculation isn't embarrassingly parallel: the various chunks are not independent, so you need to do a lot of IO to get the results from step N from your neighbours to calculate step N+1.

Using more smaller nodes means your cross-node IO is going to explode. You might save money on your compute hardware, but I wouldn't be surprised if you'd end up with an even greater cost increase on the network hardware side.
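
Some rough numbers for the tensor-parallel case (all assumptions: ring all-reduce, ~2 all-reduces per transformer layer, and made-up DeepSeek-ish dimensions):

    # Back-of-envelope: cross-node traffic per generated token under tensor
    # parallelism. Assumes a ring all-reduce, 2 all-reduces per layer, and
    # made-up model dimensions (61 layers, d_model=7168, bf16 activations).
    layers, d_model, bytes_per = 61, 7168, 2
    allreduce_size = d_model * bytes_per            # one decode token's activations

    for n in (2, 4, 8, 16):
        fabric_bytes = 2 * (n - 1) * allreduce_size * 2 * layers   # summed over all links
        hops = 2 * (n - 1) * 2 * layers                            # sequential, latency-bound steps
        print(f"{n:2d} nodes: ~{fabric_bytes / 1e6:5.1f} MB on the wire, "
              f"{hops} latency-bound hops per token")

With a ring, the per-node bytes stay roughly flat, but the total fabric traffic and the number of serial steps both grow with node count, which is where the latency (and switch/NIC cost) pain shows up.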

adastra22•1h ago
FLOPS are not what matters here.
whimsicalism•1h ago
also cheaper memory bandwidth. where are you claiming that M5 wins?
Infernal•1h ago
I'm not sure where else you can get half a TB of 800GB/s memory for < $10k (though that's the M3 Ultra; I don't know about the M5). Is there something competitive in the Nvidia ecosystem?
whimsicalism•56m ago
I wasn't aware the M3 Ultra offered half a terabyte of unified memory, but an RTX 5090 has double that bandwidth, and that's before we even get into the B200 (~8TB/s).
650REDHAIR•35m ago
You could get one M3 Ultra with 512GB of unified RAM for the price of two RTX 5090s totaling 64GB of VRAM, not including the cost of a rig capable of running two RTX 5090s.
baq•1h ago
At current memory prices, today's cheap is yesterday's obscenely expensive: Apple's current RAM upgrade prices are cheap.
jeffbee•1h ago
Very cool. It requires a fully-connected mesh, so the scaling limit here would seem to be six Mac Studio M3 Ultras, with up to 3TB of unified memory to work with.
PunchyHamster•1h ago
I'm sure someone will figure out how to make a Thunderbolt switch/router.
huslage•1h ago
I don't believe the standard supports such a thing. But I wonder if TB6 will.
novok•1h ago
Now we need some hardware that is rackmount friendly and an OS that is not fiddly as hell to manage in a data center or on a headless server, and we are off to the races! And no, custom racks are not 'rackmount friendly'.
joeframbach•1h ago
So, the PowerBook Duo Dock?
btown•1h ago
It would be incredibly ironic if, with Apple's supply chain relatively stable compared to the chaos of the RAM market these days (projected to last for years), Apple compute became known as a cost-effective way to build medium-sized clusters for inference.
andy99•1h ago
It’s gonna suck if all the good Macs get gobbled up by commercial users.
mschuster91•1h ago
it's not like regular people can afford this kind of Apple machine anyway.
teeray•12m ago
It’s just depressing that the “PC in every home” era is being rapidly pulled out from under our feet by all these supply shocks.
icedchai•6m ago
Outside of YouTube influencers, I doubt many home users are buying a 512G RAM Mac Studio.
teaearlgraycold•54m ago
It already is depending on your needs.
timsneath•1h ago
Also see https://www.engadget.com/ai/you-can-turn-a-cluster-of-macs-i...
geerlingguy•1h ago
This implies you'd run more than one Mac Studio in a cluster, and I have a few concerns regarding Mac clustering (as someone who's managed a number of tiny clusters, with various hardware):

1. The power button is in an awkward location, meaning rackmounting them (either 10" or 19" rack) is a bit cumbersome (at best)

2. Thunderbolt is great for peripherals, but as a semi-permanent interconnect, I have worries over the port's physical stability... wish they made a Mac with QSFP :)

3. Cabling will be important, as I've had tons of issues with TB4 and TB5 devices with anything but the most expensive Cable Matters and Apple cables I've tested (and even then...)

4. macOS remote management is not nearly as efficient as Linux, at least if you're using open source / built-in tooling

To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely, without a GUI, but it looks like you _have_ to use something like Screen Sharing or an IP KVM to log into the UI, to click the right buttons to initiate the upgrade.

Trying "sudo softwareupdate -i -a" will install minor updates, but not full OS upgrades, at least AFAICT.

eurleif•1h ago
I have no experience with this, but for what it's worth, looks like there's a rack mounting enclosure available which mechanically extends the power switch: https://www.sonnetstore.com/products/rackmac-studio
wlesieutre•1h ago
For #2, OWC puts a screw hole above their dock's thunderbolt ports so that you can attach a stabilizer around the cord

https://www.owc.com/solutions/thunderbolt-dock

It's a poor imitation of old ports that had screws on the cables, but should help reduce inadvertent port stress.

The screw only works with limited devices (ie not the Mac Studio end of the cord) but it can also be adhesive mounted.

https://eshop.macsales.com/item/OWC/CLINGON1PK/

crote•41m ago
That screw hole is just the regular locking USB-C variant, is it not?

See for example:

https://www.startech.com/en-jp/cables/usb31cctlkv50cm

wlesieutre•31m ago
Looks like it! Thanks for pointing this out, I had no idea it was a standard.

Apparently since 2016 https://www.usb.org/sites/default/files/documents/usb_type-c...

So for any permanent Thunderbolt GPU setups, they should really be using this type of cable

TheJoeMan•6m ago
Now that’s one way to enforce not inserting a USB upside-down.
timc3•1h ago
It's been terrible for years/forever. Even Xserves didn't really meet the needs of a professional data centre. And it's got worse as a server OS because it's not a core focus. I don't understand why anyone bothers, apart from this MLX use case or as a ProRes render farm.
crote•40m ago
iOS build runner. Good luck developing cross-platform apps without a Mac!
colechristensen•40m ago
There are open source MDM projects; I'm not familiar with them, but https://github.com/micromdm/nanohub might do the job for OS upgrades.
givemeethekeys•51m ago
Would this also work for gaming?
AndroTux•42m ago
No
storus•49m ago
Is there any way to connect DGX Sparks to this via USB4? Right now only 10GbE can be used, despite both the Spark and the Mac Studio having vastly faster options.
zackangelo•27m ago
Sparks are built for this and actually have ConnectX-7 NICs built in! You just need to get the SFPs for them. This means you can natively cluster them at 200Gbps.
wtallis•13m ago
That doesn't answer the question, which was how to get a high-speed interconnect between a Mac and a DGX Spark. The most likely solution would be a Thunderbolt PCIe enclosure and a 100Gb+ NIC, and passive DAC cables. The tricky part would be macOS drivers for said NIC.
zackangelo•5m ago
You're right, I misunderstood.

I’m not sure if it would be of much utility because this would presumably be for tensor parallel workloads. In that case you want the ranks in your cluster to be uniform or else everything will be forced to run at the speed of the slowest rank.

You could run pipeline parallel but not sure it’d be that much better than what we already have.

daft_pink•48m ago
Hoping Apple has secured plentiful DDR5 to use in their machines so we can buy M5 chips with massive amounts of RAM soon.
colechristensen•43m ago
Apple tends to book its fab time / supplier capacity years in advance
reaperducer•47m ago
As someone not involved in this space at all, is this similar to the old MacOS Xgrid?

https://en.wikipedia.org/wiki/Xgrid

wmf•5m ago
No.
reilly3000•41m ago
dang I wish I could share md tables.

Here’s a text edition: For $50k the inference hardware market forces a trade-off between capacity and throughput:

* Apple M3 Ultra Cluster ($50k): Maximizes capacity (3TB). It is the only option in this price class capable of running 3T+ parameter models (e.g., Kimi k2), albeit at low speeds (~15 t/s).

* NVIDIA RTX 6000 Workstation ($50k): Maximizes throughput (>80 t/s). It is superior for training and inference but is hard-capped at 384GB VRAM, restricting model size to <400B parameters.

To achieve both high capacity (3TB) and high throughput (>100 t/s) requires a ~$270,000 NVIDIA GH200 cluster and data center infrastructure. The Apple cluster provides 87% of that capacity for 18% of the cost.

mechagodzilla•33m ago
You can keep scaling down! I spent $2k on an old dual-socket Xeon workstation with 768GB of RAM; I can run DeepSeek-R1 at ~1-2 tokens/sec.
icedchai•12m ago
For $50K, you could buy 25 Framework desktop motherboards (128GB VRAM each with Strix Halo, so over 3TB total). Not sure how you'll cluster all of them, but it might be fun to try. ;)
ComputerGuru•33m ago
Imagine if the Xserve was never killed off. Discontinued 14 years ago, now!
stego-tech•33m ago
This doesn’t remotely surprise me, and I can guess Apple’s AI endgame:

* They already cleared the first hurdle to adoption by shoving inference accelerators into their chip designs by default. It’s why Apple is so far ahead of their peers in local device AI compute, and will be for some time.

* I suspect this introduction isn’t just for large clusters, but also a testing ground of sorts to see where the bottlenecks lie for distributed inference in practice.

* Depending on the telemetry they get back from OSes using this feature, my suspicion is they’ll deploy some form of distributed local AI inference system that leverages their devices tied to a given iCloud account or on the LAN to perform inference against larger models, but without bogging down any individual device (or at least the primary device in use)

For the endgame, I’m picturing a dynamically sharded model across local devices that shifts how much of the model is loaded on any given device depending on utilization, essentially creating local-only inferencing for privacy and security of their end users. Throw the same engines into, say, HomePods or AppleTVs, or even a local AI box, and voila, you’re golden.

threecheese•30m ago
I think you are spot on, and this fits perfectly within my mental model of HomeKit; tasks are distributed to various devices within the network based on capabilities and authentication, and given a very fast bus Apple can scale the heck out of this.
fwip•19m ago
RDMA over Thunderbolt offers so much more bandwidth (and so much lower latency) than Apple's system of mostly-wireless devices that I can't see how any learnings here would transfer.
650REDHAIR•29m ago
Do we think TB4 is on the table or is there a technical limitation?
piskov•9m ago
George Hotz got Nvidia GPUs running on Macs via USB4 with his tinygrad:

https://x.com/__tinygrad__/status/1980082660920918045