PyTorch Monarch

https://pytorch.org/blog/introducing-pytorch-monarch/
377•jarbus•3mo ago

Comments

pjmlp•3mo ago
Apparently PyTorch oxidation has started.

> Monarch is split into a Python-based frontend, and a backend implemented in Rust.

Other than that, looks like a quite interesting project.

galangalalgol•3mo ago
This is a new project right? Not the oxidation of an existing one.
gaogao•3mo ago
Yup, hyperactor, one of the new crates that's part of it, does some particularly interesting things for efficient parallel distributed channels.
dhrt12327•3mo ago
Multiple sources say that it is an experimental framework around PyTorch, not a replacement. People will still get to enjoy a circular graph using std::shared_ptr with memory leaks.

It's a pity they don't do a complete rewrite with a functional language as the driver.

gaogao•3mo ago
> It's a pity they don't do a complete rewrite with a functional language as the driver.

It's open source, so seeing such an extension would be quite cool. There's a lot that could be done with native Rust actors and code that might get at what you want, and nothing precludes mixing PyTorch and other backends.

For example, you could wrap a C++ inference engine as part of one of the actors generating data for other actors doing distributed training.

pjmlp•3mo ago
Interesting, by the way, you can replicate the experience in Rust.
hansvm•3mo ago
Arc<T> has entered the chat.
bullfightonmars•3mo ago
You might be looking for elixir/nx and axon

https://github.com/elixir-nx/axon

jonapro•3mo ago
Beowulf then.
valzam•3mo ago
I assume this is similar to Ray?
lairv•3mo ago
I'm also curious what the use case of this over Ray is. Tighter integration with PyTorch/tensor abstractions?
porridgeraisin•3mo ago
That.

Also, it has RDMA. Last I checked, Ray did not support RDMA.

There are probably other differences as well, but the lack of RDMA immediately splits the world into things you can do with Ray and things you cannot do with Ray.

zacmps•3mo ago
Not currently, but it is being worked on: https://github.com/ray-project/ray/issues/53976
disattention•3mo ago
I had the same thought, especially because of their recent collaboration.

https://pytorch.org/blog/pytorch-foundation-welcomes-ray-to-...

unnah•3mo ago
There's also Dask, which can do distributed pandas and numpy operations etc. However, it was originally developed for traditional HPC systems and has only limited support for GPU computing. https://www.dask.org/
cwp•3mo ago
The code example is very similar to Ray.

Monarch:

  from monarch.actor import Actor, endpoint, this_host

  class Example(Actor):
      @endpoint
      def say_hello(self, txt):
          return f"hello {txt}"

  procs = this_host().spawn_procs({"gpus": 8})
  actors = procs.spawn("actors", Example)
  hello_future = actors.say_hello.call("world")
  hello_future.get()

Ray:

  import ray

  @ray.remote(num_gpus=1)
  class Example:
      def say_hello(self, txt):
          return f"hello {txt}"

  actors = [Example.remote() for _ in range(8)]
  hello_object_refs = [a.say_hello.remote("world") for a in actors]
  ray.get(hello_object_refs)
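
Both snippets follow the same pattern: a method call on an actor returns a future right away, and the caller blocks only when it asks for the result. A minimal standard-library sketch of that pattern, assuming one thread and one mailbox per actor, might look like the following. `ToyActor`, `call`, and `stop` are hypothetical names made up for illustration; this is not how Monarch or Ray implement actors.

```python
import queue
import threading
from concurrent.futures import Future


class ToyActor:
    """Hypothetical stand-in for an actor: one thread, one mailbox."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Process messages one at a time, in arrival order.
        while True:
            item = self._mailbox.get()
            if item is None:  # shutdown sentinel
                break
            method, args, future = item
            try:
                future.set_result(method(*args))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, method_name, *args):
        # Enqueue the request and hand back a future immediately.
        future = Future()
        self._mailbox.put((getattr(self, method_name), args, future))
        return future

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()


class Example(ToyActor):
    def say_hello(self, txt):
        return f"hello {txt}"


# Eight local actors, mirroring the eight GPUs in the snippets above.
actors = [Example() for _ in range(8)]
futures = [a.call("say_hello", "world") for a in actors]
results = [f.result(timeout=5) for f in futures]
for a in actors:
    a.stop()
```

The real frameworks add process isolation, serialization, and placement on top of this, which is where they differ from each other.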
milancurcic•3mo ago
Cool! Essentially Fortran coarrays from 2008.
philipallstar•3mo ago
Or Hadoop from 2006? But you don't need to write MapReduce or Fortran, so it's probably far nicer.
pjmlp•3mo ago
Fortran 2023 is already quite nice, and doesn't need to rewrite stuff in C for performance.
alyxya•3mo ago
I made my own single-controller PyTorch extension [1], though mine doesn't yet support cross-node communication. I found it interesting to compare how Monarch makes things performant. I believe Monarch also uses cloudpickle for code to be shared among all nodes, which is probably the only way to performantly have various nodes execute work, as that ends up being a one-time setup cost. I found the fanning out of sending messages from the single controller to be really interesting, so the controller is unlikely to be the bottleneck besides any synchronous operations.

As far as things that might be a performance loss here, one thing I'm wondering is if custom kernels are supported. I'm also wondering how much granularity of control there is with communication between different actors calling a function. Overall, I really like this project and hope to see it used over multi-controller setups.

[1] https://github.com/alyxya/mycelya-torch
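
To make the fan-out point concrete, here is a small simulation of message forwarding through a k-ary tree, where the controller sends to only a few workers and those workers forward the message on. The tree layout, the function name `tree_fanout`, and the numbers are assumptions chosen for illustration; the thread does not say which topology Monarch actually uses.

```python
from collections import deque


def tree_fanout(num_workers, branching):
    """Deliver one message to every worker via a k-ary forwarding tree.

    Returns (sends, delivered): sends maps each sender to the number of
    messages it sent; delivered is the set of workers reached.
    Node -1 is the controller; workers are numbered 0..num_workers-1.
    """
    sends = {}
    delivered = set()
    pending = deque([-1])
    while pending:
        node = pending.popleft()
        # The controller (-1) forwards to workers 0..branching-1; worker i
        # forwards to the next block of `branching` workers, heap-style.
        first_child = branching * (node + 1)
        children = [
            c for c in range(first_child, first_child + branching)
            if c < num_workers
        ]
        if children:
            sends[node] = len(children)
        for child in children:
            delivered.add(child)
            pending.append(child)
    return sends, delivered


sends, delivered = tree_fanout(num_workers=1000, branching=8)
controller_sends = sends[-1]          # messages sent by the controller itself
busiest_sender = max(sends.values())  # worst case over all nodes
```

With direct delivery the controller would send all 1000 messages itself. In the tree, no node sends more than `branching` messages, at the cost of a few extra forwarding hops.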

gaogao•3mo ago
> As far as things that might be a performance loss here, one thing I'm wondering is if custom kernels are supported

Yeah, you might end up needing some changes to remote worker initialization, but you can generally bake in whatever kernels and other system code you need.

logicchains•3mo ago
This seems strictly less powerful than Jax, which comes with a powerful compiler that optimises how cross-node communication is conducted.
gaogao•3mo ago
Nah, focusing on a different controller paradigm. Jax is focused on multi-controller SPMD, while this is focused on a single-controller setup. Both have their place, with single-controller being generally easier to reason about, and multi-controller more optimal for certain dataflows. There are also some interesting mixes of the two control paradigms.
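
The difference between the two paradigms can be sketched with threads standing in for devices. In the single-controller version one driver decides what every worker does and gathers the results. In the SPMD version every rank runs the same program and selects its own work from its rank. All names here (`shard`, `single_controller`, `spmd_program`, `multi_controller`) are hypothetical, the shared list is only a stand-in for a collective, and the thread pool in the SPMD case plays the role of a launcher rather than a controller.

```python
from concurrent.futures import ThreadPoolExecutor

WORLD_SIZE = 4
DATA = list(range(16))


def shard(rank):
    # Rank r owns every WORLD_SIZE-th element starting at r.
    return DATA[rank::WORLD_SIZE]


# Single controller: the driver picks each worker's input and gathers results.
def worker_sum(values):
    return sum(values)


def single_controller():
    with ThreadPoolExecutor(max_workers=WORLD_SIZE) as pool:
        shards = [shard(r) for r in range(WORLD_SIZE)]
        partials = list(pool.map(worker_sum, shards))
    return sum(partials)


# Multi-controller SPMD: every rank runs this same function and derives its
# work from its own rank; `gathered` stands in for an all-gather collective.
def spmd_program(rank, gathered):
    gathered[rank] = sum(shard(rank))


def multi_controller():
    gathered = [None] * WORLD_SIZE
    with ThreadPoolExecutor(max_workers=WORLD_SIZE) as pool:
        list(pool.map(lambda r: spmd_program(r, gathered), range(WORLD_SIZE)))
    return sum(gathered)


single_total = single_controller()
spmd_total = multi_controller()
```

Both produce the same total. What changes is where the control flow lives: in one driver script, or replicated in every rank.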
nothrowaways•3mo ago
FB should create a pytorch foundation and set it free before they fuck it up.
gooodvibes•3mo ago
https://pytorch.org/foundation/
dkdcio•3mo ago
damn that was fast!
porridgeraisin•3mo ago
> This lets us avoid single-host bottlenecks, effectively using the whole mesh as a distributed cluster for message forwarding. (Cite scalability numbers here.)

In case someone who can fix this is reading.

chandureddyvari•3mo ago
Interesting - this seems to target a different layer than services like Tinker (https://thinkingmachines.ai/blog/announcing-tinker/). Monarch provides the infrastructure primitives while Tinker is a managed finetuning service. Could someone build something like Tinker on top of Monarch?
gaogao•3mo ago
Yup, there's stuff like https://pytorch.org/blog/introducing-torchforge/ on top of it now
chandureddyvari•3mo ago
Nice, so the open source equivalent now exists. Meta basically commoditized Tinker's ($12B valuation) value prop by giving away the infra (Monarch) and the RL framework (TorchForge). Will be interesting to see how a managed service competes with free + open source at this layer.
pstoll•3mo ago
“Service Adverbs - like ‘route’ and ‘fanout’”

Grammarians are going to be big angry here. Ain’t an adverb in sight.

SomaticPirate•3mo ago
"Our Rust-based backend facilitates our performance, scale, and robustness — we amply use Rust’s fearless concurrency in Monarch’s implementation"

Found a few typos. The em dash makes me suspect an LLM was involved in proofreading.

alt187•3mo ago
https://www.scottsmitelli.com/articles/em-dash-tool/
geedzmo•3mo ago
That was a really good read. Glad I clicked
alt187•3mo ago
It's not even one of the author's funniest pieces, and that says a lot.
whimsicalism•3mo ago
that it is surrounded by spaces makes this less likely
ComputerGuru•3mo ago
Most style guides would call that an error: an em dash should be used without surrounding spaces (while an en dash requires them). The only publication I know that has (recently?) eschewed that advice is WaPo. If the idea was to make it more visible, I believe the correct solution would have been for WaPo to use an en dash but render it longer in their typeface.
whimsicalism•3mo ago
yes, i agree with you and this is how i used to use emdashes. chatgpt also agrees with you, which is why spaces are a pretty good indicator that it's not an LLM
hellohello2•3mo ago
I would argue that typos suggest an LLM did not proofread.
fadedsignal•3mo ago
It is a nice project. I have questions.

- Is this similar to Open MPI?

- How is a mesh established? Do they need to be on the same host?

semessier•3mo ago
This could become a major thing in the coarray world, but the issues start already:

> ...Note that this does not support tensor engine, which is tied to CUDA and RDMA (via ibverbs).

I.e., yet another CUDA-married approach: the issue is not ibverbs, but the code shows they use GPUDirect RDMA. Going from there, this can only get worse, with more CUDA dependencies. There would have been OpenUCX.

bjourne•3mo ago
> Monarch lets you program distributed systems the way you’d program a single machine, hiding the complexity of distributed computing:

There is some infamous tech based on the "hiding" paradigm. PHP comes to mind. By hiding how the HTTP request/response cycle actually works, it fostered a generation of web developers who didn't know what a session cookie was, resulting in login systems that leaked like a sieve. Distributed computing is complicated. There are many parameters you need to tweak and many design decisions you need to make for distributed model training to run smoothly. I think explicit and transparent architectures are way better. Distributed model training shouldn't "feel" like running on a single device because it isn't.
