Fault Tolerant Llama training

https://pytorch.org/blog/fault-tolerant-llama-training-with-2000-synthetic-failures-every-15-seconds-and-no-checkpoints-on-crusoe-l40s/
45•Mougatine•3d ago

Comments

d4l3k•6h ago
Hey, nice to see this here!

I'm the primary author so happy to answer any questions you might have!

timzaman•6h ago
300 L40s? What's this, 1998?
kcorbitt•6h ago
I was curious about this so I had o3 do a bit of research. Turns out 300 L40s have more compute than any supercomputer before 2013 (and arguably before 2016, depending on how you count reduced-precision FLOPs).

https://chatgpt.com/share/685dea79-26ec-8002-bd62-7ed83aedf4...
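The comparison above can be sanity-checked with rough arithmetic. The sketch below is not from the thread: the per-GPU figure assumes the cards are NVIDIA L40S at their spec-sheet peak FP32 rate, and the supercomputer figures are the TOP500 Linpack (FP64) Rmax values as I recall them. Peak FP32 versus measured FP64 is an apples-to-oranges comparison, which is the "depending on how you count" caveat.

```python
# Back-of-the-envelope check. All constants are assumptions taken from
# public spec sheets and TOP500 lists, not from the thread itself.
NUM_GPUS = 300
L40S_FP32_TFLOPS = 91.6          # assumed peak FP32 per GPU (spec sheet)

# Assumed TOP500 #1 systems, Rmax in PFLOPS (FP64 Linpack).
TITAN_2012_PFLOPS = 17.59        # top system in late 2012
TIANHE2_2013_PFLOPS = 33.86      # top system from mid 2013

# 1 PFLOPS = 1000 TFLOPS
cluster_pflops = NUM_GPUS * L40S_FP32_TFLOPS / 1000.0

print(f"300 GPUs, peak FP32: {cluster_pflops:.2f} PFLOPS")
print("exceeds 2012 leader:", cluster_pflops > TITAN_2012_PFLOPS)
print("exceeds 2013 leader:", cluster_pflops > TIANHE2_2013_PFLOPS)
```

Under these assumptions the cluster lands between the 2012 and 2013 leaders, which matches the "before 2013" claim. Counting reduced-precision tensor throughput instead of FP32 would push the crossover later.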

d4l3k•5h ago
Hey Tim, how's it going?

Interested in lending PyTorch some compute? :)

torchft can handle much larger scales, but for a public multi-day demonstration run this is what we had available. The point of this blog post was to demonstrate the correctness of the quorum algorithm and recovery with a stock PyTorch stack, not so much peak FLOPS.

Stay tuned though -- planning on doing some much larger demos on B200s!
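For readers unfamiliar with the idea, here is a toy sketch of quorum-based training with checkpoint-free recovery. It is not torchft's API or its actual algorithm: every name here (`quorum`, `train`, `fail_prob`) is hypothetical, the "model" is a single float per replica, and failures are injected with a seeded random generator. It only illustrates the general pattern: each step, average gradients over whichever replicas are healthy, and let a restarted replica copy live state from a healthy peer.

```python
import random


def quorum(replicas, alive):
    """Return the sorted ids of replicas that are healthy right now."""
    return sorted(r for r in replicas if alive[r])


def train(num_replicas=4, steps=20, fail_prob=0.2, lr=0.1, seed=0):
    """Toy data-parallel loop minimising (w - target)^2 under failures."""
    rng = random.Random(seed)
    replicas = list(range(num_replicas))
    weights = {r: 0.0 for r in replicas}   # one scalar "model" per replica
    alive = {r: True for r in replicas}
    target = 3.0

    for _ in range(steps):
        for r in replicas:
            if alive[r]:
                # Inject a synthetic failure, but never kill the last replica.
                if rng.random() < fail_prob and sum(alive.values()) > 1:
                    alive[r] = False
            else:
                # Restarted replica: copy live weights from a healthy peer
                # instead of loading a checkpoint from storage.
                donor = quorum(replicas, alive)[0]
                weights[r] = weights[donor]
                alive[r] = True

        members = quorum(replicas, alive)
        # Average gradients over the current quorum only.
        grads = [2.0 * (weights[r] - target) for r in members]
        avg_grad = sum(grads) / len(grads)
        for r in members:
            weights[r] -= lr * avg_grad

    return weights, alive


if __name__ == "__main__":
    final_weights, final_alive = train()
    for r in sorted(final_weights):
        state = "up" if final_alive[r] else "down"
        print(r, state, round(final_weights[r], 4))
```

In this toy, healthy replicas always hold identical weights, so training keeps making progress as long as at least one replica survives each step. A real system also has to agree on quorum membership across machines and transfer much larger state, which this sketch ignores.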

bjt12345•5h ago
This is severely underrated work. Why aren't there more mid-sized companies helping with this? Ultra Ethernet just got released.
foobiekr•50m ago
Ultra Ethernet will do almost nothing. It’s a rubber-stamped version of Broadcom’s design, and Marvell/Cisco/etc. will just add it to their ASICs. It remains to be seen whether Spectrum-X or ConnectX will. If not, none of it matters.

These chips are $30m-$100m projects a pop. After the embarrassingly brutal failure of Barefoot nobody is going to do ASICs.

zxexz•41m ago
This is awesome, can’t wait to try out these techniques. At least a week a year of my time for the past few years has gone towards recovering from a fault crashing a training run. Sometimes it's environment related, sometimes shared storage, sometimes just because of a slightly faulty IB cable.

Biomolecular shifts occur in our 40s and 60s (2024)

https://med.stanford.edu/news/all-news/2024/08/massive-biomolecular-shifts-occur-in-our-40s-and-60s--stanford-m.html
109•fzliu•3h ago•28 comments

XSLT

https://github.com/pacocoursey/xslt
57•_kush•1h ago•21 comments

Show HN: Sick of emailing yourself stuff? me too

https://github.com/sirbread/sink
13•sirbread•52m ago•8 comments

AlphaGenome: AI for better understanding the genome

https://deepmind.google/discover/blog/alphagenome-ai-for-better-understanding-the-genome/
433•i_love_limes•16h ago•136 comments

A lumberjack created more than 200 sculptures in Wisconsin's Northwoods

https://www.smithsonianmag.com/travel/when-a-lumberjacks-imagination-ran-wild-he-created-more-than-200-sculptures-in-wisconsins-northwoods-180986840/
46•noleary•4h ago•17 comments

Launch HN: Issen (YC F24) – Personal AI language tutor

248•mariano54•16h ago•218 comments

Sailing the fjords like the Vikings yields unexpected insights

https://arstechnica.com/science/2025/06/this-archaeologist-built-a-replica-boat-to-sail-like-the-vikings/
27•pseudolus•3d ago•3 comments

Alternative Layout System

https://alternativelayoutsystem.com/scripts/#same-sizer
209•smartmic•11h ago•26 comments

Bogong moths use a stellar compass for long-distance navigation at night

https://www.nature.com/articles/s41586-025-09135-3
12•Anon84•3d ago•0 comments

The time is right for a DOM templating API

https://justinfagnani.com/2025/06/26/the-time-is-right-for-a-dom-templating-api/
130•mdhb•11h ago•84 comments

How much slower is random access, really?

https://samestep.com/blog/random-access/
70•sestep•3d ago•27 comments

Kea 3.0, our first LTS version

https://www.isc.org/blogs/kea-3-0/
79•conductor•10h ago•26 comments

Collections: Nitpicking Gladiator's Iconic Opening Battle, Part I

https://acoup.blog/2025/06/06/collections-nitpicking-gladiators-iconic-opening-battle-part-i/
35•diodorus•3d ago•10 comments

Show HN: Magnitude – Open-source AI browser automation framework

https://github.com/magnitudedev/magnitude
90•anerli•12h ago•31 comments

Denmark to tackle deepfakes by giving people copyright to their own features

https://www.theguardian.com/technology/2025/jun/27/deepfakes-denmark-copyright-law-artificial-intelligence
24•tfourb•2h ago•8 comments

Snow - Classic Macintosh emulator

https://snowemu.com/
230•ColinWright•21h ago•79 comments

Apple Research unearthed a forgotten AI technique and is using it to generate images

https://9to5mac.com/2025/06/23/apple-ai-image-model-research-tarflow-starflow/
92•celias•3d ago•31 comments

Uv and Ray: Pain-Free Python Dependencies in Clusters

https://www.anyscale.com/blog/uv-ray-pain-free-python-dependencies-in-clusters
5•robertnishihara•39m ago•0 comments

Typr – TUI typing test with a word selection algorithm inspired by keybr

https://github.com/Sakura-sx/typr
68•Sakura-sx•3d ago•31 comments

Judge rejects Meta's claim that torrenting is “irrelevant” in AI copyright case

https://arstechnica.com/tech-policy/2025/06/judge-rejects-metas-claim-that-torrenting-is-irrelevant-in-ai-copyright-case/
35•Bluestein•3h ago•14 comments

A Review of Aerospike Nozzles: Current Trends in Aerospace Applications

https://www.mdpi.com/2226-4310/12/6/519
75•PaulHoule•15h ago•40 comments

Introducing Gemma 3n

https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/
333•bundie•13h ago•144 comments

Dickinson's Dresses on the Moon

https://www.theparisreview.org/blog/2025/06/20/dickinsons-dresses-on-the-moon/
17•Bluestein•3d ago•0 comments

SigNoz (YC W21, Open Source Datadog) Is Hiring DevRel Engineers (Remote)(US)

https://www.ycombinator.com/companies/signoz/jobs/cPaxcxt-devrel-engineer-remote-us-time-zones
1•pranay01•12h ago

Show HN: I built an AI dataset generator

https://github.com/metabase/dataset-generator
134•matthewhefferon•15h ago•27 comments

'Peak flower power era': The story of first ever Glastonbury Festival in 1970

https://www.bbc.com/culture/article/20250620-the-story-of-the-first-ever-glastonbury-festival-in-1970
12•keepamovin•3d ago•1 comment

Matrix v1.15

https://matrix.org/blog/2025/06/26/matrix-v1.15-release/
166•todsacerdoti•10h ago•57 comments

Show HN: PRSS Site Creator – Create Blogs and Websites from Your Desktop

https://prss.co/
17•volted•9h ago•7 comments

Apple Just Patented an Image Sensor with 20 Stops of Dynamic Range

https://ymcinema.com/2025/06/25/apple-just-patented-an-image-sensor-with-20-stops-of-dynamic-range/
3•consumer451•48m ago•1 comment