
Predict your distributed LLM training time before you burn GPU hours

https://github.com/DebarghaG/estimate-train-time
2•barthelomew•3h ago

Comments

barthelomew•3h ago
Predict your distributed LLM training time before you burn GPU hours.

We've open-sourced a tool (https://github.com/DebarghaG/estimate-train-time) that estimates wall-clock time for LLM training across multi-GPU setups with 3D parallelism (pipeline, tensor, and data).

This problem is extremely hard. You're modeling the interplay of thousands of GPU kernels, NCCL collectives across heterogeneous network topologies, pipeline bubbles, activation recomputation, and ZeRO optimizer communication, all while these components interact in non-obvious ways at scale. Even an estimate that is off by 2x is useless for capacity planning.

It took two years of painstaking work and ~$100k worth of cluster time. It is validated on real workloads at Perlmutter (NERSC) and Vista (TACC), some of the largest HPC clusters available for open science.

How it works:

1. Kernel-level profiling: We sample execution times for kernels such as Flash Attention, fused GEMMs (QKV/FFN projections), RMSNorm, embedding lookups, and cross-entropy loss across the (batch, seq_len, hidden_dim, num_heads, MP degree) parameter space.
2. Communication modeling: NCCL benchmarks capture ring all-reduce (tensor/data-parallel sync), all-gather (ZeRO-1 parameter collection), and P2P send/recv (pipeline-stage activation transfers) across intra-node NVLink and inter-node InfiniBand topologies.
3. Analytical composition: Operator predictions feed into a pipeline scheduling model (AF-AB / 1F1B). This model accounts for bubble overhead, which is an idle fraction of (PP - 1) / (num_microbatches + PP - 1). It also accounts for layer distribution across head/middle/tail stages and overlapped DP gradient sync.
4. Runs on CPU: after sampling, no GPU access is needed to predict training time.
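The analytical-composition step can be illustrated with textbook cost models. This is a sketch under stated assumptions, not the tool's implementation. It assumes every stage takes the same forward and backward time per microbatch. It prices the DP gradient all-reduce with the standard alpha-beta model of a ring all-reduce. It also uses a hypothetical `overlap` parameter for the share of that sync hidden behind compute. All names (`PipelineConfig`, `pipeline_time`, `estimate_step_time`, ...) and the numeric inputs are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    pp: int                # pipeline-parallel degree (number of stages)
    num_microbatches: int  # microbatches per optimizer step
    t_fwd: float           # forward time of one microbatch on one stage (s), assumed uniform
    t_bwd: float           # backward time of one microbatch on one stage (s), assumed uniform


def bubble_fraction(pp: int, num_microbatches: int) -> float:
    """Idle fraction of a synchronous AF-AB or 1F1B schedule with uniform stages."""
    return (pp - 1) / (num_microbatches + pp - 1)


def pipeline_time(cfg: PipelineConfig) -> float:
    """Uniform-stage step time: (m + p - 1) * (t_fwd + t_bwd)."""
    return (cfg.num_microbatches + cfg.pp - 1) * (cfg.t_fwd + cfg.t_bwd)


def ring_allreduce_time(nbytes: float, n: int, alpha: float, beta: float) -> float:
    """Alpha-beta model of ring all-reduce: 2(n - 1) steps, each sending nbytes / n.

    alpha: per-step latency (s); beta: transfer time per byte (s/B).
    """
    if n <= 1:
        return 0.0
    return 2 * (n - 1) * alpha + 2 * (n - 1) / n * nbytes * beta


def estimate_step_time(cfg, grad_bytes, dp, alpha, beta, overlap=0.0):
    """Pipeline time plus the exposed (non-overlapped) share of the DP gradient sync.

    overlap: hypothetical fraction in [0, 1] of the all-reduce hidden behind compute.
    """
    comm = ring_allreduce_time(grad_bytes, dp, alpha, beta)
    return pipeline_time(cfg) + (1.0 - overlap) * comm


# Hypothetical inputs, chosen for illustration only.
cfg = PipelineConfig(pp=4, num_microbatches=12, t_fwd=0.05, t_bwd=0.10)
print(f"bubble fraction: {bubble_fraction(cfg.pp, cfg.num_microbatches):.2f}")
step = estimate_step_time(cfg, grad_bytes=1e9, dp=8, alpha=5e-6, beta=1 / 25e9, overlap=0.5)
print(f"estimated step time: {step:.3f} s")
```

With uniform stages, the bubble time (PP - 1)(t_fwd + t_bwd) divided by `pipeline_time` gives exactly the (PP - 1) / (num_microbatches + PP - 1) idle fraction. According to the post, the tool also models the head/middle/tail layer split and uses per-operator predictions, and this sketch ignores both.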

The approach is designed to be extended as a recipe. You can profile your own hardware with the bundled kernel-sampling and NCCL-benchmarking scripts, and you can add custom operators by implementing the regressor interface.
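The post does not show the regressor interface itself. The sketch below is a hypothetical stand-in for what adding a custom operator might look like. `OperatorRegressor`, `LinearWorkRegressor`, and the feature keys are invented for illustration. The linear-in-work model (time ≈ a + b · batch · seq_len · hidden_dim / MP, fitted by ordinary least squares) is a deliberately simple assumption, not the repository's API or model.

```python
from abc import ABC, abstractmethod
from typing import Mapping, Sequence


# Hypothetical interface; the real repository's regressor API may differ.
class OperatorRegressor(ABC):
    """Maps an operator's shape parameters to a predicted runtime in seconds."""

    @abstractmethod
    def fit(self, features: Sequence[Mapping[str, int]], times: Sequence[float]) -> None:
        ...

    @abstractmethod
    def predict(self, features: Mapping[str, int]) -> float:
        ...


class LinearWorkRegressor(OperatorRegressor):
    """Assumed model: time = a + b * work, with work = batch * seq_len * hidden_dim / mp."""

    def __init__(self) -> None:
        self.a = 0.0  # fixed overhead (s), e.g. kernel launch latency
        self.b = 0.0  # seconds per unit of work

    @staticmethod
    def work(f: Mapping[str, int]) -> float:
        return f["batch"] * f["seq_len"] * f["hidden_dim"] / f["mp"]

    def fit(self, features, times):
        xs = [self.work(f) for f in features]
        ys = list(times)
        n = len(xs)
        if n < 2 or n != len(ys):
            raise ValueError("need >= 2 samples with one time per feature set")
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("samples must cover more than one work size")
        # Ordinary least squares with a single predictor.
        self.b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        self.a = my - self.b * mx

    def predict(self, features):
        return self.a + self.b * self.work(features)


# Usage with synthetic timings generated from a known line (not measurements).
samples = [
    {"batch": b, "seq_len": 2048, "hidden_dim": 4096, "mp": mp}
    for b in (1, 2, 4)
    for mp in (1, 2)
]
synthetic_times = [1e-5 + 2e-12 * LinearWorkRegressor.work(f) for f in samples]
reg = LinearWorkRegressor()
reg.fit(samples, synthetic_times)
query = {"batch": 8, "seq_len": 2048, "hidden_dim": 4096, "mp": 4}
print(f"predicted runtime: {reg.predict(query):.3e} s")
```

A real operator would likely need a richer feature set, such as num_heads or a nonlinear fit. Keeping fit/predict behind one interface lets the composition step treat every operator the same way.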

This work builds on our HiPC 2025 paper on fine-grained GPU performance modeling. The earlier code for reproducing the paper's results is at https://github.com/ICICLE-ai/distributed_training_estimator_...

We're looking for early adopters and feedback, especially from teams doing parallelism strategy search or capacity planning at scale.

Post-Agentic Code Forges

https://sluongng.substack.com/p/post-agentic-code-forges
1•todsacerdoti•23s ago•0 comments

In-memory analog computing for non-negative matrix factorization

https://www.nature.com/articles/s41467-026-68609-8
1•martinlaz•5m ago•0 comments

RT Superconductivity at 298K in Ternary LaScH System at High-Pressure Conditions

https://arxiv.org/abs/2510.01273
1•fluffybuns•7m ago•0 comments

Show HN: Waifu2x.live – Free AI image upscaler (2x/4x) & video generation

1•Nancy1230•7m ago•1 comments

Campaigner launches £1.5B legal action in UK against Apple over wallet's

https://www.theguardian.com/technology/2026/jan/23/campaigner-launches-legal-action-against-apple...
1•chrisjj•9m ago•0 comments

Anthropic: AI Is Transforming Jobs, Not Replacing Them

https://www.forbes.com/sites/anishasircar/2026/01/23/ai-is-transforming-jobs-not-replacing-them-a...
1•hochmartinez•10m ago•0 comments

AI Boosts Research Careers but Flattens Scientific Discovery

https://spectrum.ieee.org/ai-science-research-flattens-discovery
1•pseudolus•10m ago•0 comments

Google must face consumer antitrust lawsuit over search dominance, US judge rules

https://www.reuters.com/legal/government/google-must-face-consumer-antitrust-lawsuit-over-search-...
2•pseudolus•11m ago•0 comments

Do We Still Need Tech Blogs in the Era of GenAI?

https://blog.mrcroxx.com/posts/do-we-still-need-tech-blogs-in-the-era-of-gen-ai/
1•MrCroxx•12m ago•0 comments

Show HN: Simple esp-idf and esp-matter version manager

https://github.com/matterizelabs/espvm
1•abu-matterize•13m ago•0 comments

Booting a PC from a Vinyl Record

https://boginjr.com/it/sw/dev/vinyl-boot/
1•yesturi•13m ago•0 comments

Show HN: Kite – lightweight production-ready agentic AI framework with Ollama

https://github.com/thienzz/Kite
1•thienzz•16m ago•1 comments

Resisting the Rule of the Rich: Protecting Freedom from Billionaire Power

https://www.oxfamamerica.org/explore/research-publications/resisting-the-rule-of-the-rich/
1•decimalenough•16m ago•0 comments

Show HN: OPC Skills – 9 AI agent skills for solopreneurs (Claude Code, Cursor)

https://opc.dev/
1•Zephyr0x•16m ago•0 comments

Sonic Booms and Seismic Waves Can Reveal Where Space Junk Crash-Lands

https://www.nytimes.com/2026/01/22/science/space-junk-seismographs.html
1•_____k•20m ago•0 comments

The Rise and Impending Fall of the Dental Cavity

https://www.cremieux.xyz/p/the-rise-and-impending-fall-of-the
1•MrBuddyCasino•21m ago•0 comments

China's analogue AI chip runs 12x as fast on 1/200 the energy of digital rivals

https://www.scmp.com/news/china/science/article/3340939/chinas-analogue-ai-chip-runs-12-times-fas...
2•martinlaz•24m ago•0 comments

People who use chatbots for news consider them unbiased and "good enough"

https://www.niemanlab.org/2026/01/people-who-use-chatbots-for-news-consider-them-unbiased-and-goo...
1•giuliomagnifico•26m ago•0 comments

Indoor Mapping – OpenStreetMap Wiki

https://wiki.openstreetmap.org/wiki/Indoor_Mapping
4•marklit•28m ago•1 comments

Post-Quantum Cryptography

https://en.wikipedia.org/wiki/Post-quantum_cryptography
1•aggrrrh•30m ago•0 comments

Testing AI orchestrated cyber attacks in practice

https://blog.fraktal.fi/testing-ai-orchestrated-attacks-in-practice-12f8fb03191e
1•tmakkonen•39m ago•0 comments

Downloading a Podcast to Create an Audiobook

https://kevinboone.me/clh_podcast_to_audiobook.html
1•LaSombra•40m ago•0 comments

Why I Don't Have Fun With Claude Code

https://brennan.io/2026/01/23/claude-code/
3•ingve•40m ago•0 comments

Why digital signatures break on structured healthcare data

https://formidable.care/articles/understanding-the-identity-integrity-gap-in-digital-signing
1•vincentxplore•43m ago•0 comments

Roleplayers

1•shoman3003•43m ago•0 comments

Faster Loading for GitHub Issues

https://github.blog/changelog/2026-01-22-faster-loading-for-github-issues/
2•ramon156•45m ago•0 comments

Web-SQLite-JS allows for the persistence of relational data on web clients [video]

https://www.youtube.com/watch?v=ZHYDv4GPprU
1•wuchuheng•49m ago•0 comments

Ask HN: Which paid apps and services do you use?

3•chistev•53m ago•0 comments

SnapHabit : Extreme habit accountability with AI and friend groups

https://snap-habit.com/
1•apollos•54m ago•0 comments

E-scooter sharing company Bird has raised $20M

https://micromobility.io/news/birds-parent-company-third-lane-mobility-raises-20m
1•prabinjoel•55m ago•2 comments