
The Genus Amanita

https://www.mushroomexpert.com/amanita.html
1•rolph•49s ago•0 comments

We have broken SHA-1 in practice

https://shattered.io/
1•mooreds•1m ago•1 comment

Ask HN: Was my first management job bad, or is this what management is like?

1•Buttons840•2m ago•0 comments

Ask HN: How to Reduce Time Spent Crimping?

1•pinkmuffinere•3m ago•0 comments

KV Cache Transform Coding for Compact Storage in LLM Inference

https://arxiv.org/abs/2511.01815
1•walterbell•8m ago•0 comments

A quantitative, multimodal wearable bioelectronic device for stress assessment

https://www.nature.com/articles/s41467-025-67747-9
1•PaulHoule•10m ago•0 comments

Why Big Tech Is Throwing Cash into India in Quest for AI Supremacy

https://www.wsj.com/world/india/why-big-tech-is-throwing-cash-into-india-in-quest-for-ai-supremac...
1•saikatsg•10m ago•0 comments

How to shoot yourself in the foot – 2026 edition

https://github.com/aweussom/HowToShootYourselfInTheFoot
1•aweussom•10m ago•0 comments

Eight More Months of Agents

https://crawshaw.io/blog/eight-more-months-of-agents
3•archb•12m ago•0 comments

From Human Thought to Machine Coordination

https://www.psychologytoday.com/us/blog/the-digital-self/202602/from-human-thought-to-machine-coo...
1•walterbell•13m ago•0 comments

The new X API pricing must be a joke

https://developer.x.com/
1•danver0•14m ago•0 comments

Show HN: RMA Dashboard fast SAST results for monorepos (SARIF and triage)

https://rma-dashboard.bukhari-kibuka7.workers.dev/
1•bumahkib7•14m ago•0 comments

Show HN: Source code graphRAG for Java/Kotlin development based on jQAssistant

https://github.com/2015xli/jqassistant-graph-rag
1•artigent•19m ago•0 comments

Python Only Has One Real Competitor

https://mccue.dev/pages/2-6-26-python-competitor
3•dragandj•20m ago•0 comments

Tmux to Zellij (and Back)

https://www.mauriciopoppe.com/notes/tmux-to-zellij/
1•maurizzzio•21m ago•1 comment

Ask HN: How are you using specialized agents to accelerate your work?

1•otterley•22m ago•0 comments

Passing user_id through 6 services? OTel Baggage fixes this

https://signoz.io/blog/otel-baggage/
1•pranay01•23m ago•0 comments

DavMail Pop/IMAP/SMTP/Caldav/Carddav/LDAP Exchange Gateway

https://davmail.sourceforge.net/
1•todsacerdoti•24m ago•0 comments

Visual data modelling in the browser (open source)

https://github.com/sqlmodel/sqlmodel
1•Sean766•26m ago•0 comments

Show HN: Tharos – CLI to find and autofix security bugs using local LLMs

https://github.com/chinonsochikelue/tharos
1•fluantix•26m ago•0 comments

Oddly Simple GUI Programs

https://simonsafar.com/2024/win32_lights/
1•MaximilianEmel•27m ago•0 comments

The New Playbook for Leaders [pdf]

https://www.ibli.com/IBLI%20OnePagers%20The%20Plays%20Summarized.pdf
1•mooreds•27m ago•1 comment

Interactive Unboxing of J Dilla's Donuts

https://donuts20.vercel.app
1•sngahane•29m ago•0 comments

OneCourt helps blind and low-vision fans to track Super Bowl live

https://www.dezeen.com/2026/02/06/onecourt-tactile-device-super-bowl-blind-low-vision-fans/
1•gaws•30m ago•0 comments

Rudolf Vrba

https://en.wikipedia.org/wiki/Rudolf_Vrba
1•mooreds•31m ago•0 comments

Autism Incidence in Girls and Boys May Be Nearly Equal, Study Suggests

https://www.medpagetoday.com/neurology/autism/119747
1•paulpauper•32m ago•0 comments

Wellness Hotels Discovery Application

https://aurio.place/
1•cherrylinedev•32m ago•1 comment

NASA delays moon rocket launch by a month after fuel leaks during test

https://www.theguardian.com/science/2026/feb/03/nasa-delays-moon-rocket-launch-month-fuel-leaks-a...
1•mooreds•33m ago•0 comments

Sebastian Galiani on the Marginal Revolution

https://marginalrevolution.com/marginalrevolution/2026/02/sebastian-galiani-on-the-marginal-revol...
2•paulpauper•36m ago•0 comments

Ask HN: Are we at the point where software can improve itself?

1•ManuelKiessling•36m ago•2 comments

Predict your distributed LLM training time before you burn GPU hours

https://github.com/DebarghaG/estimate-train-time
2•barthelomew•2w ago

Comments

barthelomew•2w ago
Predict your distributed LLM training time before you burn GPU hours.

We've open-sourced a tool (https://github.com/DebarghaG/estimate-train-time) that estimates wall-clock time for LLM training across multi-GPU setups with 3D parallelism (pipeline, tensor, and data).

This problem is extremely hard: you're modeling the interplay of thousands of GPU kernels, NCCL collectives across heterogeneous network topologies, pipeline bubbles, activation recomputation, and ZeRO optimizer communication, all while these components interact in non-obvious ways at scale. Even estimates that are off by 2x are useless for capacity planning.
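
To give a flavor of just one of those communication terms, here is a minimal sketch of the standard alpha-beta cost model for a ring all-reduce. This is illustrative only (the tool profiles real NCCL collectives rather than assuming an analytical model), and the latency/bandwidth numbers are made up:

```python
def ring_allreduce_time(bytes_total: float, n_gpus: int,
                        latency_s: float, bandwidth_bps: float) -> float:
    """Alpha-beta model of a ring all-reduce: 2*(n-1) latency-bound steps,
    and each link carries 2*(n-1)/n of the total payload."""
    if n_gpus < 2:
        return 0.0  # nothing to synchronize
    steps = 2 * (n_gpus - 1)
    volume = 2 * (n_gpus - 1) / n_gpus * bytes_total
    return steps * latency_s + volume / bandwidth_bps

# Hypothetical example: a 1 GiB gradient all-reduce over 8 GPUs,
# 5 us per-step latency, 100 GB/s effective link bandwidth.
t = ring_allreduce_time(2**30, 8, 5e-6, 100e9)  # roughly 19 ms
```

Even this simple model shows why composition is hard: the same collective is latency-dominated for small messages and bandwidth-dominated for large ones, and real NCCL behavior deviates from both regimes.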

This took two years of painstaking work and ~$100k worth of cluster time, and is validated on real workloads at Perlmutter (NERSC) and Vista (TACC), some of the largest HPC clusters available for open science.

How it works:

1. Kernel-level profiling: we sample execution times for kernels like Flash Attention, fused GEMM (QKV/FFN projections), RMSNorm, embedding lookups, and cross-entropy loss across the (batch, seq_len, hidden_dim, num_heads, MP degree) parameter space.

2. Communication modeling: NCCL benchmarks capture ring all-reduce (tensor/data-parallel sync), all-gather (ZeRO-1 parameter collection), and P2P send/recv (pipeline-stage activation transfers) across intra-node NVLink and inter-node InfiniBand topologies.

3. Analytical composition: operator predictions feed into a pipeline scheduling model (AF-AB / 1F1B) that accounts for bubble overhead (an idle fraction of (PP - 1) / (num_microbatches + PP - 1)), layer distribution across head/middle/tail stages, and overlapped DP gradient sync.

4. CPU-only estimation: after sampling, predictions run on CPU; no GPU access is needed to estimate training time.
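
The bubble term in step 3 can be sketched directly from that formula. This is a simplified illustration of how a pipeline-bubble fraction inflates an ideal step time, not the tool's actual composition code:

```python
def bubble_fraction(pp: int, num_microbatches: int) -> float:
    """Idle fraction of a 1F1B pipeline: (PP - 1) / (m + PP - 1)."""
    return (pp - 1) / (num_microbatches + pp - 1)

def step_time(t_microbatch: float, pp: int, num_microbatches: int) -> float:
    """Ideal compute time stretched by pipeline under-utilization."""
    ideal = t_microbatch * num_microbatches
    return ideal / (1.0 - bubble_fraction(pp, num_microbatches))

# With 8 pipeline stages and 32 microbatches, 7/39 of the schedule
# is bubble, so a 3.2 s ideal step stretches to 3.9 s.
print(round(bubble_fraction(8, 32), 4))  # 0.1795
print(round(step_time(0.1, 8, 32), 4))   # 3.9
```

The formula also makes the usual tuning advice quantitative: raising num_microbatches relative to PP shrinks the bubble, which is why microbatch count is a first-order knob in parallelism-strategy search.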

The tool is designed as an extensible recipe: you can profile your own hardware with the bundled kernel-sampling and NCCL-benchmarking scripts, and add custom operators by implementing the regressor interface.
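
As a rough idea of what such a regressor interface might look like, here is a hypothetical sketch. The names (`OperatorRegressor`, `fit`, `predict`) and the toy FLOPs-proportional model are illustrative assumptions, not the repo's actual API:

```python
from typing import Mapping, Protocol

class OperatorRegressor(Protocol):
    """Hypothetical interface: fit on measured kernel times, predict new configs."""
    def fit(self, samples: list[tuple[Mapping[str, int], float]]) -> None: ...
    def predict(self, config: Mapping[str, int]) -> float: ...

class LinearFlopsRegressor:
    """Toy regressor: time ~ coeff * FLOPs, fit by one-feature least squares."""
    def __init__(self) -> None:
        self.coeff = 0.0

    @staticmethod
    def _flops(cfg: Mapping[str, int]) -> float:
        # e.g. a (batch x m) @ (m x n)-shaped op: ~2 * batch * m * n FLOPs
        return 2.0 * cfg["batch"] * cfg["m"] * cfg["n"]

    def fit(self, samples):
        num = sum(self._flops(c) * t for c, t in samples)
        den = sum(self._flops(c) ** 2 for c, _ in samples)
        self.coeff = num / den

    def predict(self, cfg):
        return self.coeff * self._flops(cfg)
```

A real operator model would regress on the full (batch, seq_len, hidden_dim, num_heads, MP degree) space rather than a single FLOPs feature, but the fit/predict shape stays the same.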

This work builds on our HiPC 2025 paper on fine-grained GPU performance modeling. Earlier code reproducing the paper's results: https://github.com/ICICLE-ai/distributed_training_estimator_...

We're looking for early adopters and feedback, especially from teams doing parallelism-strategy search or capacity planning at scale.