Ask HN: Why does single-node DDP sometimes get slower with more GPUs?

2•traceopt-ai•1h ago
Hi,

I keep running into a frustrating issue with PyTorch DDP on a single node (2–8 GPUs): adding GPUs sometimes makes training slower instead of scaling proportionally, and it is hard to tell what is actually gating the step.

In practice I see:

one rank silently becomes the “worst rank” and gates every step

step-time spikes during which the GPUs look idle, but it’s unclear whether the culprit is dataloader stalls, CPU contention, batch/sequence-length imbalance, or NCCL sync
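
To make the "worst rank" idea concrete, here is a minimal sketch. It assumes each rank already logs, per step, the seconds spent waiting on data, computing, and blocked in gradient sync. The `RankStep` fields and the sample numbers are hypothetical, and this is not how traceml works internally. The reasoning: in synchronous DDP the faster ranks wait inside the all-reduce for the slowest one, so the straggler tends to have the largest data + compute time and the smallest sync wait.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RankStep:
    rank: int
    data_s: float     # seconds waiting on the dataloader (hypothetical field)
    compute_s: float  # seconds in forward/backward/optimizer (hypothetical field)
    sync_s: float     # seconds blocked waiting on gradient sync (hypothetical field)


def own_work(step):
    """Time a rank spends on its own work, excluding waiting for peers."""
    return step.data_s + step.compute_s


def gating_rank(steps):
    """Return the rank with the most own work; its peers wait for it in sync."""
    return max(steps, key=own_work)


def attribute(worst, steps):
    """Name the phase where the worst rank most exceeds the median rank."""
    excess = {
        "data": worst.data_s - median(s.data_s for s in steps),
        "compute": worst.compute_s - median(s.compute_s for s in steps),
    }
    return max(excess, key=excess.get)


# Made-up timings for one step on 4 GPUs.
steps = [
    RankStep(0, 0.02, 0.30, 0.19),
    RankStep(1, 0.21, 0.29, 0.01),
    RankStep(2, 0.02, 0.31, 0.18),
    RankStep(3, 0.03, 0.30, 0.18),
]
worst = gating_rank(steps)
print(f"rank {worst.rank} gates the step; likely cause: {attribute(worst, steps)}")
```

With these made-up numbers, rank 1 is flagged with a data stall. If every rank shows similar own work but a large sync wait, the sketch says nothing useful, and comms would be the next thing to look at.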

Questions for folks who run multi-GPU training:

What do you suspect first when scaling regresses on a single node?

What signals do you look at to distinguish data vs compute vs comms/sync?

Any repeatable workflow / checklist that gets you to root cause fast?
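
For the data vs compute question above, one rough signal is per-phase wall time on each rank. A minimal sketch using only the standard library follows. The `StepTimer` class and phase names are hypothetical, and the `time.sleep` calls stand in for real work. One caveat I am sure of: CUDA kernels launch asynchronously, so wall-clock timers around GPU code only mean something with a synchronization point (for example `torch.cuda.synchronize()`) or CUDA events.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulates wall-clock seconds per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the phase raises.
            self.totals[name] += time.perf_counter() - start


timer = StepTimer()
for _ in range(3):
    with timer.phase("data"):
        time.sleep(0.02)  # stand-in for fetching the next batch
    with timer.phase("compute"):
        time.sleep(0.01)  # stand-in for forward/backward (sync the GPU first in real code)

for name, seconds in timer.totals.items():
    print(f"{name}: {seconds:.3f}s")
```

Logging these totals per rank and comparing them across ranks is usually enough to tell whether one rank is starved for data or doing more compute than its peers.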

Context: I am building a small OSS tool that shows live per-rank step timing + stall attribution (always-on; not a replacement for PyTorch Profiler/Nsight). If you have a workload where DDP scaling is weird and you are willing to run a ~10-minute test, I am happy to help interpret results and prioritize support for your setup.

Repo: https://github.com/traceopt-ai/traceml
