Ask HN: Is LLM training infra still broken enough to build a company around?

2•harsh020•1h ago
We recently ran into something frustrating while training and fine-tuning open-weight TTS models.

Instead of working on the model itself, we spent days dealing with:

- CUDA version mismatches
- Driver / PyTorch conflicts
- OOM crashes when scaling to multi-GPU
- Broken or outdated open-source training scripts
- Gluing together tracking + eval + deployment manually

It felt like we were rebuilding the same orchestration layer every team probably rebuilds.

- Cloud providers give raw GPUs.
- MLOps tools give experiment tracking.
- Open-source gives training scripts.

But the end-to-end workflow (dataset → fine-tune → monitor → evaluate → deploy → retrain) still feels stitched together.

We’re exploring building an opinionated platform that:

Lets you select a base model (e.g. Llama/Mistral-style open models), then:

1. Upload or connect datasets
2. Choose infra tier
3. Launch LoRA/full fine-tuning
4. Monitor loss + cost in real time
5. Run built-in eval
6. Deploy with one click
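As a rough illustration of what a single job request covering those steps might look like, here is a minimal sketch. Every name in it (`FineTuneJob`, `infra_tier`, the stage labels, the example dataset URI) is hypothetical and made up for illustration; it is not an existing API or a committed design.

```python
from dataclasses import dataclass

# Hypothetical job spec; none of these names come from an existing platform.
VALID_METHODS = {"lora", "full"}


@dataclass(frozen=True)
class FineTuneJob:
    base_model: str        # identifier of the chosen open-weight base model
    dataset_uri: str       # uploaded or connected dataset location
    infra_tier: str        # platform-defined GPU tier label (assumed)
    method: str = "lora"   # "lora" or "full" fine-tuning
    run_eval: bool = True  # run the built-in eval after training
    deploy: bool = False   # deploy once the earlier stages finish

    def __post_init__(self):
        # Reject unknown fine-tuning methods up front.
        if self.method not in VALID_METHODS:
            raise ValueError(f"method must be one of {sorted(VALID_METHODS)}")

    def stages(self):
        """Return the ordered pipeline stages this job would run."""
        steps = ["load_dataset", f"train_{self.method}", "monitor"]
        if self.run_eval:
            steps.append("evaluate")
        if self.deploy:
            steps.append("deploy")
        return steps


job = FineTuneJob(
    base_model="example-base-model",
    dataset_uri="s3://example-bucket/data.jsonl",
    infra_tier="single-gpu",
    deploy=True,
)
print(job.stages())
```

The idea is that the user only fills in a declarative spec like this, and the CUDA, driver, and orchestration details stay behind it.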

Basically: abstract away the CUDA + orchestration layer.

Before we go too deep, I’d love honest feedback:

- Is this still a painful problem at your company?
- Would serious AI teams use this, or do larger companies just build infra in-house?
- Is this doomed to be a hobbyist tool?
- Where would the real wedge be: training, evaluation, or continuous retraining?

We’ve launched a simple landing page and started building, but we’re still early and trying to validate whether this is a real infra gap or just our own frustration.

Would appreciate blunt feedback.

Comments

genxy•1h ago
> CUDA version mismatches - Driver / PyTorch conflicts - OOM crashes when scaling to multi-GPU - Broken or outdated open-source training scripts - Gluing together tracking + eval + deployment manually

This shouldn't take days, and CC can already set up all of this using whatever level of rigor you need.

Your business will get replaced with a prompt.
