Show HN: I Am 15 and Built a Dual Backend MLP from Scratch Using CUDA C++

https://github.com/muchlakshay/Dual-Backend-MLP-From-Scratch-CUDA
1•muchlakshay•7h ago
Hi everyone! I'm 15 years old and I just completed a dual-backend MLP from scratch that supports both CPU and GPU (CUDA) training.

For the CPU backend, I used only Eigen for linear algebra, nothing else.

For the GPU backend, I implemented my own custom matrix library in CUDA C++. The CUDA kernels aren’t optimized with shared memory, tiling, or fused ops (so there’s some kernel launch overhead), but I chose clarity, modularity, and reusability over a few milliseconds of speedup.
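As a rough illustration of what an untiled matrix multiply computes, here is a minimal CPU-side C++ sketch. It is not code from the linked repo: the `Matrix` struct, the row-major layout, and the name `matmul_naive` are assumptions made for this example. Each `(row, col)` iteration stands in for the single GPU thread that would compute one output element directly from global memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type; not taken from the linked repo.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;  // element (r, c) lives at data[r * cols + c]

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Naive matrix multiply: C = A * B, with no tiling or shared-memory reuse.
// On a GPU, the two outer loops would be replaced by the thread grid, and
// each thread would run only the inner dot-product loop.
Matrix matmul_naive(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);  // inner dimensions must match
    Matrix c(a.rows, b.cols);
    for (std::size_t row = 0; row < a.rows; ++row) {
        for (std::size_t col = 0; col < b.cols; ++col) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < a.cols; ++k) {
                acc += a.at(row, k) * b.at(k, col);
            }
            c.at(row, col) = acc;
        }
    }
    return c;
}
```

A tiled version would load blocks of `a` and `b` into fast on-chip memory and reuse them across several output elements. The sketch above rereads every operand from main memory, which is the simplicity-for-speed trade-off the post describes.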

That said, I've taken care to ensure coalesced memory access, and it gives pretty solid performance: around 0.4 ms per epoch on MNIST (batch size = 1000) on an RTX 3060.
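For readers unfamiliar with coalescing, the idea can be sketched with plain index arithmetic. The example below assumes a row-major layout, which the post does not state, and the function names are hypothetical. It computes how far apart in memory the elements read by two neighbouring threads are: a stride of 1 means neighbouring threads touch neighbouring addresses, which is the pattern GPU hardware can combine into few memory transactions.

```cpp
#include <cassert>
#include <cstddef>

// Flat offset of element (row, col) in a row-major matrix with `cols` columns.
std::size_t row_major_offset(std::size_t row, std::size_t col, std::size_t cols) {
    assert(cols > 0);
    return row * cols + col;
}

// Distance (in elements) between the reads of thread t and thread t + 1
// when the thread index is mapped to the column: threads walk along a row.
std::size_t stride_threads_along_row(std::size_t cols) {
    return row_major_offset(0, 1, cols) - row_major_offset(0, 0, cols);
}

// Same distance when the thread index is mapped to the row:
// threads walk down a column and jump a full row between reads.
std::size_t stride_threads_along_column(std::size_t cols) {
    return row_major_offset(1, 0, cols) - row_major_offset(0, 0, cols);
}
```

With 784 columns (one flattened 28x28 MNIST image per row), mapping threads to columns gives a stride of 1, while mapping threads to rows gives a stride of 784, so each thread would touch a different region of memory.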

This project is a big step up from my previous one. It's cleaner, well-documented, and more modular.

I’m fully aware of areas that can be improved, and I’ll be working on them in future projects. My long-term goal is to get into Harvard or MIT, and this is part of that journey.

Would love to hear your thoughts, suggestions, or feedback.

I've attached the link to my GitHub repo.

Comments

onelli•7h ago
Love seeing young devs shipping real projects! Out of curiosity, have you tried benchmarking your MLP on any real-world data sets, or was this mainly about learning CUDA/C++? (And what’s the biggest gotcha you ran into?)
muchlakshay•6h ago
Thanks! I appreciate that a lot. I’ve mainly tested it on MNIST for now; the CUDA backend trains one epoch in ~0.4 ms (batch size 1000, RTX 3060, as I mentioned in the post). It was primarily a deep dive into CUDA/C++, manual memory management, and building a dual-backend architecture with a custom matrix lib (GPU side completely from scratch).

This was actually my 4th serious attempt at building a GPU-based MLP from scratch. I failed multiple times, sometimes because of a single line of code. In earlier attempts, I had this optimization idea: store both the weights and their transposes in GPU memory, so I wouldn’t have to compute the transpose each epoch. Seemed clever, until training started failing badly. It turned out I was only updating the original weights matrix after backprop, while the transposed copy was still holding stale values from earlier. This broke training completely, and I spent weeks trying to debug it; I couldn’t figure it out until this current version.
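The stale-transpose bug described here can be reproduced in a few lines of CPU-side C++. This is a minimal sketch under stated assumptions, not the code from the earlier attempts: `CachedLayer`, `step_stale`, and `step_synced` are hypothetical names, and a plain gradient-descent update stands in for whatever update rule was actually used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layer that keeps a cached transpose next to its weights.
struct CachedLayer {
    std::size_t rows, cols;
    std::vector<float> w;    // rows x cols, row-major
    std::vector<float> w_t;  // cols x rows, row-major cached transpose

    CachedLayer(std::size_t r, std::size_t c, float init)
        : rows(r), cols(c), w(r * c, init), w_t(r * c, init) {}

    // Plain gradient-descent update applied to the weights only.
    void apply_gradient(const std::vector<float>& grad, float lr) {
        assert(grad.size() == w.size());
        for (std::size_t i = 0; i < w.size(); ++i) w[i] -= lr * grad[i];
    }

    // Rebuild the cached transpose from the current weights.
    void refresh_transpose() {
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                w_t[j * rows + i] = w[i * cols + j];
    }

    // Buggy step: w changes, w_t silently keeps the old values.
    void step_stale(const std::vector<float>& grad, float lr) {
        apply_gradient(grad, lr);
    }

    // Fixed step: every weight update is followed by a cache refresh.
    void step_synced(const std::vector<float>& grad, float lr) {
        apply_gradient(grad, lr);
        refresh_transpose();
    }

    // True when the cache matches the transpose of the current weights.
    bool transpose_in_sync() const {
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                if (w_t[j * rows + i] != w[i * cols + j]) return false;
        return true;
    }
};
```

After a single `step_stale` call, `transpose_in_sync()` returns false, so any pass that reads `w_t` would be working with outdated weights. A check like this, run after each update in a debug build, is one way such a mismatch could be caught early.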

Honestly, the biggest gotchas were:

- memory coherence issues like the one above (esp. when trying to cache 'smartly')

- launching kernels in the right order while keeping data in sync

- maintaining modularity without sacrificing too much performance

I avoided fused kernels/shared memory in this version to keep things clean and reusable, but now that the core works, I plan to start optimizing that layer too.
