Show HN: I built a tensor library from scratch in C++/CUDA

https://github.com/nirw4nna/dsc
70•nirw4nna•5h ago
Hi HN,

Over the past few months, I've been building `dsc`, a tensor library from scratch in C++/CUDA. My main focus has been on getting the basics right, prioritizing a clean API, simplicity, and clear observability for running small LLMs locally.

The key features are:

- C++ core with CUDA support written from scratch.
- A familiar, PyTorch-like Python API.
- Runs real models: it's complete enough to load a model like Qwen from HuggingFace and run inference on both CUDA and CPU with a single line change[1].
- Simple, built-in observability for both Python and C++.

Next on the roadmap is adding BF16 support and then I'll be working on visualization for GPU workloads.

The project is still early and I would be incredibly grateful for any feedback, code reviews, or questions from the HN community!

GitHub Repo: https://github.com/nirw4nna/dsc

[1]: https://github.com/nirw4nna/dsc/blob/main/examples/models/qw...

Comments

helltone•4h ago
This is very cool. I'm wondering if some of the templates and switch statements would be nicer if there was an intermediate representation and a compiler-like architecture.

I'm also curious about how this compares to something like Jax.

Also curious about how this compares to zml.

kajecounterhack•3h ago
Cool stuff! Is the goal of this project personal learning, inference performance, or something else?

Would be nice to see how inference speed stacks up against say llama.cpp

liuliu•2h ago
Both use cuBLAS under the hood, so I think performance is similar for prefill (of course, this framework is still early and doesn't seem to have FP16/BF16 support for GEMM yet). Hand-rolled GEMV is faster for token generation, hence llama.cpp is better there.
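
To make the prefill vs. token-generation distinction concrete, here is a minimal pure-Python sketch of the shapes involved. It is not code from `dsc`, llama.cpp, or cuBLAS; the function names `gemm`, `gemv`, and `transpose` and the tiny weight matrix are made up for illustration. Prefill pushes all prompt tokens through a layer at once (matrix times matrix), while generation pushes one token at a time (matrix times vector).

```python
def transpose(m):
    """Return the transpose of a row-major matrix (list of rows)."""
    return [list(col) for col in zip(*m)]

def gemm(a, b):
    """Matrix-matrix product: a is (rows x inner), b is (inner x cols)."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def gemv(w, x):
    """Matrix-vector product: w is (out x in), x has length in."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

# Hypothetical layer weights, stored as (out_features x in_features).
W = [[1, 2],
     [3, 4],
     [5, 6]]

# Prefill: every prompt token is processed in one GEMM call.
prompt = [[1, 1],
          [2, 0]]
prefill_out = gemm(prompt, transpose(W))

# Generation: each new token is a single vector, so the same layer is a GEMV.
decode_out = gemv(W, [1, 1])
```

Each row of `prefill_out` matches what `gemv` returns for the corresponding token, which is why the two phases can share weights but benefit from differently tuned kernels.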
aklein•2h ago
I noticed you interface with the native code via ctypes. I think cffi is generally preferred (e.g., https://cffi.readthedocs.io/en/stable/overview.html#api-mode...). You'd have more flexibility, though, if you built your own Python extension module (e.g., using pybind), which would free you from a simple/strict ABI. Curious whether this strict separation of C and Python was a deliberate design choice.
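
For readers unfamiliar with what binding through ctypes looks like, here is a small sketch using only the Python standard library. It calls `strlen` from the C standard library rather than anything from `dsc`, and the wrapper name `c_strlen` is made up for illustration. It assumes a POSIX-like system (Linux or macOS) where the C library can be loaded this way.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. If find_library returns None,
# CDLL(None) exposes symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# ctypes binds at the ABI level: it never reads a header file, so the
# argument and return types have to be declared by hand.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string via C strlen."""
    return libc.strlen(data)

length = c_strlen(b"tensor")
```

The hand-written `argtypes` and `restype` declarations are the part that can drift out of sync with the C side, which is the kind of mismatch a header-aware binding approach is meant to catch.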
rrhjm53270•1h ago
Do you have any plan for the serialization and deserialization of your tensor and nn library?
amtk2•32m ago
Super n00b question: what kind of laptop do you need to do a project like this? Is a Mac OK, or do you need a dedicated Linux laptop?
kadushka•12m ago
Any laptop with an Nvidia card
