frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: An open-source starter kit for developing with Postgres and ClickHouse

https://github.com/ClickHouse/postgres-clickhouse-stack
1•saisrirampur•17s ago•0 comments

Game Boy Advance d-pad capacitor measurements

https://gekkio.fi/blog/2026/game-boy-advance-d-pad-capacitor-measurements/
1•todsacerdoti•31s ago•0 comments

South Korean crypto firm accidentally sends $44B in bitcoins to users

https://www.reuters.com/world/asia-pacific/crypto-firm-accidentally-sends-44-billion-bitcoins-use...
1•layer8•1m ago•0 comments

Apache Poison Fountain

https://gist.github.com/jwakely/a511a5cab5eb36d088ecd1659fcee1d5
1•atomic128•3m ago•0 comments

Web.whatsapp.com appears to be having issues syncing and sending messages

http://web.whatsapp.com
1•sabujp•3m ago•1 comments

Google in Your Terminal

https://gogcli.sh/
1•johlo•5m ago•0 comments

Shannon: Claude Code for Pen Testing

https://github.com/KeygraphHQ/shannon
1•hendler•5m ago•0 comments

Anthropic: Latest Claude model finds more than 500 vulnerabilities

https://www.scworld.com/news/anthropic-latest-claude-model-finds-more-than-500-vulnerabilities
1•Bender•9m ago•0 comments

Brooklyn cemetery plans human composting option, stirring interest and debate

https://www.cbsnews.com/newyork/news/brooklyn-green-wood-cemetery-human-composting/
1•geox•9m ago•0 comments

Why the 'Strivers' Are Right

https://greyenlightenment.com/2026/02/03/the-strivers-were-right-all-along/
1•paulpauper•11m ago•0 comments

Brain Dumps as a Literary Form

https://davegriffith.substack.com/p/brain-dumps-as-a-literary-form
1•gmays•11m ago•0 comments

Agentic Coding and the Problem of Oracles

https://epkconsulting.substack.com/p/agentic-coding-and-the-problem-of
1•qingsworkshop•12m ago•0 comments

Malicious packages for dYdX cryptocurrency exchange empties user wallets

https://arstechnica.com/security/2026/02/malicious-packages-for-dydx-cryptocurrency-exchange-empt...
1•Bender•12m ago•0 comments

Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650"

https://github.com/pheonix-delta/axiom-voice-agent
1•shubham-coder•13m ago•0 comments

Penisgate erupts at Olympics; scandal exposes risks of bulking your bulge

https://arstechnica.com/health/2026/02/penisgate-erupts-at-olympics-scandal-exposes-risks-of-bulk...
4•Bender•13m ago•0 comments

Arcan Explained: A browser for different webs

https://arcan-fe.com/2026/01/26/arcan-explained-a-browser-for-different-webs/
1•fanf2•15m ago•0 comments

What did we learn from the AI Village in 2025?

https://theaidigest.org/village/blog/what-we-learned-2025
1•mrkO99•15m ago•0 comments

An open replacement for the IBM 3174 Establishment Controller

https://github.com/lowobservable/oec
1•bri3d•18m ago•0 comments

The P in PGP isn't for pain: encrypting emails in the browser

https://ckardaris.github.io/blog/2026/02/07/encrypted-email.html
2•ckardaris•20m ago•0 comments

Show HN: Mirror Parliament where users vote on top of politicians and draft laws

https://github.com/fokdelafons/lustra
1•fokdelafons•20m ago•1 comments

Ask HN: Opus 4.6 ignoring instructions, how to use 4.5 in Claude Code instead?

1•Chance-Device•22m ago•0 comments

We Mourn Our Craft

https://nolanlawson.com/2026/02/07/we-mourn-our-craft/
1•ColinWright•24m ago•0 comments

Jim Fan calls pixels the ultimate motor controller

https://robotsandstartups.substack.com/p/humanoids-platform-urdf-kitchen-nvidias
1•robotlaunch•28m ago•0 comments

Exploring a Modern SMTPE 2110 Broadcast Truck with My Dad

https://www.jeffgeerling.com/blog/2026/exploring-a-modern-smpte-2110-broadcast-truck-with-my-dad/
1•HotGarbage•28m ago•0 comments

AI UX Playground: Real-world examples of AI interaction design

https://www.aiuxplayground.com/
1•javiercr•29m ago•0 comments

The Field Guide to Design Futures

https://designfutures.guide/
1•andyjohnson0•29m ago•0 comments

The Other Leverage in Software and AI

https://tomtunguz.com/the-other-leverage-in-software-and-ai/
1•gmays•31m ago•0 comments

AUR malware scanner written in Rust

https://github.com/Sohimaster/traur
3•sohimaster•33m ago•1 comments

Free FFmpeg API [video]

https://www.youtube.com/watch?v=6RAuSVa4MLI
3•harshalone•34m ago•1 comments

Are AI agents ready for the workplace? A new benchmark raises doubts

https://techcrunch.com/2026/01/22/are-ai-agents-ready-for-the-workplace-a-new-benchmark-raises-do...
2•PaulHoule•38m ago•0 comments
Open in hackernews

Show HN: I built a tensor library from scratch in C++/CUDA

https://github.com/nirw4nna/dsc
119•nirw4nna•7mo ago
Hi HN,

Over the past few months, I've been building `dsc`, a tensor library from scratch in C++/CUDA. My main focus has been on getting the basics right, prioritizing a clean API, simplicity, and clear observability for running small LLMs locally.

The key features are: - C++ core with CUDA support written from scratch. - A familiar, PyTorch-like Python API. - Runs real models: it's complete enough to load a model like Qwen from HuggingFace and run inference on both CUDA and CPU with a single line change[1]. - Simple, built-in observability for both Python and C++.

Next on the roadmap is adding BF16 support and then I'll be working on visualization for GPU workloads.

The project is still early and I would be incredibly grateful for any feedback, code reviews, or questions from the HN community!

GitHub Repo: https://github.com/nirw4nna/dsc

[1]: https://github.com/nirw4nna/dsc/blob/main/examples/models/qw...

Comments

helltone•7mo ago
This is very cool. I'm wondering if some of the templates and switch statements would be nicer if there was an intermediate representation and a compiler-like architecture.

I'm also curious about how this compares to something like Jax.

Also curious about how this compares to zml.

nirw4nna•7mo ago
You are absolutely correct! I started working on a sort of compiler a while back but decided to get the basics down first. The templates and switch(s) are not really the issue but rather going back and forth between C & Python. This is an experiment I did a few months ago: https://x.com/nirw4nna/status/1904114563672354822 as you can see there is a ~20% perf gain just by generating a naive C++ kernel instead of calling 5 separate kernels in the case of softmax.
kajecounterhack•7mo ago
Cool stuff! Is the goal of this project personal learning, inference performance, or something else?

Would be nice to see how inference speed stacks up against say llama.cpp

liuliu•7mo ago
Both uses cublas under the hood. So I think it is similar for prefilling (of course, this framework is too early and don't have FP16 / BF16 support for GEMM it seems). Hand-roll gemv is faster for token generation hence llama.cpp is better.
kajecounterhack•7mo ago
Unrelated: my man, I loved your C vision library back in the day.
nirw4nna•7mo ago
Thanks! To be honest, it started purely as a learning project. I was really inspired when llama.cpp first came out and tried to build something similar in pure C++ (https://github.com/nirw4nna/YAMI), mostly for fun and to practice low-level coding. The idea for DSC came when I realized how hard it was to port new models to that C++ engine, especially since I don't have a deep ML background. I wanted something that felt more like PyTorch, where I could experiment with new architectures easily. As for llama.cpp, it's definitely faster! They have hand-optimizing kernels for a whole bunch of architectures, models and data types. DSC is more of a general-purpose toolkit. I'm excited to work on performance later on, but for now, I'm focused on getting the API and core features right.
NalNezumi•7mo ago
If someone wanted to learn the same thing, what material would you suggest is a good place to start?
nirw4nna•7mo ago
You just need a foundation of C/C++. If you already have that then just start programming, it's way better than reading books/guides/blogs (at least until you're stuck!). Also, you can read the source code of other similar projects on GitHub and get ideas from them, this is what I did at the beginning.
aklein•7mo ago
I noticed you interface with the native code via ctypes. I think cffi is generally preferred (eg, https://cffi.readthedocs.io/en/stable/overview.html#api-mode...). Although you'd have more flexibility if you build your own python extension module (eg using pybind), which will free you from a simple/strict ABI. Curious if this strict separation of C & Python was a deliberate design choice.
nirw4nna•7mo ago
Yes, when I designed the API I wanted to keep a clear distinction between Python and C. At some point I had two APIs: 1 in Python and the other in high-level C++ and they both shared the same low-level C API. I find this design quite clean and easy to work with if multiple languages are involved. When I'll get to perf I plan to experiment a bit with nanobind (https://github.com/wjakob/nanobind) and see if there's a noticeable difference wrt ctypes.
almostgotcaught•7mo ago
The call overhead of using ctypes vs nanobind/pybind is enormous

https://news.ycombinator.com/item?id=31378277

Even if the number reported there is off, it's not far off because ctypes just calls out to libffi which is known to be the slowest way to do ffi.

nirw4nna•7mo ago
Thanks for pointing this out! I'll definitely have to investigate other approaches. nanobind looks interesting but I don't need to expose complex C++ objects, I just need the 'fastest' way of calling into a C API. I guess the goto for this is CFFI?
almostgotcaught•7mo ago
It's the same thing - both nanobind and cffi compile the binding. The fact that nanobind let's you expose cpp doesn't prevent you from only exposing c. And IMHO nanobind is better because you don't need to learn another language to use it (ie you don't need to learn cffi's DSL).
rrhjm53270•7mo ago
Do you have any plan for the serialization and deserialization of your tensor and nn library?
nirw4nna•7mo ago
Right now I can load tensors directly from a safetensors file or from a NumPy array so I don't really have in mind to add my own custom format but I do plan to support GGUF files.
amtk2•7mo ago
super n00b question , what kind of labtop do you need to do project like this? Is mac ok? or do you need dedicated linux labtop?
kadushka•7mo ago
Any laptop with an Nvidia card
amtk2•7mo ago
does gaming labtop works with windows? have always used mac for development because tool chain is so much easier, wondering if there is a difference between windows and linux for cuda development
Anerudhan•7mo ago
You can always use WSL
nirw4nna•7mo ago
I developed this on an HP Omen 15 with an i7-8750H, a GTX 1050TI and 32GB or RAM with Linux Mint as my OS.
einpoklum•7mo ago
It's very C-like, heavy use of macros, prefixes instead of namespaces, raw pointers for arrays etc. Technically you're compiling C++, but... not really.

No negative or positive comment on its usability though, I'm not an ML/Neural Network simulation person.

caned•7mo ago
I've found adherence to C++ conventions in low-level software to be a rather contentious issue, mostly recently when working in an ML compiler group. One set abhorred the use of macros, the other any kind of polymorphism or modern C++ feature.

Coming from a background of working with OS kernels and systems software, I don't mind the kind of explicit "C++ lite" style used by the OP. Left to my own devices, I usually write things that way. I would think twice if I was trying to design a large framework, but ... I try to avoid those.

einpoklum•7mo ago
If you think that, I encourage you to check out this presentation:

https://www.youtube.com/watch?v=zBkNBP00wJE

About writing a Commodore C64 game in modern(ish) C++

maybe it will sway you a bit :-)

caned•7mo ago
That is impressive. It certainly shows what is possible _if_ you are familiar enough with the intricacies of modern C++. I'm not sure how I feel about a workflow where one needs to continually address "overhead" introduced by the language environment.
nirw4nna•7mo ago
Yes! This was actually one of my initial goals! I actually like to work in a C-style-C++ let's say where I turn off C++ features I don't need and just use the one I actually need like templates, objects ecc... I find this style to be easy to reason about when it comes to performance.
pjmlp•7mo ago
The proper way to reason about performance is to use a profiler, not second guessing what C like code generates.
revskill•7mo ago
Why not zig.
nirw4nna•7mo ago
Because I happen to know C++ and I just wanted to build something rather than learn a new language. Zig looks very interesting though, there are already other projects in this space that use it with great success (see: https://github.com/zml/zml).