
Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
159•isitcontent•7h ago•17 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
262•vecti•9h ago•125 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
210•eljojo•10h ago•135 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
53•phreda4•7h ago•9 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
78•antves•1d ago•59 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
41•nwparker•1d ago•11 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
13•NathanFlurry•15h ago•5 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
147•bsgeraci•1d ago•61 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
3•AGDNoob•3h ago•1 comment

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•6h ago•1 comment

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
13•toborrm9•12h ago•5 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•4h ago•0 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•5h ago•0 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
23•dchu17•11h ago•11 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•6h ago•0 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
171•vkazanov•1d ago•48 comments

Show HN: Chiptune Tracker

https://chiptunes.netlify.app
3•iamdan•6h ago•1 comment

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
10•KevinChasse•12h ago•9 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
8•sawyerjhood•13h ago•0 comments

Show HN: An open-source system to fight wildfires with explosive-dispersed gel

https://github.com/SpOpsi/Project-Baver
2•solarV26•10h ago•0 comments

Show HN: Agentism – Agentic Religion for Clawbots

https://www.agentism.church
2•uncanny_guzus•10h ago•0 comments

Show HN: Disavow Generator – Open-source tool to defend against negative SEO

https://github.com/BansheeTech/Disavow-Generator
5•SurceBeats•16h ago•1 comment

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
567•deofoo•5d ago•166 comments

Show HN: BPU – Reliable ESP32 Serial Streaming with Cobs and CRC

https://github.com/choihimchan/bpu-stream-engine
2•octablock•12h ago•0 comments

Show HN: Total Recall – write-gated memory for Claude Code

https://github.com/davegoldblatt/total-recall
10•davegoldblatt•1d ago•6 comments

Show HN: Hibana – An Affine MPST Runtime for Rust

https://hibanaworks.dev
3•o8vm•14h ago•0 comments

Show HN: Beam – Terminal Organizer for macOS

https://getbeam.dev/
2•faalbane•14h ago•2 comments

Show HN: Ghidra MCP Server – 110 tools for AI-assisted reverse engineering

https://github.com/bethington/ghidra-mcp
294•xerzes•2d ago•66 comments

Show HN: Autograd.c – A tiny ML framework built from scratch

https://github.com/sueszli/autograd.c
85•sueszli•1mo ago
built a tiny pytorch clone in c after going through prof. vijay janapa reddi's mlsys book: mlsysbook.ai/tinytorch/

perfect for learning how ml frameworks work under the hood :)

Comments

sueszli•1mo ago
woah, this got way more attention than i expected. thanks a lot.

if you are interested in the technical details, the design specs are here: https://github.com/sueszli/autograd.c/blob/main/docs/design....

if you are working on similar mlsys or compiler-style projects and think there could be overlap, please reach out: https://sueszli.github.io/

spwa4•1mo ago
Cool. But this makes me wonder: this negates most of the advantages of C. Is there a compiler-autograd "library"? Something that would compile into C specifically, to execute as fast as possible on CPUs with no indirection at all.
thechao•1mo ago
At best you'd be restricted to forward mode, which would still double stack pressure. If you needed reverse mode you'd need 2x the stack, and the back sweep over the stack-based tape would have a nearly perfectly suboptimal "grain". If you allow the higher-order operators (both pushforward and pullback), you're going to end up with Jacobians & Hessians over nontrivial blocks. That's going to need the heap. It's still better than an unbounded loop tape, though.

We had all these issues back in 2006 when my group was implementing autograd for C++ and, later, for a computer algebra system called Axiom. We knew it'd be ideal for NNs; I was trying to build this out for my brother, who was porting AI models to GPUs. (This did not work in 2006, for both HW & math reasons.)
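
A minimal dual-number sketch in C of the forward-mode trade-off described above (hypothetical names, not code from the linked repo): each value carries its tangent along with it on the stack, which is why forward mode roughly doubles stack pressure but needs no heap-allocated tape.

    #include <stdio.h>
    #include <math.h>

    /* A dual number: the value and its derivative travel together on the stack. */
    typedef struct { double val, dot; } dual;

    static dual d_add(dual a, dual b) {
        return (dual){ a.val + b.val, a.dot + b.dot };
    }
    static dual d_mul(dual a, dual b) {
        return (dual){ a.val * b.val, a.dot * b.val + a.val * b.dot };
    }
    static dual d_sin(dual a) {
        return (dual){ sin(a.val), cos(a.val) * a.dot };
    }

    /* f(x) = x*x + sin(x); seeding x.dot = 1 yields df/dx alongside f. */
    static dual f(dual x) {
        return d_add(d_mul(x, x), d_sin(x));
    }

    int main(void) {
        dual x = { 2.0, 1.0 };  /* seed the tangent */
        dual y = f(x);
        printf("f(2) = %f, f'(2) = %f\n", y.val, y.dot);  /* f'(x) = 2x + cos(x) */
        return 0;
    }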

spwa4•1mo ago
Why not recompile every iteration? Weights are only updated at the end of a batch at the earliest (for distributed training, only after n batches at the fastest), and generally only at the end of an iteration. In either case the cost of recompiling would be negligible, no?
thechao•1mo ago
You'd pay the cost of the core computation O(n) times. Matrix products under the derivative fibration (jet; whatever your algebra calls it) are just more matrix products. A good-sized NN is already in the heap. Also, the hard part is finding the ideal combination of fwd vs rev transforms (it's NP-hard). This is similar to the complexity of finding the ideal sub-block matrix-multiply orchestration.

So, the killer cost is at compile time, not runtime, which is fundamental to the underlying autograd operation.

On the flip side, it's 2025, not 2006, so modern algorithms & heuristics can change this story quite a bit.

All of this is spelled out in Griewank's work (the book).
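
The stack-based tape and back sweep mentioned a couple of comments up can be sketched just as minimally in C (hypothetical names, fixed-size global tape for brevity): the forward pass records each node's local partials on the tape, and the back sweep walks it in reverse order, accumulating gradients.

    #include <stdio.h>

    /* A fixed-size tape (no bounds checks, for brevity): each entry records
       which nodes fed this one and the local partial derivatives. */
    #define MAX_NODES 64
    typedef struct { int lhs, rhs; double dlhs, drhs; } entry;
    static entry tape[MAX_NODES];
    static double val[MAX_NODES], grad[MAX_NODES];
    static int n;

    static int leaf(double v) {
        val[n] = v; tape[n] = (entry){ -1, -1, 0, 0 }; return n++;
    }
    static int mul(int a, int b) {
        val[n] = val[a] * val[b];
        tape[n] = (entry){ a, b, val[b], val[a] };  /* d(ab)/da = b, d(ab)/db = a */
        return n++;
    }
    static int add(int a, int b) {
        val[n] = val[a] + val[b];
        tape[n] = (entry){ a, b, 1.0, 1.0 };
        return n++;
    }

    /* The back sweep: walk the tape in reverse, pushing gradients to inputs. */
    static void backward(int out) {
        grad[out] = 1.0;
        for (int i = out; i >= 0; i--) {
            if (tape[i].lhs >= 0) grad[tape[i].lhs] += tape[i].dlhs * grad[i];
            if (tape[i].rhs >= 0) grad[tape[i].rhs] += tape[i].drhs * grad[i];
        }
    }

    int main(void) {
        int x = leaf(2.0), y = leaf(3.0);
        int z = add(mul(x, x), mul(x, y));  /* z = x^2 + x*y */
        backward(z);
        printf("dz/dx = %f, dz/dy = %f\n", grad[x], grad[y]);  /* 7 and 2 */
        return 0;
    }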

spwa4•1mo ago
This one? https://epubs.siam.org/doi/book/10.1137/1.9780898717761
thechao•1mo ago
Yep. You can find used copies at some online places; Powell's in Portland (online store) sometimes has it for $25 or $30.
sueszli•1mo ago
a heap-free implementation could be a really cool direction to explore. thanks!

i think you might be interested in MLIR/IREE: https://github.com/openxla/iree

attractivechaos•1mo ago
> Is there a compiler-autograd "library"?

Do you mean the method Theano uses? Anyway, the performance bottleneck often lies in matrix multiplication or 2D CNNs (which can be reduced to matmul). Compiler autograd wouldn't save much time.

marcthe12•1mo ago
We would need to mirror JAX's architecture more, since JAX is essentially a JIT, architecture-wise. Basically you need a good way to convert the computational graph to machine code while also performing a set of operations on the graph at compile time.
justinnk•1mo ago
I believe Enzyme comes close to what you describe. It works on the LLVM IR level.

https://enzyme.mit.edu
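
Enzyme's usage looks roughly like the example from its docs: you declare an __enzyme_autodiff function and Enzyme's LLVM pass replaces calls to it with generated derivative code. (A sketch; building it requires loading the Enzyme plugin into Clang/LLVM.)

    // f(x) = x * x
    double square(double x) { return x * x; }

    // Enzyme's LLVM pass replaces calls to this declaration
    // with synthesized derivative code.
    double __enzyme_autodiff(void *, double);

    // dsquare(x) = d/dx square(x) = 2 * x
    double dsquare(double x) {
        return __enzyme_autodiff((void *)square, x);
    }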

PartiallyTyped•1mo ago
Any reason for creating a new tensor when accumulating grads instead of updating the existing one?

Edit: I asked this before I read the design decisions. The reasoning, as far as I understand, is that for simplicity there are no in-place operations, hence accumulation is done on a new tensor.

sueszli•1mo ago
yeah, exactly. it's for explicit ownership transfer. you always own what you receive, sum it, release both inputs, done. no mutation tracking, no aliasing concerns.

https://github.com/sueszli/autograd.c/blob/main/src/autograd...

i wonder whether there is a more clever way to do this without sacrificing simplicity.
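
A minimal sketch of the ownership-transfer pattern described above, with hypothetical names (not the repo's actual API): grad_accumulate consumes both inputs and returns a freshly allocated sum, so there is no in-place mutation to track and no aliasing to reason about. The cost is an extra allocation per accumulation; an arena or free-list allocator could claw that back without giving up the ownership discipline.

    #include <stdlib.h>

    typedef struct { double *data; size_t n; } tensor;

    tensor *tensor_new(size_t n) {
        tensor *t = malloc(sizeof *t);
        t->data = calloc(n, sizeof *t->data);
        t->n = n;
        return t;
    }
    void tensor_free(tensor *t) { free(t->data); free(t); }

    /* Consumes both arguments: the caller hands over ownership of a and b
       and receives a freshly allocated sum. No in-place update, no aliasing. */
    tensor *grad_accumulate(tensor *a, tensor *b) {
        tensor *out = tensor_new(a->n);
        for (size_t i = 0; i < a->n; i++)
            out->data[i] = a->data[i] + b->data[i];
        tensor_free(a);
        tensor_free(b);
        return out;
    }

    int main(void) {
        tensor *g = tensor_new(3);     /* running gradient, owned by caller */
        tensor *step = tensor_new(3);  /* new contribution, also owned */
        step->data[0] = 1.0;
        g = grad_accumulate(g, step);  /* both consumed, fresh tensor back */
        tensor_free(g);
        return 0;
    }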