
A Gentle Introduction to CUDA PTX

https://philipfabianek.com/posts/cuda-ptx-introduction/
59•ashvardanian•4mo ago

Comments

the_panopticon•4mo ago
Very interesting. It sounds like tuning at the PTX level can improve workload efficiency. For example, the DeepSeek folks write: "Specifically, we employ customized PTX (Parallel Thread Execution) instructions" (https://arxiv.org/abs/2412.19437).
shetaye•4mo ago
Agreed! The gulf between pure-C++ CUDA and PTX is getting larger with these optimizations. My understanding is that DeepSeek used PTX instructions that either had no corresponding C++ equivalent (like `wgmma`, mentioned in the article) or used uncommon permutations of modifiers (`LD.Global.NC.L1::no_allocate.L2::256b`).
saagarjha•4mo ago
They didn’t employ custom PTX instructions; they used existing ones in ways they were not designed to be used.
the__alchemist•4mo ago
Is this analogy valid: writing PTX is like writing assembly instead of a higher-level language (C, C++, Rust, etc.) for CPU code? That is, the higher-level code normally compiles to it, but you can optimize by going lower?

For context, as the opening paragraph of the article describes, I generate PTX code regularly, but have no idea what the actual code in the PTX file means!

I'm curious about the forward compatibility the article goes into. I only experience that to a point: code compiled on CUDA 12 does not seem to work on machines with CUDA 13.

philipfabianek•4mo ago
Indeed, this is one way to think about it. However, PTX is an instruction set for a virtual machine, not the actual hardware. The true, hardware-specific assembly is called SASS (Streaming Assembly). PTX is translated into SASS in a final compilation step, either ahead of time by ptxas or at load time by the GPU driver's JIT compiler. Unlike SASS, PTX is (mostly) forward compatible.

I don't know the details of your CUDA 12 vs. 13 issue, but I think it is less about hardware compatibility and more about the software stack. An application linked against CUDA 12 libraries might not work with CUDA 13 libraries.
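
To make the SASS vs. PTX distinction concrete, here is a minimal Python sketch of the baseline compatibility rules as NVIDIA's CUDA documentation describes them. It is a toy model only: the function names are made up for illustration, it does not call any CUDA API, and it ignores driver-version requirements and special cases such as Tegra devices.

```python
# Toy model of CUDA binary compatibility; no GPU or CUDA install needed.
# Compute capabilities are (major, minor) tuples, e.g. (8, 6) for sm_86.
# Function names are hypothetical, chosen only for this illustration.

def sass_runs_on(built_for, device):
    """SASS (a cubin) runs only on devices with the same major version
    and an equal or higher minor version."""
    return built_for[0] == device[0] and device[1] >= built_for[1]


def ptx_runs_on(built_for, device):
    """Baseline PTX can be JIT-compiled by the driver for any device of
    equal or higher compute capability."""
    return device >= built_for


if __name__ == "__main__":
    ampere, ada, hopper = (8, 0), (8, 9), (9, 0)
    print(sass_runs_on(ampere, ada))     # same major, higher minor
    print(sass_runs_on(ampere, hopper))  # different major
    print(ptx_runs_on(ampere, hopper))   # PTX is JIT-compiled forward
```

This is why shipping PTX alongside SASS lets a binary run on GPUs released after it was built, as long as the installed driver is new enough to understand that PTX version.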

neuroelectron•4mo ago
That's not much different from a modern CPU with an OS on top: the OS does some of the scheduling, then the CPU splits the instructions into microinstructions and schedules them again in finer detail (hyperthreading and such). It seems to me there must be a C-level syntax and compiler, so that you're not manually splitting up individual adds and the like, that is still capable of optimizing the math effectively. But if that were true, we wouldn't have AAA game studios going to NVIDIA to optimize their game engines for each individual game.
checker659•4mo ago
https://en.wikipedia.org/wiki/Intermediate_representation
saagarjha•4mo ago
It’s really not true anymore that PTX is forward compatible. There’s a subset that is, but the new, interesting interfaces that have been added are not forward compatible and change with each microarchitectural revision. Most of the reason you’d drop down to PTX anyway is to use those; otherwise compilers are fairly good these days, and you’ll rarely see PTX unless you’re profiling.
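
One documented instance of this is the architecture-specific targets marked with an `a` suffix, such as `compute_90a`: per NVIDIA's documentation, PTX built for such a target is supported only on that exact compute capability. The Python sketch below models that rule. It is an illustration under simplifying assumptions: the function name is hypothetical, only plain and `a`-suffixed `compute_XY` targets are parsed, and no CUDA API is involved.

```python
import re

# Toy model: can PTX built for a given virtual target be JIT-compiled
# for a device? Devices are (major, minor) compute-capability tuples.
# The function name is hypothetical, chosen only for this illustration.

def ptx_target_runs_on(target, device):
    # The last digit is the minor version; everything before it is the major.
    match = re.fullmatch(r"compute_(\d+)(\d)(a?)", target)
    if match is None:
        raise ValueError(f"unrecognized target: {target!r}")
    built_for = (int(match.group(1)), int(match.group(2)))
    if match.group(3):
        # Architecture-specific PTX: exact match only, no forward compatibility.
        return device == built_for
    # Baseline PTX: any device of equal or higher compute capability.
    return device >= built_for


if __name__ == "__main__":
    print(ptx_target_runs_on("compute_90", (10, 0)))   # baseline subset
    print(ptx_target_runs_on("compute_90a", (10, 0)))  # arch-specific
    print(ptx_target_runs_on("compute_90a", (9, 0)))   # exact match
```

Instructions such as `wgmma`, mentioned earlier in the thread, fall into this architecture-specific category, which is why code that relies on them generally has to be revisited for each new GPU generation.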