frontpage.

NASA now allowing astronauts to bring their smartphones on space missions

https://twitter.com/NASAAdmin/status/2019259382962307393
2•gbugniot•3m ago•0 comments

Claude Code Is the Inflection Point

https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point
1•throwaw12•5m ago•0 comments

MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

https://github.com/microclaw/microclaw
1•everettjf•5m ago•2 comments

Show HN: Omni-BLAS – 4x faster matrix multiplication via Monte Carlo sampling

https://github.com/AleatorAI/OMNI-BLAS
1•LowSpecEng•6m ago•1 comments

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

https://codemanship.wordpress.com/2026/01/05/the-ai-ready-software-developer-conclusion-same-game...
1•lifeisstillgood•8m ago•0 comments

AI Agent Automates Google Stock Analysis from Financial Reports

https://pardusai.org/view/54c6646b9e273bbe103b76256a91a7f30da624062a8a6eeb16febfe403efd078
1•JasonHEIN•11m ago•0 comments

Voxtral Realtime 4B Pure C Implementation

https://github.com/antirez/voxtral.c
1•andreabat•13m ago•0 comments

I Was Trapped in Chinese Mafia Crypto Slavery [video]

https://www.youtube.com/watch?v=zOcNaWmmn0A
1•mgh2•20m ago•0 comments

U.S. CBP Reported Employee Arrests (FY2020 – FYTD)

https://www.cbp.gov/newsroom/stats/reported-employee-arrests
1•ludicrousdispla•21m ago•0 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•27m ago•1 comments

Show HN: SVGV – A Real-Time Vector Video Format for Budget Hardware

https://github.com/thealidev/VectorVision-SVGV
1•thealidev•28m ago•0 comments

Study of 150 developers shows AI generated code no harder to maintain long term

https://www.youtube.com/watch?v=b9EbCb5A408
1•lifeisstillgood•28m ago•0 comments

Spotify now requires premium accounts for developer mode API access

https://www.neowin.net/news/spotify-now-requires-premium-accounts-for-developer-mode-api-access/
1•bundie•31m ago•0 comments

When Albert Einstein Moved to Princeton

https://twitter.com/Math_files/status/2020017485815456224
1•keepamovin•33m ago•0 comments

Agents.md as a Dark Signal

https://joshmock.com/post/2026-agents-md-as-a-dark-signal/
2•birdculture•34m ago•0 comments

System time, clocks, and their syncing in macOS

https://eclecticlight.co/2025/05/21/system-time-clocks-and-their-syncing-in-macos/
1•fanf2•36m ago•0 comments

McCLIM and 7GUIs – Part 1: The Counter

https://turtleware.eu/posts/McCLIM-and-7GUIs---Part-1-The-Counter.html
2•ramenbytes•39m ago•0 comments

So whats the next word, then? Almost-no-math intro to transformer models

https://matthias-kainer.de/blog/posts/so-whats-the-next-word-then-/
1•oesimania•40m ago•0 comments

Ed Zitron: The Hater's Guide to Microsoft

https://bsky.app/profile/edzitron.com/post/3me7ibeym2c2n
2•vintagedave•43m ago•1 comments

UK infants ill after drinking contaminated baby formula of Nestle and Danone

https://www.bbc.com/news/articles/c931rxnwn3lo
1•__natty__•43m ago•0 comments

Show HN: Android-based audio player for seniors – Homer Audio Player

https://homeraudioplayer.app
3•cinusek•44m ago•1 comments

Starter Template for Ory Kratos

https://github.com/Samuelk0nrad/docker-ory
1•samuel_0xK•45m ago•0 comments

LLMs are powerful, but enterprises are deterministic by nature

2•prateekdalal•49m ago•0 comments

Make your iPad 3 a touchscreen for your computer

https://github.com/lemonjesus/ipad-touch-screen
2•0y•54m ago•1 comments

Internationalization and Localization in the Age of Agents

https://myblog.ru/internationalization-and-localization-in-the-age-of-agents
1•xenator•54m ago•0 comments

Building a Custom Clawdbot Workflow to Automate Website Creation

https://seedance2api.org/
1•pekingzcc•57m ago•1 comments

Why the "Taiwan Dome" won't survive a Chinese attack

https://www.lowyinstitute.org/the-interpreter/why-taiwan-dome-won-t-survive-chinese-attack
2•ryan_j_naughton•57m ago•0 comments

Xkcd: Game AIs

https://xkcd.com/1002/
2•ravenical•59m ago•0 comments

Windows 11 is finally killing off legacy printer drivers in 2026

https://www.windowscentral.com/microsoft/windows-11/windows-11-finally-pulls-the-plug-on-legacy-p...
2•ValdikSS•59m ago•0 comments

From Offloading to Engagement (Study on Generative AI)

https://www.mdpi.com/2306-5729/10/11/172
1•boshomi•1h ago•1 comments

Why CUDA translation wont unlock AMD

https://eliovp.com/why-cuda-translation-wont-unlock-amds-real-potential/
88•JonChesterfield•2mo ago

Comments

pixelpoet•2mo ago
Actual article title says "won't"; wont is a word meaning habit or proclivity.
InvisGhost•2mo ago
In situations like this, I try to focus on whether the other person understood what was being communicated rather than splitting hairs. In this case, I don't think anyone would be confused.
philipallstar•2mo ago
Probably best to just fix the spelling.
Eliovp•2mo ago
That's what you get when you don't use AI to write an article :p
outside1234•2mo ago
Are the hyperscalers really using CUDA? This is what really matters. We know Google isn't. Are AWS and Azure for their hosting of OpenAI models et al?
bigyabai•2mo ago
> We know Google isn't.

Google isn't internally, so far as we know. Google's hyperscaler products have long offered CUDA options, since the demand isn't limited to AI/tensor applications that cannibalize TPU's value prop: https://cloud.google.com/nvidia

wmf•2mo ago
All Nvidia GPUs, which are probably >70% of the market, use CUDA.
lvl155•2mo ago
Let’s just say what it is: devs are too constrained to jump ship right now. It’s a massive land grab, and you are not going to spend time tinkering with CUDA alternatives when even a six-month delay can basically kill your company/organization. Google and Apple are two companies with enough resources to do it. Google isn’t, because they’re keeping it proprietary to their cloud. Apple still has its head stuck in the sand, barely capable of fixing Siri.
stingraycharles•2mo ago
Google has their own TPUs so they don’t have any vendor lock-in issues at all.

OpenAI OTOH is big enough that the vendor lock-in is actually hurting them, and them making that massive deal with AMD may finally push the needle for AMD and improve things in the ecosystem to make AMD a smooth experience.

apfsx•2mo ago
Google’s TPUs are not powering Gemini or whatever X-equivalent LLM you want to compare it to.
rdudek•2mo ago
What is powering Gemini?
jsolson•2mo ago
TPUs: https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...
rfw300•2mo ago
This isn't true. Gemini is trained and run almost entirely on TPUs. Anthropic also uses TPUs for inference, see, e.g., https://www.anthropic.com/news/expanding-our-use-of-google-c... and https://www.anthropic.com/engineering/a-postmortem-of-three-.... OpenAI also uses TPUs for inference at least in some measure: https://x.com/amir/status/1938692182787137738?t=9QNb0hfaQShW....
skirmish•2mo ago
I can assure you that most internal ML teams are using TPUs both for training and inference, they are just so much easier to get. Whatever GPUs exist are either reserved for Google Cloud customers, or loaned temporarily to researchers who want to publish easily externally reproducible results.
stingraycharles•2mo ago
They are; even Apple famously uses Google Cloud for their cloud-based AI stuff, solely because Apple doesn’t want to buy Nvidia.

Google Cloud does have a lot of NVidia, but that’s for their regular cloud customers, not internal stuff.

jsolson•2mo ago
This comment is incorrect: https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...
throwaway31131•2mo ago
Having your own ASIC comes with a huge sunk cost. One gets advantages from that, but it's still a lock, just a lock of a different color. But with Google money and manpower, management can probably pursue both paths in parallel and not care.
kj4ips•2mo ago
I agree pretty strongly. A translation layer like this is making an intentional trade: giving up performance and HW alignment in exchange for less lead time and effort than a proper port would take.
martinald•2mo ago
Perhaps I'm misunderstanding the market dynamics, but isn't AMD's real opportunity inference rather than research?

Training etc. still happens on NVDA, but inference is somewhat easy to do on vLLM et al. with a true ROCm backend, with little effort?

mandevil•2mo ago
Yeah, ROCm focused code will always beat generic code compiled down. But this is a really difficult game to win.

For example, Deepseek's R-1 was released optimized for running on Nvidia HW and needed some adaptation to run as well on ROCm. This was for the exact same reasons that native ROCm code will beat generic code compiled into ROCm. Basically, the Deepseek team, for their own purposes, built R-1 to fit Nvidia's way of doing things (because Nvidia is market-dominant). Once they released, someone like Elio or AMD had to do the work of adapting the code to run best on ROCm.

More established players who weren't out-of-left-field surprises like Deepseek (e.g. Meta with its Llama series) mostly coordinate with AMD ahead of release day, but I suspect that AMD still has to pay for the engineering work itself, while Meta makes things run on Nvidia on its own. This simple fact, that every researcher makes their stuff work on CUDA themselves while AMD or someone like Elio has to do the porting work to make it as performant on ROCm, is what keeps people in the CUDA universe.
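
The dynamic described above can be sketched as a backend-dispatch pattern. This is a toy illustration (all names hypothetical, not any real engine's API): every model effectively ships with a tuned CUDA path because the researchers wrote one, while other backends start from a slower generic fallback until someone does the porting work.

```python
# Toy backend-dispatch sketch. "cuda"/"rocm" stand in for real backends;
# the functions stand in for hand-tuned kernels vs. a portable fallback.
KERNELS = {}

def register(backend):
    """Register a backend-specific implementation of a kernel."""
    def wrap(fn):
        KERNELS[backend] = fn
        return fn
    return wrap

@register("cuda")
def attention_cuda(q, k):
    # Stands in for the hand-tuned kernel the model authors wrote themselves.
    return sum(a * b for a, b in zip(q, k))

def attention_generic(q, k):
    # Portable fallback: correct everywhere, fast nowhere.
    total = 0.0
    for a, b in zip(q, k):
        total += a * b
    return total

def attention(q, k, backend):
    # Backends without a contributed kernel (e.g. "rocm" here) fall
    # through to the generic path until someone does the porting work.
    return KERNELS.get(backend, attention_generic)(q, k)

print(attention([1.0, 2.0], [3.0, 4.0], "cuda"))  # prints 11.0 (tuned path)
print(attention([1.0, 2.0], [3.0, 4.0], "rocm"))  # prints 11.0 (fallback)
```

Both paths return the same answer; on real hardware only the performance differs, which is exactly why the untouched fallback keeps people on CUDA.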

latchkey•2mo ago
Kimi is the latest model that isn't running correctly on AMD. Apparently close to Deepseek in design, but different enough that it just doesn't work.

It isn't just the model, it is the engine to run it. From what I understand this model works with sglang, but not with vLLM.

suprjami•2mo ago
This is normal. An inference engine needs support for a model's particular implementation of the transformer architecture. This has been true for almost every model release since we got local weights.

Really good model providers send a launch-day patch to llama.cpp and vllm to make sure people can run their model instantly.
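
The launch-day patch mentioned above usually boils down to teaching the engine about a new architecture string from the model's config. A minimal sketch of that registry pattern (all names hypothetical; this is not vLLM's or llama.cpp's actual code):

```python
# Toy model-architecture registry: an inference engine maps the
# architecture string in a model's config to loading code, and an
# unknown architecture is a hard failure until a patch adds an entry.
ARCHITECTURES = {
    "LlamaForCausalLM": lambda cfg: f"llama loader ({cfg['hidden_size']}d)",
    "DeepseekV3ForCausalLM": lambda cfg: f"deepseek loader ({cfg['hidden_size']}d)",
}

def load_model(config):
    arch = config["architectures"][0]
    try:
        builder = ARCHITECTURES[arch]
    except KeyError:
        raise ValueError(f"unsupported architecture: {arch}") from None
    return builder(config)

print(load_model({"architectures": ["LlamaForCausalLM"], "hidden_size": 4096}))
# A brand-new architecture fails until the registry gains an entry:
try:
    load_model({"architectures": ["KimiForCausalLM"], "hidden_size": 7168})
except ValueError as e:
    print(e)
```
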

latchkey•2mo ago
It isn't about normal or not. It is that those patches are done for Nvidia, but not AMD. It is that it takes time and energy to vet them and merge them into those projects. Kimi has been out for 3 months now and it still doesn't run out of the box on vLLM on AMD, but it works just fine with Nvidia.
doctorpangloss•2mo ago
All they have to do is release air cooled 96GB GDDR7 PCIe5 boards with 4x Infinity Link, and charge $1,900 for it.
jmward01•2mo ago
Right now we need diversity in the ecosystem. AMD is finally getting mature, and hopefully that will lead to them truly getting a second, strong opinion into the ecosystem. The friction this article talks about is needed to push new ideas.
manjose2018•2mo ago
https://geohot.github.io//blog/jekyll/update/2025/03/08/AMD-...

https://tinygrad.org/ is the only viable alternative to CUDA that I have seen popup in the past few years.

bigyabai•2mo ago
Viable how? "Feasible" might be a better word here; I haven't heard many (any?) war stories about a TinyBox in production, but maybe I'm OOTL.
erichocean•2mo ago
Both Mojo and ThunderKittens/HipKittens are viable on AMD.
balaclava9•2mo ago
Mojo runs faster on nvidia hardware than CUDA in some cases.

https://x.com/clattner_llvm/status/1982196673771139466?s=61

djsjajah•2mo ago
I can't tell if you are making a joke or not.

They are not even remotely equivalent. tinygrad is a toy.

If you are serious, I would be interested to hear how you see tinygrad replacing CUDA. I could see a tinygrad zealot arguing that it is going to replace torch, but CUDA??

Have you looked into AMD support in torch? I would wager that, like for like, a torch/AMD implementation of a model is going to run rings around a tinygrad/AMD implementation.

fulafel•2mo ago
Vulkan Compute is catching up with HIP (or whatever the compatibility stuff is called now), which seems like a welcome break from CUDA; in this benchmark it beats HIP in some tests on AMD: https://www.phoronix.com/review/rocm-71-llama-cpp-vulkan
pjmlp•2mo ago
For most devs using GLSL instead of C++20, or Python GPU JIT, is a downgrade in developer experience.
fulafel•2mo ago
For Python: PyTorch has Vulkan support according to https://docs.pytorch.org/executorch/stable/backends/vulkan/v... - wonder how performance is there.
pjmlp•2mo ago
CUDA is not only for AI.
apitman•2mo ago
Our open source library is currently hard locked into CUDA due to nvCOMP for gzip decompression (bioinformatics files). What I wouldn't give for an open source implementation, especially if it targeted WebGPU.
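
For what it's worth, the property that makes GPU gzip decompression attractive for bioinformatics formats like BGZF is that the file is a series of independent gzip members. A stdlib-only sketch of per-member decompression (this is not nvCOMP's API; a thread pool or GPU/WebGPU kernel could do the same per-member work concurrently):

```python
import gzip
import zlib

def make_multi_member(chunks):
    # Concatenated gzip members, like a (much simplified) BGZF file.
    return b"".join(gzip.compress(c) for c in chunks)

def inflate_members(blob):
    """Decompress each gzip member independently."""
    out, pos = [], 0
    while pos < len(blob):
        # wbits = MAX_WBITS | 16 selects the gzip wrapper; the object
        # stops at the end of one member and leaves the rest in
        # unused_data, giving us the offset of the next member.
        d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        out.append(d.decompress(blob[pos:]))
        pos = len(blob) - len(d.unused_data)
    return out

blob = make_multi_member([b"ACGT" * 8, b"TTGCA" * 4])
print(inflate_members(blob))
```

Because each member carries its own header and can be inflated in isolation, the loop body is embarrassingly parallel, which is the same structure a GPU implementation would exploit.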
latchkey•2mo ago
A bit of background. This is directed towards Spectral Compute (Michael) and https://scale-lang.com/. I know both of these guys personally and consider them both good friends, so you have to understand a bit of the background in order to really dive into this.

My take on it is fairly well summed up at the bottom of Elio's post. In essence, Elio is taking the view of "we would never use scale-lang for LLMs because we have a product that is native AMD," and Michael is taking the view of "there is a ton of CUDA code out there that isn't just AI, and we can help move those people over to AMD... oh, and by the way, we actually do know what we are doing, and we think we have a good chance at making this perform."

At the end of the day, both companies (my friends) are trying to make AMD a viable solution in a world dominated by an ever growing monopoly. Stepping back a bit and looking at the larger picture, I feel this is fantastic and want to support both of them in their efforts.

Eliovp•2mo ago
Just to clarify: this post was not written against Spectral Compute. Their recent investment news was the trigger for us to finally write it, yes, but the idea has been on our minds for a long time.

We actually think solutions like theirs are good for the ecosystem, they make it easier for people to at least try AMD without throwing away their CUDA code.

Our point is simply this: if you want top-end performance (big LLMs, specific floating point support, serious throughput/latency), translation alone is not enough. At that point you have to focus on hardware-specific tuning: CDNA kernel shapes, MFMA GEMMs, ROCm-specific attention/TP, KV-cache, etc.

That’s the layer we work on: we don’t replace people’s engines, we just push the AMD hardware as hard as it can go.
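
As a toy illustration of the hardware-specific tuning described above: the same blocked GEMM with a tile size chosen per target. On real CDNA hardware the analogous knob is the kernel shape feeding MFMA instructions; here (pure Python, hypothetical sizes) the tile only changes the loop blocking, never the result.

```python
def blocked_matmul(A, B, tile=2):
    """Blocked (tiled) matrix multiply: C = A @ B on nested lists.

    `tile` is the tuning knob: on real hardware it would be chosen to
    fit registers/LDS and the matrix-core instruction shape; here it
    only reorders the arithmetic.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
# Any tile size gives the same answer; only performance differs on real HW.
assert blocked_matmul(A, B, tile=1) == blocked_matmul(A, B, tile=2)
print(blocked_matmul(A, B))  # prints [[19.0, 22.0], [43.0, 50.0]]
```

The point of "CDNA kernel shapes" is exactly that the best `tile` on one vendor's hardware is rarely the best on another's, which is why a translated kernel leaves performance on the table.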

jfalcon•2mo ago
CUDA isn't all that and a bag of chips. It's just the Facebook/Twitter of the data-science and, from that, LLM space. There are tensor processors and other ASICs for specific compute functions that can give Nvidia a challenge, but every gamer knows there has always been a performance difference between Nvidia and AMD/ATI.

Ok, point made Nvidia. Kudos.

ATI had their moment in the sun before ASICs ate their cryptocurrency lunch. So both still had/have relevance outside gaming. But I see Intel is starting to take the GPU space seriously, and they shouldn't be ruled out.

And as mentioned elsewhere in the comments, there is Vulkan. There is also the idea of virtualized GPUs, now that the bottleneck isn't the CPU... it's the GPU. As I mentioned, there are tensor processors, and with Moore's Law thresholds coming back again at 1-nanometer manufacturing, there is going to be a point where we hit a limit with current chips and see a change in technology, again.

So while Nvidia is living the life, unless they have a crystal ball for where tensor compute is going so they can move CUDA towards it, there is a "co-processor" future coming, and with it the next step towards NPUs will be taken. This is where Apple is aligning itself because, after all, they had the money and just said "Nope, we'll license this round out..."

AMD isn't out yet. They, along with Intel and others, just need to figure out where the next bottlenecks are and build those toll bridges.

musicale•2mo ago
If you can run PyTorch well, isn't that good enough for a lot of people?