frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

I have written gemma3 inference in pure C

https://github.com/robitec97/gemma3.c
65•robitec97•1w ago

Comments

w4yai•1w ago
> It proves that modern LLMs can run without Python, PyTorch, or GPUs.

Did we need any proof of that ?

skybrian•1w ago
Knowing the performance is interesting. Apparently it's 1-3 tokens/second.
kgeist•1w ago
ikllama.cpp is a fork of llama.cpp which specializes on CPU inference, some benchmarks from 1 year ago: https://github.com/ikawrakow/ik_llama.cpp/discussions/164
jasonjmcghee•1w ago
I guess llama.cpp isn't quite as popular as I had assumed.
avadodin•1w ago
llama.cpp being the best choice doesn't make it popular.

When I got started, I was led to ollama and other local-llm freemium.

I didn't necessarily assume that they weren't c++(I don't even know) but I do think that –as implied– Python duct-tape solutions are more popular than llama.cpp.

tolerance•1w ago
I imagine so regarding GPUs, right? Is this is a legitimate project then doesn’t it provide a proof of concept for performance constraints that relate to them? Couldn't the environmentally concerned take this as an indicator that the technology can progress without relying on as much energy is potentially spent now? Shouldn’t researchers in the industry be thinking of ways to prevent the future capabilities of the technology from outrunning the capacity of the infrastructure?

I know very little about AI but these are things that come to mind here for me.

yorwba•1w ago
GPUs are more efficient than CPUs for LLM inference, using less energy per token and being cheaper overall. Yes, a single data center GPU draws a lot of power and costs a fortune, but it can also serve a lot more people in the time your CPU or consumer GPU needs to respond to a single prompt.
tolerance•1w ago
I got you, thanks!
jdefr89•1w ago
Python and PyTorch all call out to C libraries… I don’t get what he means by “proving LLMs can run without Python and PyTorch” at all. Seems like they don’t understand basic fundamentals about things here…
christianqchung•1w ago
A bizarre claim like that would be what happens when you let an LLM write the README without reading it first.
austinvhuang•1w ago
My first implementation of gemma.cpp was kind of like this.

There's such a massive performance differential vs. SIMD though that I learned to appreciate SIMD (via highway) as one sweet spot of low-dependency portability that sits between C loops and the messy world of GPUs + their fat tree of dependencies.

If anyone want to learn the basics - whip out your favorite LLM pair programmer and ask it to help you study the kernels in the ops/ library of gemma.cpp:

https://github.com/google/gemma.cpp/tree/main/ops

janwas•1w ago
:D Your code was nicely written and it was a pleasure to port to SIMD because it was already very data-parallel.
behnamoh•1w ago
but why tho? next gemma is coming and no one uses gemma 3 in prod anyway.
NitpickLawyer•1w ago
> no one uses gemma 3 in prod anyway.

Umm, we do. It's still one of the best for eu countries support / help chatbot style. It's got good (best?) multilingual support ootb, it's very "safe" (won't swear, won't display chinese characters, etc) and it's pretty fast.

behnamoh•1w ago
but it lacks system prompt support.
NitpickLawyer•1w ago
It lacks a deducated system prompt, but it was trained with and in practice works with the system prompt be the first message from the user.
gunalx•1w ago
Yep. Before gemma3 we where struggling with multilinguality on smaller European languages, and it is still one of the batter ones in that regard (even large open or closed models struggle with this to some extent). Gemma3 also is still pretty decent multi modal wise.
avadodin•1w ago
I didn't know this was a thing until I read this thread but I can confirm that it does fine(not perfect by any means just like the average casual non-native fluent speaker) and it is one of the reasons I use it as my local model.
uncognic•1w ago
I think /* */ single-line comments is a pretty good indication.
data-ottawa•1w ago
Gemma3 is probably the best supported fine tunable model.
austinvhuang•1w ago
I don't have firsthand knowledge, but r/SesameAI seems to believe Maya/Miles products are based on a Gemma3 backbone.
rao-v•1w ago
I'm really charmed by this project (I know there are a few like it).

In particular it's got a single ~600 line file (https://github.com/robitec97/gemma3.c/blob/main/gemma3_kerne...) with a clear straightforward implementation of every major function used in inferencing (google's models) from gelu to rope.

I'm curious how many more functions you'd need to add to have full coverage of every publically available LLM innovation (e.g. QK-Norm from Qwen3, SwiGLU etc.).

Obviously llama.cpp has a much bigger library but it's lovely to see everything in one clean file.

pacman1337•1w ago
Anyone using this model for something useful? For now I only have use cases for top performing models...

Learning to Reason in 13 Parameters

https://arxiv.org/abs/2602.04118
1•nicholascarolan•25s ago•0 comments

Convergent Discovery of Critical Phenomena Mathematics Across Disciplines

https://arxiv.org/abs/2601.22389
1•energyscholar•39s ago•1 comments

Ask HN: Will GPU and RAM prices ever go down?

1•alentred•1m ago•0 comments

From hunger to luxury: The story behind the most expensive rice (2025)

https://www.cnn.com/travel/japan-expensive-rice-kinmemai-premium-intl-hnk-dst
1•mooreds•1m ago•0 comments

Substack makes money from hosting Nazi newsletters

https://www.theguardian.com/media/2026/feb/07/revealed-how-substack-makes-money-from-hosting-nazi...
3•mindracer•2m ago•0 comments

A New Crypto Winter Is Here and Even the Biggest Bulls Aren't Certain Why

https://www.wsj.com/finance/currencies/a-new-crypto-winter-is-here-and-even-the-biggest-bulls-are...
1•thm•2m ago•0 comments

Moltbook was peak AI theater

https://www.technologyreview.com/2026/02/06/1132448/moltbook-was-peak-ai-theater/
1•Brajeshwar•3m ago•0 comments

Why Claude Cowork is a math problem Indian IT can't solve

https://restofworld.org/2026/indian-it-ai-stock-crash-claude-cowork/
1•Brajeshwar•3m ago•0 comments

Show HN: Built an space travel calculator with vanilla JavaScript v2

https://www.cosmicodometer.space/
1•captainnemo729•4m ago•0 comments

Why a 175-Year-Old Glassmaker Is Suddenly an AI Superstar

https://www.wsj.com/tech/corning-fiber-optics-ai-e045ba3b
1•Brajeshwar•4m ago•0 comments

Micro-Front Ends in 2026: Architecture Win or Enterprise Tax?

https://iocombats.com/blogs/micro-frontends-in-2026
1•ghazikhan205•6m ago•0 comments

These White-Collar Workers Actually Made the Switch to a Trade

https://www.wsj.com/lifestyle/careers/white-collar-mid-career-trades-caca4b5f
1•impish9208•6m ago•1 comments

The Wonder Drug That's Plaguing Sports

https://www.nytimes.com/2026/02/02/us/ostarine-olympics-doping.html
1•mooreds•7m ago•0 comments

Show HN: Which chef knife steels are good? Data from 540 Reddit tread

https://new.knife.day/blog/reddit-steel-sentiment-analysis
1•p-s-v•7m ago•0 comments

Federated Credential Management (FedCM)

https://ciamweekly.substack.com/p/federated-credential-management-fedcm
1•mooreds•7m ago•0 comments

Token-to-Credit Conversion: Avoiding Floating-Point Errors in AI Billing Systems

https://app.writtte.com/read/kZ8Kj6R
1•lasgawe•7m ago•1 comments

The Story of Heroku (2022)

https://leerob.com/heroku
1•tosh•8m ago•0 comments

Obey the Testing Goat

https://www.obeythetestinggoat.com/
1•mkl95•8m ago•0 comments

Claude Opus 4.6 extends LLM pareto frontier

https://michaelshi.me/pareto/
1•mikeshi42•9m ago•0 comments

Brute Force Colors (2022)

https://arnaud-carre.github.io/2022-12-30-amiga-ham/
1•erickhill•12m ago•0 comments

Google Translate apparently vulnerable to prompt injection

https://www.lesswrong.com/posts/tAh2keDNEEHMXvLvz/prompt-injection-in-google-translate-reveals-ba...
1•julkali•12m ago•0 comments

(Bsky thread) "This turns the maintainer into an unwitting vibe coder"

https://bsky.app/profile/fullmoon.id/post/3meadfaulhk2s
1•todsacerdoti•13m ago•0 comments

Software development is undergoing a Renaissance in front of our eyes

https://twitter.com/gdb/status/2019566641491963946
1•tosh•13m ago•0 comments

Can you beat ensloppification? I made a quiz for Wikipedia's Signs of AI Writing

https://tryward.app/aiquiz
1•bennydog224•15m ago•1 comments

Spec-Driven Design with Kiro: Lessons from Seddle

https://medium.com/@dustin_44710/spec-driven-design-with-kiro-lessons-from-seddle-9320ef18a61f
1•nslog•15m ago•0 comments

Agents need good developer experience too

https://modal.com/blog/agents-devex
1•birdculture•16m ago•0 comments

The Dark Factory

https://twitter.com/i/status/2020161285376082326
1•Ozzie_osman•16m ago•0 comments

Free data transfer out to internet when moving out of AWS (2024)

https://aws.amazon.com/blogs/aws/free-data-transfer-out-to-internet-when-moving-out-of-aws/
1•tosh•17m ago•0 comments

Interop 2025: A Year of Convergence

https://webkit.org/blog/17808/interop-2025-review/
1•alwillis•18m ago•0 comments

Prejudice Against Leprosy

https://text.npr.org/g-s1-108321
1•hi41•19m ago•0 comments