Show HN: We made our own inference engine for Apple Silicon

https://github.com/trymirai/uzu
114•darkolorin•8h ago
We wrote our inference engine in Rust; it is faster than llama cpp in all of the use cases. Your feedback is very welcome. It was written from scratch with the idea that you can add support for any kernel and platform.

Comments

sharifulin•5h ago
Wow! Sounds super interesting
slavasmirnov•5h ago
That's exactly what we are looking for, so we don't have to spend on APIs. I wonder how significant the trade-offs are.
TheMagicHorsey•4h ago
Amazing!

How was your experience using Rust on this project? I'm considering a project in an adjacent space and I'm trying to decide between Rust, C, and Zig. Rust seems a bit burdensome with its complexity compared to C and Zig. Reminds me of C++ in its complexity (although not as bad). I find it difficult to walk through and understand a complicated Rust repository. I don't have that problem with C and Zig for the most part.

But I'm wondering if I just need to invest more time in Rust. How was your learning curve with the language?

adastra22•4h ago
You are confusing familiarity with intrinsic complexity. I had 20 years of experience with C/C++ before switching to Rust a few years ago. After the initial hurdle, it is way easier and very simple to follow.
ednevsky•4h ago
nice
ewuhic•4h ago
>faster than llama cpp in all of the use cases

What's your deliberate, well-thought roadmap for achieving adoption similar to llama cpp?

pants2•4h ago
Probably getting acquired by Apple :)
mintflow•4h ago
Just curious: will it be supported on iOS? It would be great to build a local LLM app with this project.
AlekseiSavin•4h ago
It already is :) https://github.com/trymirai/uzu-swift
cwlcwlcwlingg•4h ago
Wondering why use Rust rather than C++
adastra22•4h ago
Why use C++?
bee_rider•2h ago
I wonder why they didn’t use Fortran.
giancarlostoro•1h ago
...or D? Or Go? Or Java? C#? Zig? Etc. They chose what they were most comfortable with. Rust is fine; it's clearly not for everyone, but those who use it produce high-quality software. I would argue the same for Go, without all the unnecessary mental overhead of C or C++.
outworlder•1h ago
Why use C++ for greenfield projects?
greggh•4h ago
"trymirai": every time I hear the word Mirai I think of the large IoT DDoS botnet. Maybe it's just me though.
fnord77•57m ago
I think of the goofy Toyota fuel cell car. I think a grand total of about 6 have been sold (leased) in California.
rnxrx•4h ago
I'm curious about why the performance gains mentioned were so substantial for Qwen vs Llama?
AlekseiSavin•3h ago
it looks like llama.cpp has some performance issues with bf16
homarp•4h ago
Can you explain the type of quantization you support?

Would https://docs.unsloth.ai/basics/kimi-k2-how-to-run-locally be faster with Mirai?

AlekseiSavin•3h ago
Right now we support AWQ, and we are working on other quantization methods in https://github.com/trymirai/lalamo
smpanaro•4h ago
In practice, how often do the models use the ANE? It sounds like you are optimizing for speed which in my experience always favors GPU.
AlekseiSavin•3h ago
You're right, modern edge devices are powerful enough to run small models, so the real bottleneck for a forward pass is usually memory bandwidth, which defines the upper theoretical limit for inference speed. Right now, we've figured out how to run computations in a granular way on specific processing units, but we expect the real benefits to come later when we add support for VLMs and advanced speculative decoding, where you process more than one token at a time
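To make the memory-bandwidth point concrete, here is a rough back-of-the-envelope sketch in Rust. It assumes that generating one token requires reading every weight once, so the ceiling on tokens per second is the memory bandwidth divided by the model size in bytes. The function names and the numbers (0.6B parameters in bf16, 100 GB/s of bandwidth, 3 accepted tokens per pass) are hypothetical illustrations, not figures from uzu or its benchmarks. The speculative-decoding helper is idealized: it ignores the cost of the draft model and of rejected tokens.

```rust
/// Idealized upper bound on decode speed (tokens/s) when the forward pass
/// is limited by memory bandwidth: every weight is read once per token.
fn max_tokens_per_second(params: f64, bytes_per_param: f64, bandwidth_bytes_per_s: f64) -> f64 {
    let model_bytes = params * bytes_per_param;
    bandwidth_bytes_per_s / model_bytes
}

/// Idealized bound when each forward pass yields several accepted tokens,
/// as in speculative decoding. Assumption: draft-model cost is ignored.
fn speculative_bound(base_tokens_per_s: f64, accepted_tokens_per_pass: f64) -> f64 {
    base_tokens_per_s * accepted_tokens_per_pass
}

fn main() {
    // Hypothetical inputs: 0.6e9 parameters, bf16 (2 bytes each), 100 GB/s.
    let base = max_tokens_per_second(0.6e9, 2.0, 100.0e9);
    println!("upper bound: {:.1} tokens/s", base); // about 83.3

    // Hypothetical: on average 3 tokens accepted per forward pass.
    let spec = speculative_bound(base, 3.0);
    println!("with speculation: {:.1} tokens/s", spec); // about 250.0
}
```

Under these assumptions a single-token decode cannot exceed roughly 83 tokens per second no matter which processing unit does the arithmetic, which is why processing more than one token per pass is where the extra headroom would come from.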
J_Shelby_J•3h ago
VLMs = very large models?
mmorse1217•3h ago
Probably vision language models.
skybrian•3h ago
What are the units on the benchmark results? I’m guessing higher is better?
AlekseiSavin•3h ago
yeah, tokens per second
dcreater•3h ago
Somewhat faster on small models. Requires new format.

Not sure what the goal is for this project? Not seeing how this presents adequate benefits to get adopted by the community

koakuma-chan•3h ago
Written in Rust is a big one for me.
worldsavior•2h ago
It's utilizing Apple's ANE and probably other optimization tools provided by Apple's frameworks. Not sure if llama.cpp uses them, but if it doesn't, then the benchmark on GitHub says it all.
zdw•3h ago
How does this benchmark compare to MLX?
jasonjmcghee•3h ago
I use MLX in LM Studio and it doesn't have whatever issues llama cpp is showing here.

Qwen3-0.6B at 5 t/s doesn't make any sense. Something is clearly wrong for that specific model.

giancarlostoro•1h ago
Hoping the author can answer, I'm still learning about how this all works. My understanding is that inference is "using the model" so to speak. How is this faster than established inference engines specifically on Mac? Are models generic enough that if you build e.g. an inference engine focused on AMD GPUs or even Intel GPUs, would they achieve reasonable performance? I always assumed because Nvidia is king of AI that you had to suck it up, or is it just that most inference engines being used are married to Nvidia?

I would love to understand how universal these models can become.

nodesocket•32m ago
I just spun up an AWS EC2 g6.xlarge instance to do some LLM work. The GPU is an NVIDIA L4 24GB and it costs $0.8048 per hour. Starting to think about switching to an Apple mac2-m2.metal instance for $0.878 per hour. The big question is that the Mac instance only has 24GB of unified memory.
floam•3m ago
How does this compare to https://github.com/Anemll/Anemll?
