Launch HN: Cactus (YC S25) – AI inference on smartphones

https://github.com/cactus-compute/cactus
37•HenryNdubuaku•1h ago
Hey HN, Henry & Roman here, we are building Cactus (https://cactuscompute.com/), an AI inference engine specifically designed for phones.

We're seeing a major push towards on-device AI, and for good reason: on-device AI decreases latency from >1sec to <100ms, guarantees privacy by default, works offline, and doesn't rack up a massive API bill at scale.

Also, tools and agentic designs make small models really good beyond benchmarks. This has been corroborated by other papers like https://arxiv.org/abs/2506.02153, and we see model companies like DeepMind aggressively going into smaller models with Gemma3 270m and 308m. We found Qwen3 600m to be great at tool calls for instance.

Some frameworks already try to solve this, but in my previous job they struggled in production compared with research and playground settings:

- They optimise for modern devices but 70% of phones today are low-mid budget.

- Bloated app bundle sizes and battery drain are serious concerns for users.

- Phone GPU battery drain is unacceptable, NPUs are preferred, but few phones have those for now.

- Some are platform-specific, requiring different models and workflows for different operating systems.

At Cactus, we’ve written the kernels and inference engine from the ground up for running AI locally on any phone.

Cactus is designed for mobile devices and their constraints. Every design choice, including energy efficiency, accelerator support, quantization levels, supported models, weight format, and context management, was determined by this. We also provide minimalist SDKs for app developers to build agentic workflows in 2-5 lines of code.

We made a Show HN post when we started the project to get the community's thoughts (https://news.ycombinator.com/item?id=44524544). Based on your feedback, we built Cactus bottom-up to solve those problems, and are launching the Cactus Kernels, Cactus Graph and Cactus Engine, all designed for phones and tiny devices.

CPU benchmarks for Qwen3-600m-INT8:

- 16-20 toks/sec on Pixel 6a / Galaxy S21 / iPhone 11 Pro.

- 50-70 toks/sec on Pixel 9 / Galaxy S25 / iPhone 16.

- Time-to-first-token is as low as 50ms depending on prompt size.

On NPUs, we see Qwen3-4B-INT4 run at 21 toks/sec.
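
To put the figures above in user-facing terms, here is a back-of-envelope sketch that turns throughput and time-to-first-token into a rough reply latency. It assumes a constant decode rate and that the 50ms best-case time-to-first-token applies; the 100-token reply length and the function name are illustrative assumptions, not measurements from Cactus.

```python
def estimate_latency_s(n_tokens: int, ttft_s: float, toks_per_sec: float) -> float:
    """Rough wall-clock time for a reply: time-to-first-token plus steady decoding.

    Simplification: the decode rate is treated as constant and every token
    is charged at that rate after the first-token delay.
    """
    if toks_per_sec <= 0:
        raise ValueError("toks_per_sec must be positive")
    return ttft_s + n_tokens / toks_per_sec


REPLY_TOKENS = 100  # hypothetical reply length, not taken from the benchmarks
TTFT_S = 0.05       # best-case 50ms quoted in the post

# Throughput ranges quoted in the post for Qwen3-600m-INT8 on CPU.
for label, rate in [
    ("older phones, low end", 16),
    ("older phones, high end", 20),
    ("newer phones, low end", 50),
    ("newer phones, high end", 70),
]:
    seconds = estimate_latency_s(REPLY_TOKENS, TTFT_S, rate)
    print(f"{label}: about {seconds:.1f} s for {REPLY_TOKENS} tokens")
```

Real latency also depends on prompt length, thermal throttling, and sampling settings, so treat these numbers as a lower bound rather than a prediction.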

We are open-source (https://github.com/cactus-compute/cactus). Cactus is free for hobbyists and personal projects, with a paid license required for commercial use.

We have a demo app on the App Store at https://apps.apple.com/gb/app/cactus-chat/id6744444212 and on Google Play at https://play.google.com/store/apps/details?id=com.rshemetsub....

In addition, there are numerous apps using Cactus in production, including AnythingLLM (https://anythingllm.com/mobile) and KinAI (https://mykin.ai/). Collectively they run over 500k weekly inference tasks in production.

While Cactus can be used on all Apple devices, including MacBooks, due to their design, for computers with AMD/Intel/Nvidia hardware generally, please use HuggingFace, Llama.cpp, Ollama, vLLM, or MLX. They're built for those, support x86, and are all great!

Thanks again, please share your thoughts, we’re keen to understand your views.

Comments

VladVladikoff•1h ago
How does this startup plan to make money?
HenryNdubuaku•1h ago
Cactus is free for hobbyists and personal projects, but we charge a tiny fee for commercial use which comes with more features that are relevant for enterprises.
mritchie712•1h ago
how many GB does an app packaged with Qwen3 600m + Cactus take up?

e.g. if I built a basic LLM chat app with Qwen3 600m + Cactus, what's the total app size?

HenryNdubuaku•1h ago
400MB if you ship the model as an asset. However, you can also build the app to download the model post-install. Cactus SDKs support this, as well as the agentic workflows you’d need.
cco•1h ago
I've been using Cactus for a few months, great product!

Makes it really easy to plug and play different models on my phone.

If anybody is curious what a Pixel 9 Pro is capable of:

Tokens: 277 - TTFT: 1609ms - 9 tok/sec

qwen2.5 1.5b instruct q6_k

Sure, here's a simple implementation of the Bubble Sort algorithm in Python:

    def bubble_sort(arr):
        n = len(arr)
        for i in range(n):
            # Flag to detect any swap in current pass
            swapped = False
            for j in range(0, n-i-1):
                # Swap if the element found is greater than the next element
                if arr[j] > arr[j+1]:
                    arr[j], arr[j+1] = arr[j+1], arr[j]
                    swapped = True
            # If no swap occurs in the inner loop, the array is already sorted
            if not swapped:
                break

    # Example usage:
    arr = [64, 34, 25, 12, 22, 11, 90]
    bubble_sort(arr)
    print("Sorted array is:", arr)

This function sorts the array in ascending order using the Bubble Sort algorithm. The outer loop runs n times, where n is the length of the array. The inner loop runs through the array, comparing adjacent elements and swapping them if they are in the wrong order. The swapped flag is used to detect if any elements were swapped in the current pass, which would indicate that the array is already sorted and can be exited early.

HenryNdubuaku•1h ago
Thanks for the kind words! We’ve actually improved performance since then; follow the instructions on the core repo.

Same model should run 3x faster on the same phone.

These improvements are still being pushed to the SDKs though.

dcreater•1h ago
The first picture on the Android app store page shows Claude Haiku as the model
HenryNdubuaku•1h ago
Thanks for noticing! The app is just a demo for the framework, so devs can compare the open-source models against frontier cloud models and make a decision. We have removed the comparison now, so those screenshots indeed have to be updated.
dcreater•1h ago
Does it incorporate a web search tool?
HenryNdubuaku•59m ago
It can incorporate any tool you want at all. This company’s app uses exactly that feature; you can download it and get a sense of it before digging in. https://anythingllm.com/mobile
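
For readers unfamiliar with how a tool such as web search is usually wired into a small model, here is a generic sketch of the dispatch step. It is not the Cactus SDK API: the JSON shape (`name` plus `arguments`), the `TOOLS` registry, and the stubbed `web_search` function are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict


def web_search(query: str) -> str:
    """Hypothetical placeholder tool; a real app would call a search backend."""
    return f"(stub) results for: {query}"


# Hypothetical registry mapping tool names to the functions the app exposes.
TOOLS: Dict[str, Callable[..., str]] = {"web_search": web_search}


def dispatch_tool_call(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and run the matching tool.

    Assumed format: {"name": "<tool>", "arguments": {...}}.
    Errors are returned as strings so they can be fed back to the model.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: tool call must be a JSON object"

    name = call.get("name")
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"

    try:
        return tool(**call.get("arguments", {}))
    except TypeError as exc:
        # Raised when the arguments are missing, unexpected, or not a mapping.
        return f"error: bad arguments for {name!r}: {exc}"


print(dispatch_tool_call('{"name": "web_search", "arguments": {"query": "cactus"}}'))
```

In a full agent loop the returned string would be appended to the conversation so the model can produce its final answer; that loop is left out here to keep the sketch short.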
pzo•56m ago
FWIW, they changed the license 2 weeks ago from Apache 2.0 to non-commercial. I understand they need to pay the bills, but they lost my trust with such a move. I will stick with react-native-ai [0], which is an extension of the Vercel AI SDK that also supports local inference on edge devices.

[0] react-native-ai.dev

HenryNdubuaku•50m ago
Understandable, though to explain, Cactus is still free for personal & small projects if you fall into that category. We’re early and would definitely consider your concerns on license in our next steps, thanks.
mdaniel•19m ago
For fear of having dang show up and scold me, I'll just add the factual statement that I will never ever believe any open source claim in any Launch HN ever. I can now save myself the trouble of checking, because I can be certain it's untrue

I already knew to avoid "please share your thoughts," although I guess I am kind of violating that one by even commenting

HenryNdubuaku•12m ago
It’s absolutely fine to share your thoughts, that’s the point of this post, we want to understand where people’s heads are at, it’s what determines our next decisions. What do you really think? I’m genuinely asking so I don’t think mods will react.
observationist•48m ago
Open source for the PR, then switching to non-open licensing is a cowardly, bullshit move.

https://github.com/cactus-compute/cactus/commit/b1b5650d1132...

Use open source and stick with it, or don't touch it at all, and tell any VC shitheels saying otherwise to pound sand.

If your business is so fragile or unoriginal that it can't survive being open source, then it will fail anyway. If you make it open source, embrace the ethos and build community, then your product or service will be stronger for it. If the big players clone your work, you get instant underdog credibility and notoriety.

HenryNdubuaku•29m ago
Thanks for sharing your thoughts. Honestly, I’d be annoyed too and it might sound like an excuse, but our circumstance was quite unique, it was a difficult decision at that time being an open-source contributor myself.

It’s still free for the community, just that corporations need a license. Should we make this clearer in the license?

typpilol•3m ago
Yes.

Just say that in the license.
