Cloud-based LLM APIs are convenient, but come with:
- Latency from network round-trips
- Unpredictable API costs
- Privacy concerns (content leaving the device)
- The need for connectivity
For simple tasks like news summarization, small models seem “good enough,” so I tested whether a ~270M-parameter model (Gemma 3 270M) could run entirely on-device.
Stack
- Model - Gemma 3 270M, INT8 quantized
- Runtime - Cactus SDK (Android NPU/GPU acceleration)
- App framework - Flutter
- Device - MediaTek 7300 with 8GB RAM
Architecture
- User shares a URL to the app (Android share sheet).
- App fetches the article HTML → extracts readable text.
- Local model generates a summary.
- On-device TTS reads the summary.

Everything runs offline except the initial page fetch.
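To make the flow concrete, here is a minimal Kotlin sketch of the same pipeline in plain Android. The real app is Flutter plus the Cactus plugin, so treat this as an illustration, not the actual implementation: Jsoup stands in for the HTML-to-text step, and `summarizeLocally()` is a hypothetical placeholder for the on-device Gemma call. The share-intent handling and `TextToSpeech` usage are standard Android APIs.

```kotlin
// Sketch of the share → fetch → extract → summarize → speak pipeline.
// Assumptions: Jsoup for text extraction, summarizeLocally() as a placeholder
// for the on-device model call (the real app goes through the Cactus SDK).
import android.app.Activity
import android.content.Intent
import android.os.Bundle
import android.speech.tts.TextToSpeech
import org.jsoup.Jsoup
import kotlin.concurrent.thread

class ShareReceiverActivity : Activity() {

    private lateinit var tts: TextToSpeech

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        tts = TextToSpeech(this) { /* init status ignored in this sketch */ }

        // 1. Receive the URL from the Android share sheet (ACTION_SEND, text/plain).
        val url = intent?.takeIf { it.action == Intent.ACTION_SEND }
            ?.getStringExtra(Intent.EXTRA_TEXT) ?: return

        thread {
            // 2. Fetch the article HTML and extract readable text.
            val articleText = Jsoup.connect(url).get().body().text()

            // 3. Generate a summary with the local model (placeholder).
            val summary = summarizeLocally(articleText)

            // 4. Read the summary aloud with on-device TTS.
            runOnUiThread {
                tts.speak(summary, TextToSpeech.QUEUE_FLUSH, null, "summary")
            }
        }
    }

    // Hypothetical stand-in for the Cactus SDK / Gemma 3 270M inference call.
    private fun summarizeLocally(text: String): String {
        return text.take(200) // replace with the actual on-device model call
    }

    override fun onDestroy() {
        tts.shutdown()
        super.onDestroy()
    }
}
```

In the actual app this logic sits behind the Flutter plugin layer; the sketch only shows where each step of the pipeline lives.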
Performance
- ~450–900ms latency for a short summary (100–200 tokens).
- On devices without NPU acceleration, CPU-only inference takes 2–3× longer.
- Peak RAM: ~350–450MB.
Limitations
- Quality is noticeably worse than GPT-5 for complex articles.
- Long-form summarization (>1k words) gets inconsistent.
- Web scraping is fragile for JS-heavy or paywalled sites.
- Some low-end phones throttle CPU/GPU aggressively.
| Metric  | Local (Gemma 270M)   | GPT-4o Cloud         |
| ------- | -------------------- | -------------------- |
| Latency | 0.5–1.5s             | 0.7–1.5s + network   |
| Cost    | 0                    | API cost per request |
| Privacy | Text stays on device | Sent over network    |
| Quality | Medium               | High                 |
GitHub - https://github.com/ayusrjn/briefly
Running small LLMs on-device is viable for narrow tasks like summarization. For more complex reasoning tasks, cloud models still outperform by a large margin, but the “local-first” approach seems promising for privacy-sensitive or offline-first applications. The Cactus SDK does a pretty good job of handling the model and hardware acceleration.
Happy to answer questions :)