I’ve been working on llmedge, a Kotlin-first Android library for running AI models fully on-device. It started as a wrapper around llama.cpp, but grew into a more general toolkit for local inference on mobile.
Today it supports:
- LLMs via GGUF models
- Image generation and vision models
- Speech-to-text and text-to-speech
- Streaming inference and Android-friendly memory handling
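To make the streaming point concrete, the general shape is a callback-based native decode loop bridged into a cold Kotlin Flow. The names below (NativeSession, generateAsync, stop) are illustrative stand-ins, not the exact llmedge API:

    // Sketch only: surfacing a callback-style native token stream as a Flow.
    import kotlinx.coroutines.channels.awaitClose
    import kotlinx.coroutines.flow.Flow
    import kotlinx.coroutines.flow.callbackFlow

    // Stand-in for a JNI-backed inference session.
    interface NativeSession {
        fun generateAsync(prompt: String, onToken: (String) -> Unit, onDone: () -> Unit)
        fun stop()
    }

    fun NativeSession.generate(prompt: String): Flow<String> = callbackFlow {
        generateAsync(
            prompt,
            onToken = { token -> trySend(token) }, // forward each decoded token to the collector
            onDone = { close() }                   // complete the flow when decoding finishes
        )
        awaitClose { stop() } // stop native decoding if the collector goes away
    }

Collecting that Flow on a background dispatcher lets the UI append tokens as they arrive instead of waiting for the full completion.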
The focus is on making this usable inside real Android apps. Native engines are wrapped behind a Kotlin API, model downloads avoid the Java heap, and memory usage is constrained based on the device to prevent crashes. The goal isn’t to ship a UI or an app, but to give developers a way to embed local AI features without dealing directly with JNI, NDK builds, or per-model native glue.
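For a rough sense of what "avoid the Java heap" and "constrained based on the device" mean in practice, here is a minimal sketch using stock Android APIs. The function names and the 25%/50% split are illustrative, not the actual llmedge implementation:

    import android.app.ActivityManager
    import android.content.Context
    import java.io.File
    import java.net.HttpURLConnection
    import java.net.URL

    // Derive a rough native memory budget from what the device reports.
    fun nativeBudgetBytes(context: Context): Long {
        val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
        val info = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
        val fraction = if (am.isLowRamDevice) 0.25 else 0.5 // assumed split, tuned per device class
        return (info.totalMem * fraction).toLong()
    }

    // Stream a model file straight to disk in small chunks, so a multi-GB GGUF
    // never passes through the Java heap as one allocation. Call off the main thread.
    fun downloadModel(url: String, dest: File) {
        val conn = URL(url).openConnection() as HttpURLConnection
        conn.inputStream.use { input ->
            dest.outputStream().use { output ->
                input.copyTo(output) // 8 KB buffer; constant heap usage regardless of model size
            }
        }
        conn.disconnect()
    }

The point is that the budget comes from what the device reports rather than a hard-coded constant, and the download path never materializes the model in managed memory.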
Core library: https://github.com/Aatricks/llmedge
Examples: https://github.com/Aatricks/llmedge-examples
Embedded usage example: https://github.com/Aatricks/EasyReader
I’d be interested in feedback from anyone who’s worked on on-device inference, especially around performance trade-offs and memory management on mobile.