Show HN: OfflineLLM: Live Voice Chat with DeepSeek, Llama on iOS and visionOS

https://offlinellm.bilaal.co.uk/

4•bilaal_dc5631•1d ago

Hi, this is something I've been working on for the past 18 months. There is an abundance of tools for running LLMs locally on desktops (e.g. ollama, LM Studio), but other devices have been left out. This project brings these models to iOS and visionOS, and it has turned out to work really well. Even an iPhone 14 Pro can quite easily run the 3B parameter version of Llama 3.2. CLIP models work well too!

It also has a Live Voice Chat mode that gives a two-way conversation experience, similar to the cloud-based Gemini Live feature that Google offers.

Under the hood it can run most GGUF models, using a heavily forked and diverged version of llama.cpp, which has helped performance on mobile devices.
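
For anyone unfamiliar with GGUF, here is a rough sketch of what checking a model file before loading it can look like. This is not OfflineLLM's code. It assumes the GGUF version 2 or later header layout in little-endian byte order (a 4-byte magic, a 32-bit version, then 64-bit tensor and metadata counts), and the function name `parse_gguf_header` is made up for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata count)


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size start of a GGUF file.

    Assumption: GGUF v2+ layout, little-endian. Version 1 files used
    32-bit counts and would not parse correctly here.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={data[:4]!r}")
    # "<IQQ" = little-endian uint32, uint64, uint64 with no padding.
    version, tensor_count, metadata_kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }


def read_gguf_header(path: str) -> dict:
    """Read only the first 24 bytes of a file and parse them."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))
```

Calling `read_gguf_header` on a downloaded model file is a cheap way to reject a file that is not GGUF before handing it to the inference engine. The metadata key/value pairs and tensor descriptors follow the header and are not parsed in this sketch.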

The next step is to integrate Apple's on-device 3B model, which they will hopefully open up access to at WWDC in a week's time. I'm also in the middle of adding support for Gemma 3 and Qwen 3.

Let me know what you think!

Comments

35jelly35•1d ago

> Even an iPhone 14 Pro can quite easily run the 3B parameter version of Llama 3.2

Wow. I never thought a non-Apple Intelligence phone would be able to run this. Does the phone get hot at all?

Also, how long did it take you to build this and how easy is it to test this in Xcode?

bilaal_dc5631•1d ago

Thanks for the questions.

> Does the phone get hot at all?

It's pretty reasonable and similar to the heat you'll get when playing an intensive game. If you're sensible it's pretty usable.

> how long did it take you to build this

I first started in 2023 and managed to get an MVP out the same year. That was pretty basic and a lot of work has been done since. I don't have an accurate measure of how much time has been spent, but it's had a lot of my attention since I released the first MVP.

> how easy is it to test this in Xcode?

This is pretty nice actually. It runs absolutely fine in the simulator, which is where I do most of my testing. The only time I have to move to a physical device is for performance testing, which isn't a huge drain on productivity.
