frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Run LLMs locally in Flutter with <200ms latency

https://github.com/ramanujammv1988/edge-veda
31•rish2497•1h ago

Comments

rish2497•1h ago
I’m the creator of EdgeVeda. I spent the last few months obsessed with one problem: Why is mobile AI still so slow and expensive?

Most startups are just wrapping OpenAI/Claude/gemini APIs. This works for prototypes, but for production apps, the 1000ms+ roundtrip latency kills the UX, and the inference bills kill the margins.

I built EdgeVeda to be the "Switzerland of Edge AI." It’s a unified C++ engine that handles the hardware-specific "plumbing" (Metal for iOS, Vulkan/NNAPI for Android) so you can run LLMs, Whisper, and TTS locally in one line of code.

Key Technical Feats:

Sub-200ms Time-to-First-Token: Achieved by bypassing the standard Android JNI bottleneck and using a direct memory-mapped buffer.

The Memory Watchdog: Mobile OSs love to kill apps that use >1GB RAM. I implemented a custom allocator that swaps model layers to disk when the system is under pressure.

Unified Pipeline: Orchestrates STT -> LLM -> TTS entirely on-device.

I’m looking for feedback on:

My implementation of the Dart FFI bridge (any performance leaks I missed?).

Support for 2024-era NPUs on non-flagship Android devices.

I'll be here all day to answer technical questions.

refulgentis•52m ago
This is LLM-generated-slop.

The repo still has empty react-native/kotlin projects that were supposed to exist.

It doesn't actually have Metal/Vulkan/NNAPI support, just, an enum for it. (search the repo, I'm serious)

Then another 100 things, not worth listing them out. Except one more I guess, there's ~0 chance of 200 ms TTFT locally, even if they had what they claimed. (modulo stilted scenarios like, only 5 token prompt on desktop-class GPU with 3B model)

Surprised to see it at #2 on the front page.

If you're a developer looking to do local LLMs in Flutter, might as well plug my 2-3 year old project that's still humming, https://github.com/Telosnex/fllama.

It's built on top of llama.cpp and is, well, actually real. And works on every platform, Android, iOS, macOS, Windows, Linux. Web uses MLC, because llama.cpp in WASM is way too slow, WebGPU is slower (it's early). MLC is ~dead, so that's not good, but...whatever. No better option on web currently.

(cheers to you, noble Icarus. I don't mean to make you feel bad, but, you're not going to Claude Code your way to what you want in 2 weeks. I wish. You basically are claiming to have built faster versions of llama.cpp, and ONNX, on every platform with custom accelerators, from scratch, and built innumerable features on top, by yourself, with just Claude Code, in 2 weeks.)

rish2497•15m ago
you'e 100% right to call this out, and I appreciate the deep dive.

To be completely transparent: I’ve over-indexed on the vision and the architecture in this repo rather than the functional implementation. The current state of the code is effectively a "spec-in-code" and a skeleton of the architecture I am building toward, rather than the production-ready engine my post implied.

The "LLM-generated-slop" comment hits home because I have been using AI tools heavily to scaffold the cross-platform boilerplate (the enums, the FFI bridges, and the project structures). In my excitement to show the "unified pipeline" vision, I pushed a version that is essentially a hollow shell of stubs.

Specifics on your points:

Empty projects: Correct. These are placeholders in the current monorepo structure.

Hardware Enums: You caught the stub. I am currently working on the actual Metal/Vulkan integration layers in a private branch, but I mistakenly pushed the "public skeleton" as if it were the finished core.

200ms TTFT: This is our internal target based on local benchmarks with raw llama.cpp implementations, but as you noted, it is currently "undefined" in the public Flutter wrapper because the bridge isn't actually moving tokens yet.

I genuinely appreciate the reality check. Building a "faster version of llama.cpp" is not my goal, my goal is the orchestration layer, but I clearly tried to "Claude Code" my way through the infrastructure too fast.

I’m going to take this feedback, go back to the shed, and focus on the actual C++ implementation before I post another update. Also, big respect to Telosnex/fllama your are the benchmark for a reason, and I clearly have a lot of work to do to reach that level of "real."

Thanks for keeping the community honest.

advisedwang•5m ago
I can't tell if this is satire of a LLM response or an actual LLM response

Show HN: Masharif

https://github.com/alielmorsy/Masharif
1•alielmorsy19•4m ago•0 comments

Design docs are waterfall wearing a hoodie

https://www.lucasfcosta.com/blog/design-docs
1•lucasfcosta•7m ago•0 comments

Show HN: GreedyPhrase – 1.21x better compression than GPT-4o tiktoken, 6x faster

https://github.com/rayonnant-ai/greedyphrase
1•bazlightyear•7m ago•0 comments

Phison CEO: Consumer electronics firms may fail by 2026 over AI memory crisis

https://www.pcgamer.com/hardware/memory/many-consumer-electronics-manufacturers-will-go-bankrupt-...
1•jamesy0ung•7m ago•0 comments

Show HN: Spawn – Postgres migration/test build system with minijinja (not vibed)

https://github.com/saward/spawn
1•Winsaucerer•15m ago•0 comments

Practical Guide to Building Reliable AI Agents

https://docs.inkeep.com/guides/agent-engineering
1•gaurav12342345•19m ago•1 comments

The anxiety driving AI's brutal work culture is a warning for all of us

https://www.theguardian.com/technology/ng-interactive/2026/feb/17/ai-startups-work-culture-san-fr...
2•i7l•24m ago•1 comments

Did Gemini just give me someone's personal information?

https://old.reddit.com/r/GeminiAI/comments/1r7dn80/did_gemini_just_give_me_someones_personal/
1•virgildotcodes•24m ago•0 comments

Show HN: Instagram Saved Collection Exporter

https://chromewebstore.google.com/
1•qwikhost•24m ago•0 comments

Join the Python Security Response Team

https://pyfound.blogspot.com/2026/02/join-the-python-security-response-team.html
1•lumpa•25m ago•0 comments

Convert Audi to 432Hz

https://kaizoku.digital/tools/retune/index.html
1•musti_92•25m ago•0 comments

The Final Bottleneck

https://lucumr.pocoo.org/2026/2/13/the-final-bottleneck/
3•donutshop•25m ago•0 comments

Frederick Wiseman, 96, Penetrating Documentarian of Institutions, Dies

https://www.nytimes.com/2026/02/16/movies/frederick-wiseman-dead.html
1•mitchbob•25m ago•1 comments

Safe VSP

https://linusakesson.net/scene/safevsp/index.php
1•amichail•29m ago•0 comments

Tesla Robotaxis Reportedly Crashing at a Rate That's 4x Higher Than Humans

https://gizmodo.com/tesla-robotaxis-reportedly-crashing-at-a-rate-thats-4x-higher-than-humans-200...
22•tempestn•31m ago•7 comments

Open-source game engine Godot is drowning in 'AI slop' code contributions

https://www.pcgamer.com/software/platforms/open-source-game-engine-godot-is-drowning-in-ai-slop-c...
2•vinyl7•32m ago•0 comments

Why an A.I. Video of Tom Cruise Battling Brad Pitt Spooked Hollywood

https://www.nytimes.com/2026/02/16/movies/tom-cruise-brad-pitt-artificial-intelligence-seedance.html
3•goplayoutside•32m ago•0 comments

Ask HN: How do you overcome imposter syndrome?

4•fdneng•33m ago•0 comments

The most practical, fast, tiny command sandboxing for AI agents

https://dw1.io/blog/2026/02/17/sandboxec/
2•dwisiswant0•33m ago•0 comments

An assembler that compiles to a printf loop

https://git.sr.ht/~sebsite/printfasm
1•todsacerdoti•34m ago•0 comments

The mathematical mystery inside the shooter Quake 3

https://www.scientificamerican.com/article/the-mathematical-mystery-inside-the-legendary-90s-shoo...
1•emmelaich•34m ago•2 comments

Adam Mastroianni of Experimental History Interviews Gwern (2025)

https://gwern.net/interview-inkhaven
2•cainxinth•35m ago•0 comments

First Agent Skills Hackathon by the Authors of SkillsBench

https://www.skillathon.ai/
1•xdotli•35m ago•1 comments

Rathbun's Operator

https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/rathbuns-operator.html
18•bb88•36m ago•2 comments

How Jet Engines Are Powering Data Centers

https://www.wsj.com/business/energy-oil/how-jet-engines-are-powering-data-centers-b1c587a9
2•petethomas•36m ago•0 comments

PostCSS creator: How to make your open source project popular

https://evilmartians.com/chronicles/how-to-make-your-open-source-popular
1•ashtuchkin•37m ago•0 comments

The gut microbiota shapes the human and murine breath volatilome

https://www.cell.com/cell-metabolism/fulltext/S1550-4131(25)00544-3
1•PaulHoule•39m ago•0 comments

Show HN: Algorithms 1.0.0 – Minimal and clean implementations of algorithms

https://github.com/keon/algorithms
1•kwk236•39m ago•0 comments

The Cost of Staying vs Judgement, Surface Area and Compute

https://twitter.com/amytam01/status/2023593365401636896
1•walterbell•40m ago•0 comments

Write Specs, Not Chats

https://gist.github.com/breadchris/50928d8c6f279ac30959a6bb8b6bf3ca
1•breadchris•46m ago•1 comments