Got really excited then realized I couldn’t figure out what “Google AI Edge” actually _is_.
Edit: I think it’s largely a rebrand of this from a couple years ago: https://developers.googleblog.com/en/introducing-mediapipe-s...
Go to this page using your mobile phone.
I am apparently a doormat or a seatbelt.
It seems to be a rebranded failure. At Google you get promoted for product launches because of the OKR system, and much more rarely for maintenance.
The full list [1] doesn't seem to include a human. You can tweak the score threshold to reduce false positives.
1: https://storage.googleapis.com/mediapipe-tasks/image_classif...
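For what it's worth, here's a minimal sketch of tweaking that threshold with the MediaPipe Tasks Python API (the model bundle and image paths below are placeholders, not something from the linked page):

    # Hypothetical sketch: MediaPipe image classification with a score threshold
    # to suppress low-confidence labels. Model and image paths are placeholders.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.ImageClassifierOptions(
        base_options=python.BaseOptions(model_asset_path="efficientnet_lite0.tflite"),
        max_results=5,
        score_threshold=0.5,  # drop anything scored below 0.5
    )
    classifier = vision.ImageClassifier.create_from_options(options)

    image = mp.Image.create_from_file("photo.jpg")
    result = classifier.classify(image)
    for category in result.classifications[0].categories:
        print(category.category_name, category.score)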
Did you also try it on items from the list?
Even when there is a match (and that isn't frequent), the confidence still looks very low to me (more like noise or luck).
It seems to be a repackaging of https://blog.tensorflow.org/2020/03/higher-accuracy-on-visio...
So it's an old release from five years ago (a very long time in the AI world), and AFAIK it has been superseded by YOLO-NAS and other models. MediaPipe feels like a really old tool, except for some specific subtasks like face tracking.
And as a side note, the OKR system at Google is a very serious thing; a lot of people internally game the system, and that could explain why this is presented as a "new" launch instead of a rather disappointing rebrand of the 2020 version.
I'd rather recommend building on more modern tools, such as: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM-256M-Ins... (runs on iPhone with < 1GB of memory)
So you came here to offer a knee-jerk assessment of an AI runtime and blamed the failure on OKRs. Then somebody points out that your use-case isn't covered by the model, and you're looping back around to the OKR topic again. To assess an AI inference tool.
Why would you even bother hitting reply on this post if you don't want to talk about the actual topic being discussed? "Agile bad" is not a constructive or novel comment.
CoreML is specific to the Apple ecosystem and lets you convert a PyTorch model to a CoreML .mlmodel that will run with acceleration on iOS/Mac.
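As a rough sketch of that conversion path (the toy model, input shape, and filename below are made up for illustration):

    # Hypothetical sketch: PyTorch -> Core ML via coremltools.
    import torch
    import coremltools as ct

    model = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.ReLU()).eval()
    example = torch.rand(1, 4)
    traced = torch.jit.trace(model, example)  # TorchScript trace for conversion

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="features", shape=example.shape)],
    )
    mlmodel.save("TinyModel.mlpackage")  # runs with Core ML acceleration on iOS/macOS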
Google Mediapipe is a giant C++ library for running ML flows on any device (iOS/Android/Web). It includes Tensorflow Lite (now LiteRT) but is also a graph processor that helps with common ML preprocessing tasks like image resizing, annotating, etc.
Google killing products early is a good meme but Mediapipe is open source so you can at least credit them with that. https://github.com/google-ai-edge/mediapipe
I used a fork of Mediapipe for a contract iOS/Android computer vision product and it was very complex but worked well. A cross-platform solution would not have been possible with CoreML.
Terrifying what being an iOS dev does to a feller.
Also where the f is Swift Assist already
https://huggingface.co/google/gemma-3n-E4B-it-litert-preview...
It's pretty impressive that this runs on-device. It's better than a lot of commercial mocap offerings.
AND this was marked deprecated/unsupported over 3 years ago despite the fact it's a pretty mature solution.
Google has been sleeping on their tech or not evangelizing it enough.
A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
For context, I get to choose the tech stack for a greenfield project. I think that ExecuTorch, which belongs to the PyTorch ecosystem, will have a far more predictable future than anything Google does, so I'm currently leaning more toward ExecuTorch.
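For anyone curious, the ExecuTorch export path looks roughly like this (a sketch with a toy module, not a vetted recipe):

    # Hypothetical sketch: exporting a PyTorch module to an ExecuTorch .pte file.
    # The toy model and the output filename are placeholders.
    import torch
    from executorch.exir import to_edge

    model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU()).eval()
    example_inputs = (torch.randn(1, 8),)

    exported = torch.export.export(model, example_inputs)
    program = to_edge(exported).to_executorch()

    with open("model.pte", "wb") as f:
        f.write(program.buffer)  # load this with the ExecuTorch runtime on-device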
That said, I probably wouldn't use Google AI Edge unless mine was one of the specific use cases supported[0]. I have no idea how hard it would be to add a new model with arbitrary inputs and outputs.
For running inference cross-device I have used ONNX, which is low-level enough to support whatever weights I need. For a good number of tasks you can also use transformers.js, which wraps ONNX Runtime and handles things like decoding (unless you really enjoy implementing beam search on your own). I believe an equivalent link to the above would be [1], which is just much more comprehensive.
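For reference, plain onnxruntime inference is only a few lines; the model path, input shape, and provider here are placeholders:

    # Hypothetical sketch: running an exported ONNX model with onnxruntime.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name

    dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = session.run(None, {input_name: dummy})
    print([o.shape for o in outputs])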
But it was garbage. It barely parsed the question, didn't even attempt to answer it, and replied in what was barely serviceable English. All I asked was how it was small enough to run locally on my phone. It was bad enough for me to abandon the model entirely, which is saying a lot, because I feel like I have pretty low expectations for AI work in the first place.
Bit off-topic, but did you expect to get a real or honest answer about itself? I see many people under the impression that models know information about themselves that isn't in the system prompt. Couldn't be further from the truth. In fact, those questions specifically tend to trigger hallucinations, most often an overconfident assertion with a "reasonable"-sounding answer.
The information the model knows (offline, with no tools allowed) stops weeks, if not months or years, before the model finishes training. There is _zero_ information about its inception, how it works, or anything similar in its weights.
Sorry, this is mostly directed at the masses - not you.
These days it's probably better to stick with onnxruntime via Hugging Face Transformers or the transformers.js library, or to wait until ExecuTorch matures. I haven't seen a SOTA model release with an official TensorFlow Lite / LiteRT port in a long time: SAM2, EfficientSAM, EdgeSAM, DFINE, DEIM, Whisper, Lite-Whisper, Kokoro, DepthAnythingV2 - everything is PyTorch by default, though there are still big communities around ONNX and MLX.
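The "onnxruntime via Hugging Face" route looks roughly like this with Optimum (the model id is just an example):

    # Hypothetical sketch: Hugging Face Optimum exporting a Transformers checkpoint
    # to ONNX and running it through onnxruntime. The model id is just an example.
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    model_id = "distilbert-base-uncased-finetuned-sst-2-english"
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    classify = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(classify("on-device inference without the PyTorch runtime"))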
davedx•1d ago
(It seems to be open source: https://github.com/google-ai-edge/mediapipe)
I think this is a unified way of deploying AI models that actually run on-device ("edge"). I guess a sort of "JavaScript of AI stacks"? I wonder who the target audience is for this technology?
wongarsu•1d ago
For stuff like face tracking it's still useful, but for some other tasks like image recognition the world has changed drastically
babl-yc•1d ago
LLMs and computer vision tasks are good examples of this: running the model itself is only one step in a larger pipeline.
For example, a hand-gesture recognizer might require:

- Pre-processing the input image to a certain color space + image size
- Copying the image to GPU memory
- Running an object-detection TFLite model to detect the hand
- Resizing the output image
- Running a gesture-recognition TFLite model to detect the gesture
- Post-processing the gesture output into something useful
Shipping this to iOS+Android requires a lot of code beyond executing TFLite models.
The Google Mediapipe approach is to package this graph pipeline and shared processing "nodes" into a single C++ library where you can pick and choose what you need and re-use operations across tasks. The library also compiles cross-platform, and the supporting tasks can offer GPU acceleration options.
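From the application side, that whole pipeline collapses into a few calls against the Tasks API; a rough Python sketch (the .task bundle and image file are placeholders):

    # Hypothetical sketch: the packaged hand-gesture pipeline via MediaPipe Tasks.
    # Detection, cropping, and gesture classification happen inside the task graph.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.GestureRecognizerOptions(
        base_options=python.BaseOptions(model_asset_path="gesture_recognizer.task"),
        num_hands=1,
    )
    recognizer = vision.GestureRecognizer.create_from_options(options)

    image = mp.Image.create_from_file("hand.jpg")
    result = recognizer.recognize(image)
    if result.gestures:
        top = result.gestures[0][0]
        print(top.category_name, top.score)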
One internal debate Google likely had was whether it was best to extend TFLite runtime with these features, or to build a separate library (Mediapipe). TFLite already supports custom compile options with additional operations.
My guess is they thought it was best to keep TFLite focused on "tensor based computation" tasks and offload broader operations like LLM and image processing into a separate library.
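That split shows up in the APIs: raw TFLite/LiteRT is basically "tensors in, tensors out", and everything around it is what Mediapipe adds. A minimal sketch, assuming a float32 model at a placeholder path:

    # Hypothetical sketch: the bare tensor-level interface TFLite/LiteRT focuses on,
    # with no pre/post-processing. Model path and float32 input are assumptions.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]).shape)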