Is there actually any hard data out there comparing the NPU on the Google Tensor G4 vs the Apple A18? I wasn't able to quickly find anything concrete.
I mean, Apple has been shipping mobile NPUs for longer than Google (Apple: since the A11 in 2017, Google: since 2021), and its chips are (ostensibly) built on a smaller silicon node than Google's (G4: Samsung SF4P vs A18: TSMC N3E). However, the G4 appears to have more RAM bandwidth (68.26 GB/s vs 60 GB/s on the A18).
https://blog.google/products/pixel/pixel-visual-core-image-p...
In fact, there’s no clear indication of when Apple Intelligence is running on-device versus in their Private Cloud Compute.
I am telling you this because I read between the lines that you believe current technology is a reason for you to be hopeful. Sure, it should be. But never forget, your child can do much more than you as a sighted person will ever be able to understand. Don't let them drown in your own misery. Let them discover what they can do. You will be surprised what they come up with.

And don't fall for Gear Acquisition Syndrome. Sure, tools are nice, and they do get better, which is also nice. I LOVE vision models, to stay on topic somehow. However, I still leave my house with only a cane and my phone in my pocket. I do occasionally ask Siri "Where am I?" to get an address if I happen to have forgotten exactly where I am. But at the end of the day, my cane is what shows me the way. Most tech is hype; plain old hearing and your sense of touch get you much farther than you might think.
Wish you all the best for your own journey, and the development of your child.
BUT NOW... THE FUTURE IS HERE.... an all-knowing god-like cell phone can tell these poor miserable individuals what the objects in their own homes are! No more tragic Mr. Magoo-ian accidents!
But thank you for posting this; it certainly enlightened me! I'll admit, all these AI solutions
They got to him.
"There's a tree. There's a tree. There's a tree. There's a number of pedestrians. There's a tree. There's a sign." does not strike me as useful feedback for getting around.
"Pavement. Row of stores to the left. Joe's Grocery Store. Doors. Door handle. A shelf with bakery. A shelf with canned goods. A shelf with bottles. Coke bottle. Large Pepsi bottle. Apple juice bottle. Passageway. Checkout. Payment terminal. Door. Door handle. Pavement. ..."
The only truly useful bit I see in your stream of text is, again, "Cream of Mushroom" vs "Cream of Chicken." I am actively holding something, so I know where it is, but I need to differentiate it by its printed detail.
I'd like to make a private Qwen or similar for my kids to prompt with a button and voice control. It doesn't need vision... Although eventually that'd be very cool.
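For what it's worth, the non-vision version of that is not much code if the model runs locally. Below is a minimal sketch that assumes an Ollama server is running on its default port with some Qwen model pulled; the model tag and system prompt are placeholders, and plain keyboard input stands in for the button and voice layer.

```python
# Minimal sketch: a private "ask Qwen" loop against a local Ollama server.
# Assumes something like `ollama pull qwen2.5:3b` has been run and the server
# is listening on the default localhost:11434; the model tag is a placeholder.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "qwen2.5:3b"  # swap in whatever local model you actually use

def ask(question: str) -> str:
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "Answer simply, for a child."},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    # Stand-in for the physical button + speech-to-text/TTS layer.
    while True:
        question = input("Ask: ")
        if not question:
            break
        print(ask(question))
```

Voice in and out could be layered on with whatever speech-to-text and TTS you trust; the "private" part is just that the whole round trip is a local HTTP call.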
Siri just sucks.
We might not be there yet...
However, if you're looking for instruction following (like an agent), I've tried to implement my own agent and have lost faith. Even GPT-4.1 will regularly gaslight me that no, it definitely ran the tool call to add the event to my calendar, when it just didn't. I can't get any more adherence out of it.
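One thing that at least stops the gaslighting (it doesn't improve adherence): never ask the model whether it ran the tool; check the structured tool_calls in the response, execute them yourself, and only then report success. A sketch with the OpenAI Python SDK, where the `add_calendar_event` tool and its handler are made up for illustration:

```python
# Minimal sketch: trust the structured tool_calls field, not the model's prose.
# The calendar tool and its handler are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "add_calendar_event",
        "description": "Add an event to the user's calendar.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string", "description": "ISO 8601 start time"},
            },
            "required": ["title", "start"],
        },
    },
}]

def add_calendar_event(title: str, start: str) -> bool:
    print(f"(pretend we added '{title}' at {start})")  # placeholder side effect
    return True

resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Add 'dentist' to my calendar tomorrow at 9am."}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        ok = add_calendar_event(**args)
        print("Event added." if ok else "Tool failed.")
else:
    # The model only *talked* about adding the event; nothing actually ran.
    print("No tool call was made; not claiming success.")
```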
Opened an issue for them to confirm this: https://github.com/apple/ml-fastvlm/issues/7
raster graphics and videos are likely not included in the build
probably some unused code (libraries) got into it; that can grow quite large
or maybe some ML models?
Especially if the API gives app developers the opportunity to load their own LoRA fine-tunes onto the OS-standard foundation models at runtime, then you can (ideally) have the best of both worlds: fine-tuned, app-specific models with reasonable app sizes.
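To make the size argument concrete, here is the general LoRA-adapter pattern using Hugging Face PEFT as an analogy (this is not Apple's API, and the model name and adapter path are placeholders): the base model stays on the system, and the app would only need to ship the adapter weights, which are typically a few megabytes.

```python
# General LoRA-adapter pattern (Hugging Face PEFT), as an analogy for
# "OS ships the base model, the app ships only a small adapter".
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-0.5B-Instruct"   # stand-in for the OS-provided foundation model
ADAPTER_DIR = "./my_app_adapter"      # stand-in for the app's bundled LoRA weights

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)

# Attach the app-specific adapter at runtime; only these small weights
# would need to ship with the app.
model = PeftModel.from_pretrained(base_model, ADAPTER_DIR)

inputs = tokenizer("Summarize my last three notes.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```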
Tesseract is awful for handwriting.
cool to see people doing stuff with smaller models