https://cdn.aaai.org/IAAI/2004/IAAI04-019.pdf
It has 490 citations.
DARPA has a whole program named after it: https://www.darpa.mil/research/programs/explainable-artifici...
The real question is whether we can get some insight as to how exactly it's able to do this. For convolutional neural networks it turns out that you can isolate and study the behavior of individual circuits and try to understand what "traditional image processing" function they perform, and that gives some decent intuition: https://distill.pub/2020/circuits/ - CNNs become less mysterious when you decompose them into edge detectors, curve detectors, shape classifiers, etc.
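To make the "edge detector" framing concrete, here's a toy sketch: a hand-written Sobel-style kernel is roughly what the first-layer circuits in the Distill article resemble. The 5x6 test image and the pure-Python convolution are my own illustration, not code from the article.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution (no padding), pure Python."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# Sobel kernel that responds to vertical edges (dark-to-bright, left to right).
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Synthetic image: dark left half, bright right half -> one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

response = convolve2d(image, sobel_x)
# The filter fires only in the windows straddling the edge; each output
# row comes out as [0, 4, 4, 0].
```

A learned first-layer CNN filter isn't literally Sobel, but visualizing its weights often reveals the same oriented-edge structure.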
For LLMs it's a bit harder, but Anthropic did some research in this vein.
Is this even something that's possible with current tech? Like, surely cats have some facial features that can be used to uniquely identify them? It would be cool to have a global database of all cats that users would be able to match their photos against. Imagine taking a picture of a cat you see on the street, and it immediately tells you the owner's details and whether it's missing.
[1]: https://tanelpoder.com/posts/catbench-vector-search-query-th...
Maybe not with you ;)
Tricks include facial alignment + cropping and very strong constraints on orientation to make sure you have a good frontal image (apps will give users photo alignment markers). Otherwise it's a standard visual search: run a face-extraction model to get the crop, warp it to standard key points, compute an embedding for the crop, store that in a database, and do a nearest-neighbour lookup.
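The lookup step at the end can be sketched in a few lines. This is a toy illustration, not any startup's pipeline: the 3-dimensional vectors and cat names are made up, and a real face-crop model would emit embeddings with hundreds of dimensions.

```python
import math

def normalise(v):
    """Scale a vector to unit length so dot product = cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def nearest(query, database):
    """Return the (name, similarity) pair with the highest cosine similarity."""
    q = normalise(query)
    best_name, best_sim = None, -1.0
    for name, emb in database.items():
        sim = sum(a * b for a, b in zip(q, normalise(emb)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name, best_sim

# Hypothetical stored embeddings, one per enrolled cat.
database = {
    "whiskers": [0.9, 0.1, 0.0],
    "mittens":  [0.1, 0.8, 0.2],
    "tom":      [0.0, 0.2, 0.9],
}

# A query crop whose embedding lands near "whiskers".
name, sim = nearest([0.85, 0.15, 0.05], database)
```

At scale you'd swap the linear scan for an approximate nearest-neighbour index, but the matching logic is the same.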
There are a few startups doing this. Also look at PetFace which was a benchmark released a year or so ago. Not a huge amount of work in this area compared to humans, but it's of interest to people like cattle farmers as well.
Impressed that it can do as well as it does, I just find that amusing.
Anyway, it’s 40 years later and I just read this article and said, “Oh! Now I get it.” A little too late for Dr. Hippe’s class.
Seems extremely prescient…
My favorite work on digging into the models to explain this is Golden Gate Claude [0]. Basically, the folks at Anthropic went digging into the many-layer, many-parameter model and found the features associated with the Golden Gate Bridge. Dialing that feature up to 11 made Claude bring up the bridge in response to literally everything.
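The "dialing it up" step amounts to adding a scaled feature direction to a hidden activation. A minimal sketch, assuming toy 4-dimensional values; in the actual work the direction comes from a sparse-autoencoder feature, not anything hand-picked like this:

```python
def steer(activation, feature_direction, strength):
    """Shift the activation along the feature direction, elementwise."""
    return [a + strength * d for a, d in zip(activation, feature_direction)]

# Invented toy values for illustration only.
hidden = [0.2, -0.5, 1.0, 0.3]
golden_gate_dir = [1.0, 0.0, 0.5, -0.2]  # hypothetical feature direction

boosted = steer(hidden, golden_gate_dir, strength=11.0)  # "up to 11"
```

Applied at every forward pass, a shift like this biases the model toward whatever concept the feature encodes.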
I'm super curious to see how much of this "intuitive" model of neural networks can be backed out effectively, and what that does to how we use it.
cwmoore•5h ago
I'm struck that classification tasks can so snappily render clear categories out of such fuzziness.