Show HN: Semantic search over the National Gallery of Art

https://nga.demo.mixedbread.com/

145•breadislove•3mo ago

Comments

philipkglass•3mo ago

How does this work? I thought it was probably powered by embeddings and maybe some more traditional search code, but I checked out the linked GitHub repo and I didn't see any model/inference code. Is the public code just a wrapper that communicates with your commercial API?

Some searches work like magic and others seem to veer off target a lot. For example, "sculpture" and "watercolor" worked just about how I'd expect. "Lamb" showed lambs and sheep. But "otter" showed a random selection of animals.

breadislove•3mo ago

It is powered by Mixedbread Search which is powered by our model Omni. Omni is multimodal (text, video, audio, images) and multi vector, which helps us to capture more information.
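
For readers unfamiliar with "multi vector": below is a minimal sketch of late-interaction (MaxSim) scoring, the general technique behind multi-vector retrieval. It is an illustration under assumptions, not Omni's actual scoring code; the function names and the toy 2-dimensional vectors are hypothetical.

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query vector picks its best-matching
    document vector (by cosine similarity), and the best matches are summed."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)

# Hypothetical toy data: two query vectors, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query vectors
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # covers only the first one

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

Because every query vector is matched separately, a document has to cover each part of the query to score well, which is one way multiple vectors can capture more than a single pooled vector.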

The search is in beta and we are improving the model. Thank you for reporting the queries that are not working well.

Edit: Re the otter, I just checked and I did not find any otters in the dataset. To reduce confusion, we should not return any results when the model is not sure.
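
A minimal sketch of what "return nothing when the model is not sure" could look like: drop results below a similarity cutoff before taking the top k. The function name, the `min_score` value of 0.5, and the example scores are all hypothetical; a real cutoff would have to be tuned on the actual model's score distribution.

```python
def filter_results(scored, min_score=0.5, top_k=16):
    """Keep only (item, score) pairs at or above min_score,
    sorted best-first and capped at top_k.
    An empty list means "no confident match"."""
    confident = [(item, score) for item, score in scored if score >= min_score]
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return confident[:top_k]

# Hypothetical scores: a query with a clear match, and one without.
lamb_hits = filter_results([("sheep etching", 0.62), ("lamb study", 0.81), ("cat sketch", 0.30)])
otter_hits = filter_results([("dog drawing", 0.22), ("horse print", 0.18)])
print(lamb_hits)
print(otter_hits)
```

With a cutoff like this, a query such as "otter" would show an empty page rather than a random selection of animals.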

justincormack•3mo ago

neither "blue pictures" nor "multiples" worked well.

breadislove•3mo ago

thank you for reporting these. we will improve on them for the next iteration.

reportrappor•3mo ago

I'll pile on since these are useful. Searching for "fingers and holes" did find me some nice hand drawings, but the real gold at the national gallery to me is the Bruce Nauman. The nga.gov search knew what I wanted.

philipkglass•3mo ago

There's at least a little bit of otter in the data. The one relevant result I saw was "Plate 40: Two Otters and a Beaver" by Joris Hoefnagel.

I also expected semantic search to return similar results for "fireworks" and "pyrotechnics," since the latter is a less common synonym for the former. But I got many results for fireworks and just one result for pyrotechnics.

This is still impressive. My impulse is to poke at it with harder cases to try to reason about how it could be implemented. Thanks for your Show HN and for replying to me!

breadislove•3mo ago

If you find more such cases please feel free to send them over to aamir at domain name of the Show HN. I would love to see those cases and see how we can improve on them. Thank you so much for the feedback.

treetalker•3mo ago

Yeah, "naked chicks" returns women with no clothes instead of baby birds.

yawnxyz•3mo ago

hey, your service is back up again!!! Mixedbread was my favorite tool for so long since your pivot, and I'm so glad y'all are back

breadislove•3mo ago

We have a lot more things coming up soon. It just took us some time building Mixedbread Search.

nmitchko•3mo ago

In case anyone wants to do this themselves, check out the pipeline here: https://github.com/isc-nmitchko/iris-document-search

ColNomic and NVIDIA models are great for embedding images, and MUVERA can transform those multi-vector embeddings into single fixed-dimensional vectors.

losteric•3mo ago

> check out the pipeline here

“the pipeline” - seems like this is just a personal hackathon project?

Why these models vs other multimodals? Which “nvidia models”?

dfc•3mo ago

It would be nice if it took you to the NGA page about the item. I can't even copy the text easily to search for it.

"Images of german shepherds" never fails to provide some humor.

breadislove•3mo ago

Thank you for pointing this out. We will add this tomorrow morning.

dfc•3mo ago

The results for "Mark Rothko", "Paintings by Mark Rothko", "Paintings similar to mark rothko" etc. do not bring up anything that I was expecting. NGA has a large collection of Rothko paintings but none of them come up.

This NGA link returns over a thousand pieces by Rothko: https://www.nga.gov/artists/1839-mark-rothko/artworks

breadislove•3mo ago

Right now we are not including the artist name. That will be added in the next iteration of the model (next week). For now the search is based only on what the model can "see", and it seems that the model does not understand the art of Mark Rothko.

The next version can see the image and read the metadata.

A bit more context: We include everything in the latent space (embeddings) without trying to maintain multiple indexes and hack around things. There is still a huge mountain to climb. But this one seems really promising.

4ndrewl•3mo ago

And this seems like a hard limitation of this approach as art (v craft) is concerned with interpretation and reception whereas this is more like unsplash-for-galleries in that the searches have to be very literal I guess? (eg search for something abstract, like 'dreams', something that you will find depicted in the collection, produces quite the mixed bag of results).

iDon•3mo ago

A search for : "character studies of old farmers" yielded good results. The results are drawings / engravings, which may reflect the balance of the collection, and perhaps this subject is more used in practice than in marketable oil paintings.

Since this is a semantic search, using a vector embedding, it will handle meanings better than a text search, which would handle names better.

Computer0•3mo ago

This is neat, not sure how to report queries that are working poorly as you have mentioned. But when I search "Waltz" I am presented with Kitchen Utensils and only one piece of dancing folks. Presumably this is due to the Artist's name being 'Walton'.

breadislove•3mo ago

We will add a feedback form tomorrow morning. For now please feel free to write to aamir at domain name of the page. thank you so much! this helps us a lot.

khaki54•3mo ago

Tried "Images of german shepherds" and not one on the page of 16

pogilvie•3mo ago

I built a toy version of something like this a couple-ish years ago for a hackathon. I wrote up a blog of how I did it back then for anyone interested: https://www.patrickogilvie.com/engineering/Image_Search_Engi...

Would be interesting to know how relevant that approach is now.

ulrikhansen54•3mo ago

Congrats on the launch, guys. I remember meeting y'all in SF. What happened to your HF model/project?

breadislove•3mo ago

there is a lot coming

kvsrh•3mo ago

Is it possible to add other data sources?

breadislove•3mo ago

yes, which one would you be interested in?

samdg•3mo ago

I love old stereograms, and was happy to find a couple using this tool!

adamontherun•3mo ago

love that a search for 'chill vibes sculpture' returned a very chill set of results. nice step change in art search capabilities

khaki54•3mo ago

Yale has an amazing one, worth looking at: https://lux.collections.yale.edu/

ted_dunning•3mo ago

Is that a multi-modal search? Or just textual matching?

I couldn't find any examples that couldn't be explained by simple text matches.

ted_dunning•3mo ago

Works really well for some artist names (rembrandt, whistler) and exceedingly poorly for others (john singer sargent).

joki77•3mo ago

When code and canvas meet, a search is not merely words but feeling. Between paintings and bars of pixels, the machine tries to understand the unspoken answer.

kburman•3mo ago

I recently learned that semantic search embeddings mostly represent topics and concepts, but they don’t handle negation or emotion very well.

For example, if you search for “paintings of winter landscapes but without sun and trees,” you’ll still get results with trees. That’s because embeddings capture the presence of concepts like “tree” or “landscape,” but not logical relationships like “without” or “not.”

Similarly, embeddings aren’t great at capturing how something feels. They can tell that “sad poem” and “happy poem” are different mainly because of the words used, not because they truly understand emotional tone.

This happens because most embedding models (like OpenAI’s or sentence-transformers) are trained to group things by semantic similarity, not logical meaning or sentiment. Negation, polarity, and affect aren’t explicitly represented in the vector space.

Might be common knowledge to some, but it was a cool TIL moment for me, realizing that embeddings are great at what something is about, but not how it feels or what it excludes.
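
A toy illustration of the point about "without": the sketch below uses plain bag-of-words vectors and cosine similarity as a stand-in for topical similarity. This is an assumption made for simplicity; real embedding models are not bag-of-words, and the captions are hypothetical, but it shows how a score driven by shared concepts can rank the excluded thing first.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    shared = sum(a[t] * b[t] for t in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return shared / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = bow("winter landscape without trees")
caption_with_trees = bow("winter landscape trees")  # hypothetical caption
caption_no_trees = bow("winter landscape snow")     # hypothetical caption

sim_with = cosine(query, caption_with_trees)
sim_without = cosine(query, caption_no_trees)
print(round(sim_with, 3), round(sim_without, 3))
```

The caption that mentions trees shares more words with the query, so it scores higher even though the query asked for the opposite; "without" is just one more token with nothing to match against.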

breadislove•3mo ago

That's actually not correct. Embeddings can handle relationships like “without” or “not” when trained for it. You need to scale up the training massively to make it generalize well. The current version of Mixedbread Search supports negatives like "tshirt without stripes". You can check it out in our launch video [1]. We are working on a much more generalized model, which should be able to capture relationships, emotions and much more. The current models are just limited.

[1]: https://www.mixedbread.com/blog/mixedbread-search

kburman•3mo ago

I was referring specifically to popular embedding models like OpenAI’s and sentence-transformers, which (as far as I know) don’t reliably handle negation or emotional nuance; they mostly capture topical similarity.

I don’t know enough of the underlying math to say for sure whether embeddings can be trained to consistently represent negation, but when I tried the Mixedbread demo myself with a query like “winter landscapes without sun and trees”, it still showed me paintings with both sun and trees. So at least in its current form, it doesn’t seem to fully handle those semantic relationships yet.

Trojanking•3mo ago

I created a similar website called https://artifair.com, where users can download high-quality artwork.

Andi•3mo ago

It always gives me exactly 16 (or fewer) images. So this cannot be very reliable because there are more results. Proof: Add a space anywhere to your search string in the form and hit enter again, then some new results are mixed in.
