I'm kind of disappointed in this article, Splade is a cool way to improve results of a TF/IDF index with minimally invasive changes and this obscures that more than it clarifies.
> Next, my SPLADE implementation in Elasticsearch is oversimplified. If you scroll back up to get_splade_embedding, we extract non-zero elements from vec_np (the SPLADE tokens) but discard their associated weights. This is a missed opportunity. The SPLADE papers use these weights for scoring matches.
Yes, exactly, that is the whole point of Splade.
Probably easier to demonstrate if you drop down a level to Lucene, I don't think you will be able to do it easily with Elastic.
Tangentially, I haven't looked closely at SPLATE which tries to marry Splade and ColBERT, but it's an interesting idea. https://arxiv.org/html/2404.13950v1
JnBrymn•3h ago
You're absolutely right. This was a post I tossed together quickly just to see what could be done without thinking too much. In retrospect, I think this would be better implemented using Elasticsearch sparse vector fields which allow you to specify the value of every token. Maybe I'l make an update post to try again.
jbellis•3h ago
> Next, my SPLADE implementation in Elasticsearch is oversimplified. If you scroll back up to get_splade_embedding, we extract non-zero elements from vec_np (the SPLADE tokens) but discard their associated weights. This is a missed opportunity. The SPLADE papers use these weights for scoring matches.
Yes, exactly, that is the whole point of Splade.
Probably easier to demonstrate if you drop down a level to Lucene, I don't think you will be able to do it easily with Elastic.
Tangentially, I haven't looked closely at SPLATE which tries to marry Splade and ColBERT, but it's an interesting idea. https://arxiv.org/html/2404.13950v1
JnBrymn•3h ago