Bridging the gap between keyword and semantic search with SPLADE (2024)

http://arcturus-labs.com/blog/2024/10/09/bridging-the-gap-between-keyword-and-semantic-search-with-splade/
23•softwaredoug•8mo ago

Comments

jbellis•8mo ago
I'm kind of disappointed in this article. Splade is a cool way to improve results of a TF/IDF index with minimally invasive changes, and this obscures that more than it clarifies.

> Next, my SPLADE implementation in Elasticsearch is oversimplified. If you scroll back up to get_splade_embedding, we extract non-zero elements from vec_np (the SPLADE tokens) but discard their associated weights. This is a missed opportunity. The SPLADE papers use these weights for scoring matches.

Yes, exactly, that is the whole point of Splade.

It's probably easier to demonstrate if you drop down a level to Lucene; I don't think you will be able to do it easily with Elastic.

Tangentially, I haven't looked closely at SPLATE which tries to marry Splade and ColBERT, but it's an interesting idea. https://arxiv.org/html/2404.13950v1

JnBrymn•8mo ago
You're absolutely right. This was a post I tossed together quickly just to see what could be done without thinking too much. In retrospect, I think this would be better implemented using Elasticsearch sparse vector fields, which allow you to specify the value of every token. Maybe I'll make an update post to try again.
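
To make the point about weights concrete, here is a minimal sketch in plain Python. It is not the article's code and not an Elasticsearch or Lucene API. It assumes a SPLADE-style expansion can be represented as a token-to-weight dict; the function names and the example weights are made up for illustration. It compares scoring by token overlap alone (what happens when the weights are discarded) with a weighted dot product over shared tokens.

```python
from typing import Dict

SparseVec = Dict[str, float]  # hypothetical representation: token -> weight


def overlap_score(query: SparseVec, doc: SparseVec) -> int:
    """Count shared tokens, ignoring weights (the simplified approach)."""
    return len(query.keys() & doc.keys())


def weighted_score(query: SparseVec, doc: SparseVec) -> float:
    """Dot product of the two sparse vectors over their shared tokens."""
    # Iterate over the smaller dict so lookups hit the larger one.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(w * doc[t] for t, w in query.items() if t in doc)


# Made-up weights, chosen only to illustrate the difference.
query = {"search": 2.0, "semantic": 1.5}
doc_a = {"search": 0.5, "keyword": 1.0, "semantic": 0.25}
doc_b = {"search": 2.0, "semantic": 2.0, "retrieval": 1.0}

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, overlap_score(query, doc), weighted_score(query, doc))
```

Both documents share the same two tokens with the query, so the overlap score cannot separate them. The weighted score can, because it reflects how strongly each token is activated in each document.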
