
Peacock. A New Programming Language

1•hashhooshy•4m ago•1 comments

A postcard arrived: 'If you're reading this I'm dead, and I really liked you'

https://www.washingtonpost.com/lifestyle/2026/02/07/postcard-death-teacher-glickman/
2•bookofjoe•5m ago•1 comments

What to know about the software selloff

https://www.morningstar.com/markets/what-know-about-software-stock-selloff
2•RickJWagner•9m ago•0 comments

Show HN: Syntux – generative UI for websites, not agents

https://www.getsyntux.com/
3•Goose78•10m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/ab75cef97954
2•birdculture•10m ago•0 comments

AI overlay that reads anything on your screen (invisible to screen capture)

https://lowlighter.app/
1•andylytic•11m ago•1 comments

Show HN: Seafloor, be up and running with OpenClaw in 20 seconds

https://seafloor.bot/
1•k0mplex•12m ago•0 comments

Tesla turbine-inspired structure generates electricity using compressed air

https://techxplore.com/news/2026-01-tesla-turbine-generates-electricity-compressed.html
2•PaulHoule•13m ago•0 comments

State Department deleting 17 years of tweets (2009-2025); preservation needed

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
2•sleazylice•13m ago•1 comments

Learning to code, or building side projects with AI help, this one's for you

https://codeslick.dev/learn
1•vitorlourenco•14m ago•0 comments

Effulgence RPG Engine [video]

https://www.youtube.com/watch?v=xFQOUe9S7dU
1•msuniverse2026•15m ago•0 comments

Five disciplines discovered the same math independently – none of them knew

https://freethemath.org
3•energyscholar•16m ago•1 comments

We Scanned an AI Assistant for Security Issues: 12,465 Vulnerabilities

https://codeslick.dev/blog/openclaw-security-audit
1•vitorlourenco•17m ago•0 comments

Amazon no longer defends cloud customers against video patent infringement claims

https://ipfray.com/amazon-no-longer-defends-cloud-customers-against-video-patent-infringement-cla...
2•ffworld•17m ago•0 comments

Show HN: Medinilla – an OCPP compliant .NET back end (partially done)

https://github.com/eliodecolli/Medinilla
2•rhcm•20m ago•0 comments

How Does AI Distribute the Pie? Large Language Models and the Ultimatum Game

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6157066
1•dkga•21m ago•1 comments

Resistance Infrastructure

https://www.profgalloway.com/resistance-infrastructure/
2•samizdis•25m ago•1 comments

Fire-juggling unicyclist caught performing on crossing

https://news.sky.com/story/fire-juggling-unicyclist-caught-performing-on-crossing-13504459
1•austinallegro•25m ago•0 comments

Restoring a lost 1981 Unix roguelike (protoHack) and preserving Hack 1.0.3

https://github.com/Critlist/protoHack
2•Critlist•27m ago•0 comments

GPS and Time Dilation – Special and General Relativity

https://philosophersview.com/gps-and-time-dilation/
1•mistyvales•30m ago•0 comments

Show HN: Witnessd – Prove human authorship via hardware-bound jitter seals

https://github.com/writerslogic/witnessd
1•davidcondrey•31m ago•1 comments

Show HN: I built a clawdbot that texts like your crush

https://14.israelfirew.co
2•IsruAlpha•32m ago•2 comments

Scientists reverse Alzheimer's in mice and restore memory (2025)

https://www.sciencedaily.com/releases/2025/12/251224032354.htm
2•walterbell•36m ago•0 comments

Compiling Prolog to Forth [pdf]

https://vfxforth.com/flag/jfar/vol4/no4/article4.pdf
1•todsacerdoti•37m ago•0 comments

Show HN: Cymatica – an experimental, meditative audiovisual app

https://apps.apple.com/us/app/cymatica-sounds-visualizer/id6748863721
1•_august•38m ago•0 comments

GitBlack: Tracing America's Foundation

https://gitblack.vercel.app/
9•martialg•38m ago•1 comments

Horizon-LM: A RAM-Centric Architecture for LLM Training

https://arxiv.org/abs/2602.04816
1•chrsw•39m ago•0 comments

We just ordered shawarma and fries from Cursor [video]

https://www.youtube.com/shorts/WALQOiugbWc
1•jeffreyjin•40m ago•1 comments

Correctio

https://rhetoric.byu.edu/Figures/C/correctio.htm
1•grantpitt•40m ago•0 comments

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

https://chillphysicsenjoyer.substack.com/p/trying-to-make-an-automated-ecologist
2•crescit_eundo•44m ago•0 comments

Find 'Abbey Road when type 'Beatles abbey rd': Fuzzy/Semantic search in Postgres

https://rendiment.io/postgresql/2026/01/21/pgtrgm-pgvector-music.html
83•nethalo•2w ago

Comments

lbrito•1w ago
I was just starting to learn about embeddings for a very similar use on my project. Newbie question: what are the pros/cons of using an API like OpenAI's Ada to calculate the embeddings, compared to importing a model in Python and running it locally like in this article?
alright2565•1w ago
Do you want it to run on your CPU, or someone else's GPU?

Is the local model's quality sufficient for your use case, or do you need something higher quality?

storystarling•1w ago
The main trade-off I found is the RAM footprint on your backend workers. If you run the model locally, every Celery worker needs to load it into memory, so you end up needing much larger instances just to handle the overhead.

With Ada your workers stay lightweight. For a bootstrapped project, I found it easier to pay the small API cost than to manage the infrastructure complexity of fat worker nodes.
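
A minimal sketch of the two paths being weighed here, assuming sentence-transformers for the local option and OpenAI's embeddings API for the hosted one (both model names are illustrative, not from the article):

    # Local: every worker process loads the model into its own RAM.
    from sentence_transformers import SentenceTransformer
    local_model = SentenceTransformer("all-MiniLM-L6-v2")  # on the order of 100 MB per worker
    local_vecs = local_model.encode(["Beatles abbey rd"])

    # API: workers stay thin; you trade RAM for per-token cost and network latency.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.embeddings.create(model="text-embedding-3-small", input=["Beatles abbey rd"])
    api_vec = resp.data[0].embedding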

gingerlime•1w ago
Great post. Explains the concepts just enough that they click without going too deep, shows practical implementation examples, and shows how it all fits together. Simple, clear, and ultimately useful (to me at least).
cess11•1w ago
I found fuzzy search in Manticore to be straightforward and pretty good. Might be a decent alternative if one perceives the ceremony in TFA as a bit much.
pinkmuffinere•1w ago
The rewritten title is confusing imo. Can I propose:

“Finding ‘Abbey Road’ given ‘beatles abbey rd’ search with Postgres”

pinkmuffinere•1w ago
(The missing close-apostrophe, and the use of “type” are what really confuse me in the original submission)
fsckboy•1w ago
these days i find myself yearning to type "Beatles abbey rd" and find only "Beatles abbey rd"
Manfred•1w ago
Especially with small datasets it’s more important to be exact at the expense of a user having to fix a typo.
storystarling•1w ago
I learned this the hard way on a book platform I'm working on. While semantic search is useful for discovery, we found that prioritizing exact matches is critical. It seems users get pretty frustrated if they type a specific title and get a list of conceptually similar results instead of the actual book. We ended up having to tune the ranking to heavily favor literal string matches over the vector distance to keep people from bouncing.
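
A sketch of the ranking storystarling describes, in the article's pg_trgm + pgvector setting. The albums table, its columns, the psycopg conn, query (the raw search string), and query_vec (a pgvector literal like "[0.1,0.2,...]") are all assumptions:

    # Literal matches sort first, trigram similarity second, vector distance last.
    rows = conn.execute(
        """
        SELECT title,
               (lower(title) = lower(%s))::int AS exact_hit,
               similarity(title, %s)           AS trgm_sim,
               embedding <=> %s::vector        AS vec_dist
        FROM albums
        ORDER BY exact_hit DESC, trgm_sim DESC, vec_dist ASC
        LIMIT 10
        """,
        (query, query, query_vec),
    ).fetchall()
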
fsckboy•1w ago
everything you are saying rings perfectly true to me, but there's an additional problem I encounter. (i'm going to make up my example because i'm too lazy to check, but you'll get the idea) say you want to look up "Alexander the Great"...

...God help you if Brad Pitt and/or the Jonas Brothers ever played a role with exactly that name-match. The web and search (and the culture?) have become super biased toward video, especially commercial offerings, and sorting ranked by popularity means pages and pages of virtually identical content about that which you are not interested in.

digiown•1w ago
Related, but I wish Wikipedia would provide a filter against movies, music, and pop-culture topics. They take up a huge amount of the namespace for whatever reason and often direct me to unintended pages.
cess11•1w ago
https://en.wikipedia.org/w/index.php?search=melancholia+-mov...
drsalt•1w ago
why did you have to learn this the hard way?
storystarling•1w ago
complaining customers...
qingcharles•1w ago
I remember eBay 30 years ago, when it would show you whatever you typed in. Compare that to 2026, where it shows you everything except the thing you typed in.
TeamDman•1w ago
for 50,000 rows I'd much rather just use fzf/nucleo/tv against json files instead of dealing with database schemas. When it comes to dealing with embedding vectors rather than plaintext it gets slightly more annoying, but it still feels like such a pain in the ass to go full database when really it could still be a bunch of flat open files.

More of a perspective from just trying to index crap on my own machine vs building a SaaS

danielfalbo•1w ago
> Abbey Road

> The Dark Side of the Moon

> OK Computer

Those are my top 3 personal records ever. I feel so average now...

tialaramex•1w ago
The other two are popular but "Dark Side of the Moon" in particular was extremely popular. Like, top 10 albums ever level popular.
esafak•1w ago
tl;dr: A demo of pg_trgm (fuzzy matcher) + pgvector (vector search).
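
For anyone who wants to reproduce that combination (and the albums table assumed in the sketch above), a minimal setup in Python with psycopg; the database name and embedding dimension are illustrative:

    import psycopg  # psycopg 3

    conn = psycopg.connect("dbname=music", autocommit=True)
    conn.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm")  # trigram fuzzy matching
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")   # pgvector
    conn.execute("""
        CREATE TABLE IF NOT EXISTS albums (
            id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
            title     text NOT NULL,
            embedding vector(1536)  -- must match the embedding model's output dimension
        )
    """)
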
TurdF3rguson•1w ago
Sounds nice but I'm not sure that trigram brings anything to the table that vector didn't already bring.
timlod•1w ago
FWIW, the performance considerations section is a little simplistic, and probably assumes that exact dataset/problem.

For GIN, for example, performance depends a lot on the size of the search input (the fewer characters, the more rows to compare) as well as the number of rows and the size of the index.

It also mentions GiST (another type of index, which isn't mentioned anywhere else in the article).
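
A sketch of the two trigram index types timlod mentions, continuing the psycopg sketch above (the table and index names are made up; gin_trgm_ops and gist_trgm_ops are pg_trgm's real operator classes):

    # GIN: faster to search, slower to build/update; no distance-ordered scans.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS albums_title_gin ON albums USING gin (title gin_trgm_ops)")
    # GiST: cheaper to update and supports KNN ordering via the <-> distance operator.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS albums_title_gist ON albums USING gist (title gist_trgm_ops)")
    # Short inputs produce few trigrams, so the index narrows less and rechecks more rows.
    # %% escapes pg_trgm's % operator, since psycopg uses %s placeholders.
    plan = conn.execute("EXPLAIN ANALYZE SELECT title FROM albums WHERE title %% %s", ("rd",))
    print("\n".join(row[0] for row in plan))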

augusteo•1w ago
On the API vs local model question:

We went with API embeddings for a similar use case. The cold-start latency of local models across multiple workers ate more money in compute than just paying per-token. Plus you avoid the operational overhead of model updates.

The hybrid approach in this article is smart. Fuzzy matching catches 80% of cases instantly, embeddings handle the rest. No need to run expensive vector search on every query.
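
A sketch of that cascade, reusing the hypothetical albums schema and OpenAI client from the sketches above (the 0.45 cutoff is an arbitrary illustration):

    def search(conn, client, q, cutoff=0.45):
        # Stage 1: cheap trigram pass (%% escapes pg_trgm's % operator for psycopg).
        rows = conn.execute(
            "SELECT id, title, similarity(title, %s) AS sim "
            "FROM albums WHERE title %% %s ORDER BY sim DESC LIMIT 10",
            (q, q),
        ).fetchall()
        if rows and rows[0][2] >= cutoff:
            return rows  # fuzzy hit is good enough; skip the embedding call entirely
        # Stage 2: embed the query and fall back to vector search.
        emb = client.embeddings.create(model="text-embedding-3-small", input=q).data[0].embedding
        literal = "[" + ",".join(str(x) for x in emb) + "]"  # pgvector text format
        return conn.execute(
            "SELECT id, title FROM albums ORDER BY embedding <=> %s::vector LIMIT 10",
            (literal,),
        ).fetchall()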

TurdF3rguson•1w ago
Those text embeddings are dirt cheap. You can do around 1M titles on the Cloudflare embedding model I used last time without exceeding the daily free tier.
augusteo•1w ago
yeah exactly. even OpenAI/Gemini are really cheap too
exogen•1w ago
This could also be applied to record linkage. With search, there will usually be multiple results, and there's always a "top" match even if its confidence/score is quite low. In record linkage, at least if you're automating it, you need to minimize false positives and only automatically link records if confidence is super high that they're a match – and that doesn't just mean the top scoring match has high confidence, but that there's also no 2nd best match with a good score. If that's not the case, leave the records for manual human review.
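
A sketch of that accept/reject rule (the thresholds are illustrative, not from the thread):

    def auto_link(candidates, accept=0.90, margin=0.15):
        """candidates: (record_id, score) pairs sorted best-first."""
        if not candidates:
            return None
        best_id, best_score = candidates[0]
        runner_up = candidates[1][1] if len(candidates) > 1 else 0.0
        # Link only when the best match is strong AND clearly ahead of the runner-up.
        if best_score >= accept and best_score - runner_up >= margin:
            return best_id
        return None  # ambiguous: leave for manual human review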

My experience here is also related to music. Here are some cases to think about:

What's the actual title of the song "Mambo #5" vs. how you might search for it or find it referenced in other records? Mambo #5? Mambo No. 5? Mambo No. Five? Mambo Number 5? Mambo Number Five? And that's not even getting to the fact that the actual title is longer, with a parenthetical. This is a case where bigrams, trigrams, or other string similarity metrics wouldn't perform very well. Same with the Beatles song: is it "Dr. Robert" or "Doctor Robert"? Most string similarity algorithms put "Dr" and "Doctor" pretty far apart, but with vectors they should be practically equivalent.

How about "You've Lost that Loving Feeling"? Aren't there some dropped Gs in those gerunds? Is it You've Lost That Lovin' Feeling? You've Lost That Lovin' Feelin'? You've Lost That Loving Feelin'? In this case, string similarity (including trigrams) performs very well.

How about songs with censored titles? Some records will certainly have profanity censored, but would it be like "F*ck", "F**k", "F@$k", or what? And is the censorship actually part of the canonical song title, or just some references to it?

In the "#5" and "Dr." cases, this could be solved pretty effectively by the normalization step described in the article (hardcoding what #, No., and Dr. expand to) – although even that can get pretty complicated: what do you do about numbers? Do you normalize every numerical reference, e.g. "10 Thousand", to digits, or words? What about rarely used abbreviations, or cases where an abbreviation is ambiguous and could mean different things in different contexts? If someone has a song called "PT Cruiser" are you gonna accidentally normalize that to "Part Cruiser"? For this reason, I like to see this not as a "normalization" step, where there's a single normalized form, but rather a "query expansion" step – generate all the possible permutations, and those are your actual comparison strings.
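
A sketch of that query-expansion idea: instead of picking one normal form, emit every permutation over known variant groups (the equivalence table is a made-up fragment):

    import itertools

    EQUIV = [
        {"#", "no.", "number"},  # Mambo No. 5 vs. Mambo Number 5
        {"dr.", "doctor"},       # Dr. Robert vs. Doctor Robert
    ]

    def expand(title):
        """Return every comparison string over the known variant groups."""
        choices = []
        for tok in title.lower().split():
            group = next((g for g in EQUIV if tok in g), None)
            choices.append(sorted(group) if group else [tok])
        return {" ".join(combo) for combo in itertools.product(*choices)}

    print(expand("Dr. Robert"))  # {'doctor robert', 'dr. robert'}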

It seems like embeddings could do the job of automatically considering different spellings/abbreviations of words as equivalent. I'm just a casual observer here, but I'm sure this is also a well-explored topic in speech-to-text, since you have to convert someone's utterances to match actual entity names, like movie titles for example.