frontpage.
Show HN: Minecraft Creeper meets 90s Tamagotchi

https://github.com/danielbrendel/krepagotchi-game
1•foxiel•1m ago•0 comments

Show HN: Termiteam – Control center for multiple AI agent terminals

https://github.com/NetanelBaruch/termiteam
1•Netanelbaruch•1m ago•0 comments

The only U.S. particle collider shuts down

https://www.sciencenews.org/article/particle-collider-shuts-down-brookhaven
1•rolph•4m ago•0 comments

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

1•solarisos•5m ago•0 comments

Show HN: Remotion directory (videos and prompts)

https://www.remotion.directory/
1•rokbenko•7m ago•0 comments

Portable C Compiler

https://en.wikipedia.org/wiki/Portable_C_Compiler
2•guerrilla•9m ago•0 comments

Show HN: Kokki – A "Dual-Core" System Prompt to Reduce LLM Hallucinations

1•Ginsabo•9m ago•0 comments

Software Engineering Transformation 2026

https://mfranc.com/blog/ai-2026/
1•michal-franc•10m ago•0 comments

Microsoft purges Win11 printer drivers, devices on borrowed time

https://www.tomshardware.com/peripherals/printers/microsoft-stops-distrubitng-legacy-v3-and-v4-pr...
2•rolph•11m ago•0 comments

Lunch with the FT: Tarek Mansour

https://www.ft.com/content/a4cebf4c-c26c-48bb-82c8-5701d8256282
2•hhs•14m ago•0 comments

Old Mexico and her lost provinces (1883)

https://www.gutenberg.org/cache/epub/77881/pg77881-images.html
1•petethomas•17m ago•0 comments

'AI' is a dick move, redux

https://www.baldurbjarnason.com/notes/2026/note-on-debating-llm-fans/
3•cratermoon•19m ago•0 comments

The source code was the moat. But not anymore

https://philipotoole.com/the-source-code-was-the-moat-no-longer/
1•otoolep•19m ago•0 comments

Does anyone else feel like their inbox has become their job?

1•cfata•19m ago•0 comments

An AI model that can read and diagnose a brain MRI in seconds

https://www.michiganmedicine.org/health-lab/ai-model-can-read-and-diagnose-brain-mri-seconds
2•hhs•22m ago•0 comments

Dev with 5 years of experience switched to Rails, what should I be careful about?

1•vampiregrey•25m ago•0 comments

AlphaFace: High Fidelity and Real-Time Face Swapper Robust to Facial Pose

https://arxiv.org/abs/2601.16429
1•PaulHoule•26m ago•0 comments

Scientists discover “levitating” time crystals that you can hold in your hand

https://www.nyu.edu/about/news-publications/news/2026/february/scientists-discover--levitating--t...
2•hhs•28m ago•0 comments

Rammstein – Deutschland (C64 Cover, Real SID, 8-bit – 2019) [video]

https://www.youtube.com/watch?v=3VReIuv1GFo
1•erickhill•28m ago•0 comments

Tell HN: Yet Another Round of Zendesk Spam

2•Philpax•28m ago•0 comments

Postgres Message Queue (PGMQ)

https://github.com/pgmq/pgmq
1•Lwrless•32m ago•0 comments

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

https://github.com/kjnez/django-rclone
2•cui•35m ago•1 comments

NY lawmakers proposed statewide data center moratorium

https://www.niagara-gazette.com/news/local_news/ny-lawmakers-proposed-statewide-data-center-morat...
1•geox•36m ago•0 comments

OpenClaw AI chatbots are running amok – these scientists are listening in

https://www.nature.com/articles/d41586-026-00370-w
3•EA-3167•36m ago•0 comments

Show HN: AI agent forgets user preferences every session. This fixes it

https://www.pref0.com/
6•fliellerjulian•39m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model

https://github.com/ghostty-org/ghostty/pull/10559
2•DustinEchoes•41m ago•0 comments

Show HN: SSHcode – Always-On Claude Code/OpenCode over Tailscale and Hetzner

https://github.com/sultanvaliyev/sshcode
1•sultanvaliyev•41m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/microsoft-appointed-a-quality-czar-he-has-no-direct-reports-and-no-b...
3•RickJWagner•42m ago•0 comments

Multi-agent coordination on Claude Code: 8 production pain points and patterns

https://gist.github.com/sigalovskinick/6cc1cef061f76b7edd198e0ebc863397
1•nikolasi•43m ago•0 comments

Washington Post CEO Will Lewis Steps Down After Stormy Tenure

https://www.nytimes.com/2026/02/07/technology/washington-post-will-lewis.html
15•jbegley•44m ago•3 comments

ThalamusDB: Query text, tables, images, and audio

https://github.com/itrummer/thalamusdb
53•itrummer•4mo ago

Comments

tarwich•3mo ago
What a cool idea
itrummer•3mo ago
Thank you :-)
satisfice•3mo ago
How is it tested?
itrummer•3mo ago
We use mocking to replace actual LLM calls when testing for the correctness of the ThalamusDB code. In terms of performance benchmarking, we ran quite a few experiments measuring time, costs (fees for LLM calls), and result accuracy. The latter one is the hardest to evaluate since we need to compare the ThalamusDB results to the ground truth. Often, we used data sets from Kaggle that come with manual labels (e.g., camera trap pictures labeled with the animal species, then we can get ground truth for test queries that count the number of pictures showing specific animals).
satisfice•3mo ago
When someone claims that a system can search “approximately” or “semantically,” that means there is some sort of statistical behavior. There will be error. That error can be systematically characterized with enough data. But if it can’t be, or isn’t, then it’s a toy.

A problem I have with LLMs and the way they are marketed is that they are being treated as, and offered as if they were, toys.

You’ve given a few tantalizing details, but what I would really admire is a link to full details about exactly what you did to collect sufficient evidence that this system can be trusted and in what ways it can be trusted.

itrummer•3mo ago
The approximation in ThalamusDB is relative to the best accuracy that can be achieved using the associated language models (LLMs). E.g., if ThalamusDB processes a subset of rows using LLMs, it can reason about possible results when applying LLMs to the remaining rows (taking into account all possible outcomes).

In general, when using LLMs, there are no formal guarantees on output quality anymore (but the same applies when using, e.g., human crowd workers for comparable tasks like image classification etc.).

Having said that, we did some experiments evaluating output accuracy for a prior version of ThalamusDB and the results are here: https://dl.acm.org/doi/pdf/10.1145/3654989 We will actually publish more results with the new version within the next few months as well. But, again, no formal guarantees.

satisfice•3mo ago
With humans we don’t need guarantees, because we have something called accountability and reputation. We also understand a lot about how and why humans make errors, and so human errors make sense to us.

But LLMs routinely make errors that if made by a human would cause us to believe that human is utterly incompetent, acting in bad faith, or dangerously delusional. So we should never just shrug and say nobody’s perfect. I have to be responsible for what my product does.

Thanks for the link!

AmazingTurtle•3mo ago
You say it's a DB; given the execution time of up to 600s per query, I say it's an agent.
itrummer•3mo ago
Well, it definitely goes beyond a traditional DBMS, but yes :-) Processing the same amount of data via SQL with LLM calls will be slower and more expensive than via pure SQL. Note that 600s is just the default timeout, though. It's typically much faster (and you can set the timeout to whatever you like; ThalamusDB will return the best result approximation it can find by the timeout). More details in the documentation: https://itrummer.github.io/thalamusdb/thalamusdb.html
petre•3mo ago
Seems like a good tool for police work.
ilaksh•3mo ago
Does this use CLIP or something to get embeddings for each image and normal text embeddings for the text fields, and then feed the top N results to a VLM (LLM) to select the best answer(s)?

What's the advantage of this over using llamaindex?

Although, even asking that question, I will be honest: the last time I used llamaindex, it seemed everything had to be shoehorned in because using that library was a foregone conclusion, even though ChromaDB ended up doing just about all the work, since the built-in vector store that llamaindex ships has strangely bad performance at any scale.

I do like how simple the llamaindex DocumentStore (or whatever it's called) is: you can just point it at a directory. But it seems that when using a specific vectordb you often can't do that.

I guess the other thing people do is put everything in postgres. Do people use pgvector to store image embeddings?
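On the pgvector question: pgvector stores any fixed-dimension float vector, so image embeddings work the same way as text embeddings. Below is a toy Python stand-in for what its Euclidean-distance operator `<->` computes; the SQL in the comment follows pgvector's documented interface, while the table, column names, and data are made up for illustration:

```python
import math

# With the real pgvector extension, the equivalent SQL would be:
#   CREATE EXTENSION vector;
#   CREATE TABLE images (id bigserial PRIMARY KEY, embedding vector(4));
#   SELECT id FROM images ORDER BY embedding <-> '[1,0,0,0]' LIMIT 2;

def l2(a, b):
    """Euclidean distance, i.e., what pgvector's <-> operator computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(store, query, k):
    """Brute-force k-nearest-neighbor over stored image embeddings."""
    return sorted(store, key=lambda item: l2(item[1], query))[:k]

store = [("cat.jpg", [1.0, 0.0, 0.0, 0.0]),
         ("dog.jpg", [0.0, 1.0, 0.0, 0.0]),
         ("car.jpg", [0.9, 0.1, 0.0, 0.0])]
nearest = top_k(store, [1.0, 0.0, 0.0, 0.0], 2)
# nearest-first: cat.jpg (distance 0), then car.jpg
```

In practice pgvector would replace the brute-force scan with an index (e.g., HNSW or IVFFlat) over the same vector column.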

bobosha•3mo ago
We use a vector db (Qdrant) to store embeddings of images and text and built a search UI atop it.
ilaksh•3mo ago
Cool. And the other person implies that the queries can search across all rows if necessary? For example if all images have people and the question is which images have the same people in them. Or are you talking about a different project?
itrummer•3mo ago
I think the previous post refers to a different project. But yes: ThalamusDB can process all rows if necessary, including matching all images that have the same persons in them.
itrummer•3mo ago
LlamaIndex relies heavily on RAG-style approaches, i.e., using items whose embedding vectors are close to the embedding vector of the question (what you describe). RAG-style approaches work great if the answer depends only on a small part of the data, e.g., if the right answer can be extracted from a few top-N documents.

It's less applicable if the answer cannot be extracted from a small data subset. E.g., you want to count the number of pictures showing red cars in your database (rather than retrieving a few pictures of red cars). Or, let's say you want to tag beach holiday pictures with all the people who appear in them. That's another scenario where you cannot easily work with RAG. ThalamusDB supports such scenarios, e.g., you could use the query below in ThalamusDB:

SELECT H.pic FROM HolidayPictures H, ProfilePictures P WHERE NLFILTER(H.pic, 'this is a picture of the beach') AND NLJOIN(H.pic, P.pic, 'the same person appears in both pictures');

ThalamusDB handles scenarios where the LLM has to look at large data sets and uses a few techniques to make that more efficient. E.g., see here (https://arxiv.org/abs/2510.08489) for the implementation of the semantic join algorithm.

A few other things to consider:

1) ThalamusDB supports SQL with semantic operators. Lay users may prefer the natural language query interfaces offered by other frameworks. But people who are familiar with SQL might prefer writing SQL-style queries for maximum precision.

2) ThalamusDB offers various ways to restrict the per-query processing overheads, e.g., time and token limits. If the limit is reached, it actually returns a partial result (e.g., lower and upper bounds for query aggregates, subsets of result rows ...). Other frameworks do not return anything useful if query processing is interrupted before it's complete.
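The top-N-retrieval-versus-full-scan distinction can be made concrete with a toy sketch (names invented; `is_red_car` stands in for an LLM call — the point is which rows each strategy ever looks at):

```python
def rag_answer(pics, relevance, k=3):
    """Top-k retrieval: only the k most 'relevant' rows ever reach the
    model, so an aggregate over the whole table silently undercounts."""
    top = sorted(pics, key=relevance, reverse=True)[:k]
    return sum(is_red_car(p) for p in top)

def full_scan_count(pics):
    """Semantic-operator style: the predicate runs on every row."""
    return sum(is_red_car(p) for p in pics)

is_red_car = lambda p: p.startswith("red_car")  # stand-in for an LLM call
pics = [f"red_car_{i}.jpg" for i in range(10)] + ["beach.jpg", "dog.jpg"]

assert full_scan_count(pics) == 10  # correct count
assert rag_answer(pics, lambda p: 1.0) == 3  # capped by k, regardless of truth
```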

catlifeonmars•3mo ago
Dumb question: why is this its own DB vs being a Postgres extension (for example).
cyanydeez•3mo ago
Bizarre coding solutions that require OpenAI
itrummer•3mo ago
:-) Actually, you can also use models from other providers (e.g., Google's Gemini models). You just have to set the access key from the corresponding provider and configure the models you'd like to use in this file: https://github.com/itrummer/thalamusdb/blob/main/config/mode...