A 13-month-old LlamaIndex bug re-embeds unchanged content

https://sebastiantirelli.com/writing/llamaindex-embedding-churn/
1•tirelli•1h ago

Comments

tirelli•1h ago
Author here. Quick map of the finding for anyone skimming:

Bug 1 is in the hashing path. Node.hash, TextNode.hash, and IngestionCache all include metadata via MetadataMode.ALL, which ignores excluded_embed_metadata_keys. Any volatile field (mtime, atime, file size) flips the hash and forces a re-embed of byte-identical content.
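
A minimal sketch of the Bug 1 mechanism, using only hashlib. The helper content_hash, its parameters, and the sample metadata are hypothetical and are not LlamaIndex's actual code; the sketch only shows how hashing all metadata makes a volatile field change the digest of unchanged text, while honouring an exclusion list keeps it stable.

```python
import hashlib


def content_hash(text: str, metadata: dict, excluded_keys: set, respect_exclusions: bool) -> str:
    """Hash text plus metadata. Hypothetical helper, not LlamaIndex's implementation."""
    if respect_exclusions:
        # Drop the keys the user asked to exclude before hashing.
        metadata = {k: v for k, v in metadata.items() if k not in excluded_keys}
    # Sort keys so the digest does not depend on dict ordering.
    meta_str = "\n".join(f"{k}: {metadata[k]}" for k in sorted(metadata))
    return hashlib.sha256(f"{text}\n{meta_str}".encode("utf-8")).hexdigest()


text = "byte-identical content"
excluded = {"last_modified_date"}  # assumed name for the volatile field
before = {"file_name": "a.txt", "last_modified_date": "2024-01-01"}
after = {"file_name": "a.txt", "last_modified_date": "2024-01-02"}

# All metadata hashed: the volatile field flips the digest, so a re-embed would follow.
churn = content_hash(text, before, excluded, False) != content_hash(text, after, excluded, False)

# Exclusions honoured: the digest is stable across the metadata change.
stable = content_hash(text, before, excluded, True) == content_hash(text, after, excluded, True)

print(churn, stable)
```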

Bug 2 is that default_file_metadata_func queries POSIX-only stat keys (mtime, atime, created). Whether a given fsspec backend emits those keys decides whether Bug 1 is firing on you today. I source-inspected every backend under the fsspec GitHub org and every built-in in filesystem_spec.

Active today (bug fires at day-level precision): local, gcsfs, sshfs + built-in sftp, smb, arrow/HDFS, memory.

Masked today (bug dormant until Bug 2 is fixed): s3fs, adlfs, ossfs, swiftspec, tosfs, gdrive-fsspec, dropboxdrivefs, ipfsspec, opendalfs, dbfs, http, webhdfs, ftp, github, gist, git.

Wrapper: alluxiofs delegates to its wrapped backend.
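
The active/masked split above comes down to which keys a backend's info dict carries. A rough sketch of that check follows; the two dicts are simulated for illustration and were not captured from real backends, and volatile_keys_present and status are hypothetical names.

```python
# The stat keys the comment says default_file_metadata_func looks for.
POSIX_KEYS = ("mtime", "atime", "created")


def volatile_keys_present(info: dict) -> list:
    """Return the POSIX-style keys this info dict actually provides."""
    return [k for k in POSIX_KEYS if info.get(k) is not None]


def status(info: dict) -> str:
    """'active' if any volatile key would reach the metadata, else 'masked'."""
    return "active" if volatile_keys_present(info) else "masked"


# Simulated info() results; key sets are illustrative assumptions.
local_like = {"name": "/data/a.txt", "size": 12, "mtime": 1714000000.0, "created": 1713000000.0}
object_store_like = {"name": "bucket/a.txt", "size": 12, "LastModified": "2024-04-25T00:00:00+00:00"}

print(status(local_like), status(object_store_like))
```

A backend that reports its modification time under a different key never feeds a timestamp into the metadata, so the hash stays stable by accident rather than by design.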

GCS is the outlier on the active side because gcsfs/core.py explicitly sets result["mtime"] = parse(object_metadata["updated"]) as a legacy compatibility alias. There is a TODO about removing it. The code is still there.

Once default_file_metadata_func gets its natural one-line fix to use fs.modified(path) instead of POSIX-specific keys, every masked backend activates at sub-second precision simultaneously.
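
To make the precision point concrete, here is a small sketch. It assumes "day-level" means the timestamp is reduced to a date string and "sub-second" means a full timestamp is kept; the formats and sample times are illustrative, not taken from LlamaIndex or fsspec.

```python
from datetime import datetime, timedelta, timezone

# Two modification times on the same day, one hour apart (made-up values).
t0 = datetime(2024, 4, 25, 9, 0, 0, 123456, tzinfo=timezone.utc)
t1 = t0 + timedelta(hours=1)

# Day-level precision: both collapse to the same string, so a same-day touch is invisible.
same_day = t0.strftime("%Y-%m-%d") == t1.strftime("%Y-%m-%d")

# Sub-second precision: any touch at all produces a different string.
differs = t0.isoformat() != t1.isoformat()

print(same_day, differs)
```

Under these assumptions, a finer timestamp means the metadata, and therefore the hash, changes on every touch instead of at most once per day.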

Reproducers are at github.com/stirelli/llamaindex-embedding-churn (five progressively more realistic levels; level 3 calls the real OpenAI API and bills tokens). The fix is PR #21462 against run-llama/llama_index: three lines plus a regression test covering both directions.

Happy to answer questions on the benchmark, the fsspec inspection, or the cost math.

MiniZinc, constraint modelling language solve discrete optimisation problems

https://www.minizinc.org
1•Alifatisk•41s ago•0 comments

FluxBB Built with Rust

https://github.com/skorotkiewicz/fluxbb-rs
2•modinfo•3m ago•0 comments

'Startup Cowboys' Are Making This Texas Town the New Tech Hotspot

https://www.wsj.com/business/entrepreneurship/lockhart-texas-tech-hub-fd1bf380
1•malshe•4m ago•1 comments

Collaborative Autoresearch for Any Repo

https://community.computer/
1•aiw1nt3rs•4m ago•0 comments

Before Apple Music, There Was MapleMusic–Canada's Forgotten Pioneer

https://thewalrus.ca/before-apple-music-there-was-maplemusic/
1•janandonly•4m ago•0 comments

QR Lume – a privacy‑first iOS tool for inspecting QR codes safely

https://apps.apple.com/us/app/qrlume/id6762032298
1•briwandt•5m ago•2 comments

Wsl9x: Windows 9x Subsystem for Linux

https://codeberg.org/hails/wsl9x
1•birdculture•6m ago•1 comments

Mercedes-Benz and Liquid AI Partner to Scale Embedded In-Car Intelligence

https://www.businesswire.com/news/home/20260423009970/en/Mercedes-Benz-and-Liquid-AI-Partner-to-S...
1•mnewme•7m ago•0 comments

Multiview Stereo Projection [video]

https://www.youtube.com/watch?v=YbxsYhTjYFI
2•Saig6•12m ago•0 comments

Google investing up to $40B in Anthropic

https://www.wsj.com/finance/investing/google-expands-anthropic-investment-with-40-billion-commitm...
1•chang1•13m ago•0 comments

The Nintendo Switch Switch (2019)

https://blog.cynthia.re/post/nintendo-switch-ethernet-switch
1•zdw•14m ago•0 comments

Benchmarking OpenAI's Privacy Filter

https://www.tonic.ai/blog/benchmarking-openai-privacy-filter-pii-detection
2•akamor•15m ago•0 comments

SFO Quiet Airport (2025)

https://viewfromthewing.com/san-francisco-airport-removed-90-minutes-of-daily-noise-travelers-say...
4•CaliforniaKarl•17m ago•0 comments

Multiservice Impact for Azure Workloads in East US

https://azure.status.microsoft/en-us/status
2•tapoxi•19m ago•1 comments

QLMarkdown: macOS Quick Look extension for viewing Markdown files

https://github.com/sbarex/QLMarkdown
1•janandonly•19m ago•0 comments

Voice analysis pipeline that detects emotional incongruence

https://app.myyangu.com/
1•xthemadgenius•20m ago•0 comments

Ivanpah Solar Power Facility

https://en.wikipedia.org/wiki/Ivanpah_Solar_Power_Facility
1•simonebrunozzi•20m ago•0 comments

Video recordings of software engineering pioneers, SD&m Bonn 2001

https://archive.org/details/sdm_software_ionieers
2•kkroesch•21m ago•3 comments

Mine, a Coalton and Common Lisp IDE

https://coalton-lang.github.io/20260424-mine/
4•Jach•23m ago•0 comments

OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API

https://developers.openai.com/api/docs/changelog
5•arabicalories•24m ago•0 comments

Benchmarking How Postgres Scales

https://www.dbos.dev/blog/benchmarking-workflow-execution-scalability-on-postgres
2•KraftyOne•28m ago•0 comments

It's OK To Be Scared (Don't be in a rush to get screwed)

https://chillphysicsenjoyer.substack.com/p/its-ok-to-be-scared
2•crescit_eundo•29m ago•0 comments

Ask HN: How would you improve this CLI tool for finding terminal commands?

https://github.com/stvkoch/Command-Finder
2•stvkoch•31m ago•0 comments

Ubuntu 26.04 LTS

https://documentation.ubuntu.com/release-notes/26.04/changes-since-previous-interim/
4•maxloh•33m ago•0 comments

LLM research on Hacker News is drying up

https://dylancastillo.co/til/llm-research-on-hacker-news-is-dying.html
3•dcastm•35m ago•0 comments

What happened to Omegle? rise and fall of internet's favorite stranger danger

https://mashable.com/article/what-happened-to-omegle
1•rolph•36m ago•0 comments

Tech bros: it's time to challenge Silicon Valley's saviour complex

https://www.theguardian.com/commentisfree/picture/2026/apr/25/tech-bros-its-time-to-challenge-sil...
2•robtherobber•37m ago•0 comments

There Will Be a Scientific Theory of Deep Learning

https://arxiv.org/abs/2604.21691
7•jamie-simon•41m ago•0 comments

Kubuntu Linux 26.04 LTS (Resolute Raccoon)

https://kubuntu.org/news/kubuntu-26-04-release-notes/
7•jrepinc•42m ago•1 comments

Kubernetes v1.36: User Namespaces in Kubernetes are finally GA

https://kubernetes.io/blog/2026/04/23/kubernetes-v1-36-userns-ga/
2•soheilpro•42m ago•0 comments