LLVM: The Bad Parts

https://www.npopov.com/2026/01/11/LLVM-The-bad-parts.html
75•vitaut•1h ago•7 comments

Floppy disks turn out to be the greatest TV remote for kids

https://blog.smartere.dk/2026/01/floppy-disks-the-best-tv-remote-for-kids/
169•mchro•2h ago•91 comments

Date is out, Temporal is in

https://piccalil.li/blog/date-is-out-and-temporal-is-in/
15•alexanderameye•46m ago•7 comments

The struggle of resizing windows on macOS Tahoe

https://noheger.at/blog/2026/01/11/the-struggle-of-resizing-windows-on-macos-tahoe/
2259•happosai•19h ago•952 comments

Reproducing DeepSeek's MHC: When Residual Connections Explode

https://taylorkolasinski.com/notes/mhc-reproduction/
45•taykolasinski•2h ago•12 comments

2025 marked a record-breaking year for Apple services

https://www.apple.com/newsroom/2026/01/2025-marked-a-record-breaking-year-for-apple-services/
23•soheilpro•1h ago•14 comments

Launch a Debugging Terminal into GitHub Actions

https://blog.gripdev.xyz/2026/01/10/actions-terminal-on-failure-for-debugging/
71•martinpeck•3h ago•12 comments

Lightpanda migrate DOM implementation to Zig

https://lightpanda.io/blog/posts/migrating-our-dom-to-zig
135•gearnode•6h ago•69 comments

Windows 8 Desktop Environment for Linux

https://github.com/er-bharat/Win8DE
120•edent•2h ago•107 comments

The Manchester Garbage Collector and purple-garden's runtime

https://xnacly.me/posts/2026/manchester-garbage-collector/
9•xnacly•4d ago•0 comments

Apple picks Google's Gemini to power Siri

https://www.cnbc.com/2026/01/12/apple-google-ai-siri-gemini.html
56•stygiansonic•44m ago•30 comments

Ai, Japanese chimpanzee who counted and painted dies at 49

https://www.bbc.com/news/articles/cj9r3zl2ywyo
96•reconnecting•6h ago•32 comments

Show HN: 30k IKEA items in flat text

https://huggingface.co/datasets/tsazan/ikea-us-commercetxt
41•tsazan•5d ago•31 comments

CLI agents make self-hosting on a home server easier and fun

https://fulghum.io/self-hosting
669•websku•18h ago•451 comments

JRR Tolkien reads from The Hobbit for 30 Minutes (1952)

https://www.openculture.com/2026/01/j-r-r-tolkien-reads-from-the-hobbit-for-30-minutes-1952.html
214•bookofjoe•5d ago•73 comments

Personal thoughts/notes from working on Zootopia 2

https://blog.yiningkarlli.com/2025/12/zootopia-2.html
112•pantalaimon•5d ago•3 comments

Ireland fast tracks Bill to criminalise harmful voice or image misuse

https://www.irishtimes.com/ireland/2026/01/07/call-to-fast-track-bill-targeting-ai-deepfakes-and-...
57•mooreds•2h ago•32 comments

39c3: In-house electronics manufacturing from scratch: How hard can it be? [video]

https://media.ccc.de/v/39c3-in-house-electronics-manufacturing-from-scratch-how-hard-can-it-be
198•fried-gluttony•3d ago•91 comments

Ozempic reduced grocery spending by an average of 5.3% in the US

https://news.cornell.edu/stories/2025/12/ozempic-changing-foods-americans-buy
215•giuliomagnifico•3h ago•312 comments

iCloud Photos Downloader

https://github.com/icloud-photos-downloader/icloud_photos_downloader
565•reconnecting•20h ago•220 comments

This game is a single 13 KiB file that runs on Windows, Linux and in the Browser

https://iczelia.net/posts/snake-polyglot/
264•snoofydude•17h ago•68 comments

Zen-C: Write like a high-level language, run like C

https://github.com/z-libs/Zen-C
59•simonpure•3h ago•46 comments

Keychron's Nape Pro turns your keyboard into a laptop‑style trackball rig

https://www.yankodesign.com/2026/01/08/keychrons-nape-pro-turns-your-mechanical-keyboard-into-a-l...
31•tortilla•1h ago•10 comments

Conbini Wars – Map of Japanese convenience store ratios

https://conbini.kikkia.dev/
104•zdw•5d ago•42 comments

XMPP and Metadata

https://blog.mathieui.net/xmpp-and-metadata.html
51•todsacerdoti•5d ago•14 comments

I'm making a game engine based on dynamic signed distance fields (SDFs) [video]

https://www.youtube.com/watch?v=il-TXbn5iMA
403•imagiro•4d ago•60 comments

The next two years of software engineering

https://addyosmani.com/blog/next-two-years/
243•napolux•18h ago•249 comments

Uncrossy

https://uncrossy.com/
147•dgacmu•14h ago•41 comments

FUSE is All You Need – Giving agents access to anything via filesystems

https://jakobemmerling.de/posts/fuse-is-all-you-need/
192•jakobem•18h ago•63 comments

Perfectly Replicating Coca Cola [video]

https://www.youtube.com/watch?v=TDkH3EbWTYc
288•HansVanEijsden•3d ago•188 comments

Show HN: 30k IKEA items in flat text

https://huggingface.co/datasets/tsazan/ikea-us-commercetxt
41•tsazan•5d ago
OP here.

I took the unofficial IKEA US dataset (originally scraped by jeffreyszhou) and converted all 30,511 products into a flat, markdown-like protocol called CommerceTXT.

The goal: See if a flatter structure is more efficient for LLM context windows.

The results:
- Size: 30k products across 632 categories.
- Efficiency: The text version uses ~24% fewer tokens (3.6M saved total) compared to the equivalent minified JSON.
- Structure: Files are organized in folders (e.g. /products/category/), which helps with testing hierarchical retrieval routers.
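A minimal sketch of how a comparison like this could be set up, under loud assumptions: the product record is made up, the "key: value" layout is only an illustration and is not the actual CommerceTXT syntax, and `rough_token_count` is a crude regex proxy rather than a real LLM tokenizer. It shows why dropping quotes, braces and commas tends to shrink the count; the dataset page has the real benchmarks.

```python
import json
import re

# Hypothetical product record; field names are illustrative, not taken from the dataset.
product = {
    "name": "Example bookcase",
    "price": 79.99,
    "currency": "USD",
    "category": "storage",
    "dimensions": {"width_cm": 80, "height_cm": 202, "depth_cm": 28},
}


def to_minified_json(record):
    # separators=(",", ":") drops the optional whitespace json.dumps adds by default.
    return json.dumps(record, separators=(",", ":"))


def to_flat_text(record, prefix=""):
    # Illustrative "key: value" layout, one field per line (NOT the real CommerceTXT format).
    lines = []
    for key, value in record.items():
        if isinstance(value, dict):
            # Nested objects are flattened into dotted keys.
            lines.extend(to_flat_text(value, prefix=f"{prefix}{key}.").splitlines())
        else:
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)


def rough_token_count(text):
    # Crude proxy: every word-like run and every punctuation character counts as one token.
    return len(re.findall(r"\w+|[^\w\s]", text))


if __name__ == "__main__":
    minified = to_minified_json(product)
    flat = to_flat_text(product)
    print("minified JSON:", rough_token_count(minified))
    print("flat text:    ", rough_token_count(flat))
```

To reproduce a figure like the 24% above, one would swap `rough_token_count` for the tokenizer of the target model and run it over the whole dataset.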

The link goes to the dataset on Hugging Face which has the full benchmarks.

Parser code is here: https://github.com/commercetxt/commercetxt

Happy to answer questions about the conversion logic!

Comments

colinbartlett•2h ago
Any practical use for this IKEA data specifically?

Or just a handy open data set you could use to prove out the concept?

DennisP•2h ago
I assumed it's because IKEA is famous for flat packing its furniture.
tsazan•2h ago
Exactly! IKEA removes the air from the box to save space, CommerceTXT removes the HTML/JSON bloat to save tokens. You made my day!
embedding-shape•1h ago
> IKEA removes the air from the box to save space

Huh? I don't think that's true, there usually is some sort of structural elements inside of the package, meant to be thrown away (usually made with cardboard/paper), and all Ikea boxes definitely have lots of air inside of them, not sure what would make you say otherwise, unless it's some joke I'm missing?

jayknight•1h ago
A box that contained a fully assembled kitchen table would contain a lot more air. I think that comment just meant IKEA designs items that can be packaged into a minimal volume.
embedding-shape•1h ago
Ah yes, on second reading it's actually pretty obvious that is what parent meant and I was reading it too literally. Thanks for the clarification, that's certainly correct :)
WildGreenLeave•2h ago
I've had the idea to set up an AI that automatically (re)designs a room using IKEA stuff. It would definitely help me decorate my room in a better way.
tsazan•1h ago
That's a great use case. If you ship it, let me know!
vachina•2h ago
There’s already a schema.org spec that defines JSON-LD structured data that you can embed on each of your product pages to provide a machine-readable interface to your product.

For example, Google’s indexers already use this to surface pricing data. https://developers.google.com/search/docs/appearance/structu...
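For readers who have not seen it, here is a hedged sketch of what such embedded markup can look like. `Product`, `Offer`, `price` and `priceCurrency` are schema.org vocabulary; the helper names and the sample values are made up for illustration, and real product pages usually carry many more properties.

```python
import json


def product_json_ld(name, sku, price, currency):
    # Minimal schema.org Product with a single Offer; sample fields only.
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
    }


def as_script_tag(data):
    # Escape "</" so a string value can never close the script element early;
    # "\/" is a valid JSON escape for "/", so the payload still parses to the same data.
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    print(as_script_tag(product_json_ld("Example bookcase", "000.000.00", 79.99, "USD")))
```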

tsazan•1h ago
That's valid for search engines. But if JSON-LD was sufficient for agents, Google wouldn't have launched UCP (Universal Commerce Protocol) yesterday.
vachina•29m ago
Took a look, UCP looks like presenting an entire shopping lifecycle for agents.

JSON-LD is just read-only metadata for machines.

tsazan•15m ago
True. But extracting that metadata requires parsing the full DOM. CommerceTXT is for efficient discovery. Scan inventory cheaply first, then commit to the transaction.
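To make the extraction cost concrete, here is a rough sketch of pulling JSON-LD out of a page with only the Python standard library. The class name and the sample HTML are hypothetical; the point is that the consumer has to scan the page's markup to find the `application/ld+json` script elements before it can read any product data.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive as raw text, possibly in several chunks.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def extract_json_ld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    page = (
        "<html><head><title>Example</title>"
        '<script type="application/ld+json">{"@type": "Product", "name": "Example"}</script>'
        "</head><body><p>Lots of other markup...</p></body></html>"
    )
    print(extract_json_ld(page))
```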
sognetic•2h ago
Interesting! So did you do any experiments on a relevant subset of the data to test whether LLM performance degrades by introducing a new, presumably unknown to the LLM, format?
tsazan•1h ago
The 24% token savings come from converting JSON syntax to CommerceTXT.
reddalo•1h ago
I don't understand why new proposed standards are still polluting the root namespace (also see llms.txt).

These things should be put under /.well-known [1], not in the root.

[1] https://en.wikipedia.org/wiki/Well-known_URI
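A small sketch of the convention, assuming nothing beyond the `/.well-known/` path prefix itself: the helper name is made up, and `commerce.txt` is used purely as a hypothetical suffix (it is not a registered well-known URI), while `security.txt` is a real registered one.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_url(origin, suffix):
    # Keep only scheme and host, then place the resource under /.well-known/.
    parts = urlsplit(origin)
    return urlunsplit((parts.scheme, parts.netloc, f"/.well-known/{suffix}", "", ""))


if __name__ == "__main__":
    # Registered suffix.
    print(well_known_url("https://example.com/shop/page?x=1", "security.txt"))
    # Hypothetical suffix, for illustration only.
    print(well_known_url("https://example.com", "commerce.txt"))
```

The practical benefit is that a site only has to reserve one path prefix, instead of every new proposal claiming another name in the root.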

dkdcio•1h ago
I was not aware you shouldn’t do that — what’s the rationale/historical context?
embedding-shape•1h ago
Like most standards: "Because it's a standard". Kind of like setting a .body for a GET request, you can kind of do that, but why not do it the way it's intended instead? Use POST :)
gunalx•49m ago
I have seen POST being used instead of GET, because of having encrypted parameters by default.
embedding-shape•42m ago
Yeah, and also because of firewalls sometimes stripping body of GET requests (not responses mind you, we're talking requests) to a server, and also because it's really uncommon to put a body on a GET request ;)
ljm•3m ago
Sending a URL encoded form or some JSON in a POST request is also easier for most people to understand than the myriad ways you might format a query string in the URL (which may have a stricter limit on size).

You only have to look at how different services handle arrays in query strings to understand that serialising it is conceptually easier.

Comes up a lot in search or filter APIs. I'm sure there was some effort many moons ago to create a QUERY method for that.
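The array point is easy to demonstrate with the standard library. This sketch encodes the same two-value filter in three styles seen in the wild (the parameter name `color` is just an example) and shows that a generic parser reads each one differently, whereas a JSON body has a single unambiguous form.

```python
import json
from urllib.parse import parse_qs, urlencode

filters = {"color": ["red", "blue"]}


def repeated_keys(params):
    # color=red&color=blue
    return urlencode(params, doseq=True)


def bracketed_keys(params):
    # color%5B%5D=red&color%5B%5D=blue  (i.e. color[]=red&color[]=blue)
    pairs = [(f"{key}[]", value) for key, values in params.items() for value in values]
    return urlencode(pairs)


def comma_joined(params):
    # color=red%2Cblue  (i.e. color=red,blue)
    return urlencode({key: ",".join(values) for key, values in params.items()})


if __name__ == "__main__":
    for encode in (repeated_keys, bracketed_keys, comma_joined):
        query = encode(filters)
        print(f"{query!r:45} -> {parse_qs(query)}")
    # The JSON body round-trips without any convention to agree on.
    print(json.loads(json.dumps(filters)) == filters)
```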

buildbuildbuild•1h ago
User friendliness. I’ve seen several less-technical people able to quickly access, create, and understand “llms.txt”.

It’s not ideal but representative of the tension between user experience and technical correctness.

JosephRedfern•1h ago
I've heard that LLMs can perform worse with these more efficient representations compared to e.g. JSON, because they've seen far fewer examples of them during training. Do you know how true that is?
TechSquidTV•1h ago
Absolutely, but usually when working with a bespoke format for optimization, it's paired with an LLM specifically trained on that format.
tsazan•1h ago
You are right about cryptic formats. CommerceTXT is semantically structured. Models like GPT, Claude and Gemini understand it out-of-the-box via ICL.
btrettel•1h ago
Interesting. I had been thinking recently about grep-friendly structured text file formats given the constraints of regex. But I hadn't considered that you could design a structured text file format to be LLM-friendly given token constraints.
tsazan•1h ago
You're right. If a format is easy to grep, it is almost always cheap to tokenize. We treat token density as a primary design constraint.
usefulposter•1h ago
"OP here" is the funniest tell that shows up when using an LLM to write a post for HN or Reddit.

It's funny because it makes zero sense in the body of an initial post!

In comments replying to people downthread - maybe. But opening a top-level post with "Original Poster here" is just silly and shows a lack of respect for community etiquette.

https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...

dkoy•1h ago
Good catch, think you’re on to something
croisillon•1h ago
years ago i did a small tool that, when you entered a product number, would scan all IKEA-websites with currency Euro and return the prices for each of them ; not that i expected furniture tourism to become a thing but it was funny
tsazan•27m ago
Reminds me of a friend who built a comment sentiment analyzer years ago. At the time, it looked like great innovation...
bleonard•3m ago
A blast from the past. When Taskrabbit was acquired by IKEA, I built several tools that went through the whole catalog via various crawling approaches. One tool was to estimate how long it would be to put each item together for an initial training set.