frontpage.

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

https://github.com/joelparkerhenderson/queueing-theory
1•jph•55s ago•0 comments

Show HN: Hibana – choreography-first protocol safety for Rust

https://hibanaworks.dev/
1•o8vm•2m ago•0 comments

Haniri: A live autonomous world where AI agents survive or collapse

https://www.haniri.com
1•donangrey•3m ago•1 comments

GPT-5.3-Codex System Card [pdf]

https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
1•tosh•16m ago•0 comments

Atlas: Manage your database schema as code

https://github.com/ariga/atlas
1•quectophoton•19m ago•0 comments

Geist Pixel

https://vercel.com/blog/introducing-geist-pixel
1•helloplanets•22m ago•0 comments

Show HN: MCP to get latest dependency package and tool versions

https://github.com/MShekow/package-version-check-mcp
1•mshekow•29m ago•0 comments

The better you get at something, the harder it becomes to do

https://seekingtrust.substack.com/p/improving-at-writing-made-me-almost
2•FinnLobsien•31m ago•0 comments

Show HN: WP Float – Archive WordPress blogs to free static hosting

https://wpfloat.netlify.app/
1•zizoulegrande•32m ago•0 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
1•melvinzammit•33m ago•0 comments

Sony BMG copy protection rootkit scandal

https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootkit_scandal
1•basilikum•35m ago•0 comments

The Future of Systems

https://novlabs.ai/mission/
2•tekbog•36m ago•1 comments

NASA now allowing astronauts to bring their smartphones on space missions

https://twitter.com/NASAAdmin/status/2019259382962307393
2•gbugniot•41m ago•0 comments

Claude Code Is the Inflection Point

https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point
3•throwaw12•42m ago•1 comments

Show HN: MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

https://github.com/microclaw/microclaw
1•everettjf•42m ago•2 comments

Show HN: Omni-BLAS – 4x faster matrix multiplication via Monte Carlo sampling

https://github.com/AleatorAI/OMNI-BLAS
1•LowSpecEng•43m ago•1 comments

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

https://codemanship.wordpress.com/2026/01/05/the-ai-ready-software-developer-conclusion-same-game...
1•lifeisstillgood•45m ago•0 comments

AI Agent Automates Google Stock Analysis from Financial Reports

https://pardusai.org/view/54c6646b9e273bbe103b76256a91a7f30da624062a8a6eeb16febfe403efd078
1•JasonHEIN•48m ago•0 comments

Voxtral Realtime 4B Pure C Implementation

https://github.com/antirez/voxtral.c
2•andreabat•51m ago•1 comments

I Was Trapped in Chinese Mafia Crypto Slavery [video]

https://www.youtube.com/watch?v=zOcNaWmmn0A
2•mgh2•57m ago•0 comments

U.S. CBP Reported Employee Arrests (FY2020 – FYTD)

https://www.cbp.gov/newsroom/stats/reported-employee-arrests
1•ludicrousdispla•59m ago•0 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•1h ago•1 comments

Show HN: SVGV – A Real-Time Vector Video Format for Budget Hardware

https://github.com/thealidev/VectorVision-SVGV
1•thealidev•1h ago•0 comments

Study of 150 developers shows AI generated code no harder to maintain long term

https://www.youtube.com/watch?v=b9EbCb5A408
1•lifeisstillgood•1h ago•0 comments

Spotify now requires premium accounts for developer mode API access

https://www.neowin.net/news/spotify-now-requires-premium-accounts-for-developer-mode-api-access/
1•bundie•1h ago•0 comments

When Albert Einstein Moved to Princeton

https://twitter.com/Math_files/status/2020017485815456224
1•keepamovin•1h ago•0 comments

Agents.md as a Dark Signal

https://joshmock.com/post/2026-agents-md-as-a-dark-signal/
2•birdculture•1h ago•0 comments

System time, clocks, and their syncing in macOS

https://eclecticlight.co/2025/05/21/system-time-clocks-and-their-syncing-in-macos/
1•fanf2•1h ago•0 comments

McCLIM and 7GUIs – Part 1: The Counter

https://turtleware.eu/posts/McCLIM-and-7GUIs---Part-1-The-Counter.html
2•ramenbytes•1h ago•0 comments

So whats the next word, then? Almost-no-math intro to transformer models

https://matthias-kainer.de/blog/posts/so-whats-the-next-word-then-/
1•oesimania•1h ago•0 comments

LLMZip: Lossless Text Compression Using Large Language Models

https://arxiv.org/abs/2306.04050
2•jfantl•3mo ago

Comments

hamsic•3mo ago
"Lossless" does not mean that the LLM can accurately reconstruct human-written sentences. Rather, it means that the LLM generates a fully reproducible bitstream based on its own predicted probability distribution.

Reconstructing human-written sentences accurately is impossible because it requires modeling the "true source"—the human brain state (memory, emotion, etc.)—rather than the LLM itself.

Instead, a practical approach is to reconstruct the LLM output itself based on seeds or to store it in a compressible probabilistic structure.

DoctorOetker•3mo ago
It's unclear what you claim lossless compression does or doesn't do, especially since you tie in storing an RNG's seed value at the end of your comment.

"LLMZip: Lossless Text Compression Using Large Language Models"

The title implies they use the LLM's next-token probability distribution to produce a likelihood-sorted list of candidate tokens: the higher the actual next token from the input stream (generated by humans or not) sits in that list, the fewer bits are needed to encode its position counting from the top. So the better the LLM can predict the true probability of the next token, the better it will compress human-generated text in general.

Do you deny LLMs can be used this way for lossless compression?

Such a system can accurately reconstruct the uncompressed original input text (say, generated by a human) from its compressed form.
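The rank-coding idea above can be sketched with a toy stand-in for the LLM: a context-free frequency ranking instead of a real next-token predictor. All names here are illustrative, not from the LLMZip code.

```python
from collections import Counter

def build_ranking(corpus: str) -> list[str]:
    # Toy "model": symbols sorted from most to least frequent.
    # A real LLMZip-style coder would re-rank candidates at every step
    # using the LLM's context-dependent next-token distribution.
    freq = Counter(corpus)
    return sorted(set(corpus), key=lambda ch: (-freq[ch], ch))

def encode(text: str, ranking: list[str]) -> list[int]:
    # Each symbol becomes its rank (0 = most likely). Good predictions
    # yield small ranks, which an entropy coder can store in few bits.
    return [ranking.index(ch) for ch in text]

def decode(ranks: list[int], ranking: list[str]) -> str:
    # Decoding inverts the rank lookup, so the round trip is lossless
    # whether or not the input text "matches" the model's expectations.
    return "".join(ranking[r] for r in ranks)

text = "abracadabra"
ranking = build_ranking(text)
ranks = encode(text, ranking)
assert decode(ranks, ranking) == text  # lossless for any input
```

In LLMZip proper, the resulting rank stream (or the probabilities directly, via arithmetic coding) is then fed to an entropy coder, so a stream dominated by rank 0 compresses to well under one bit per character.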

hamsic•3mo ago
Sure, a model-based coder can losslessly compress any token stream. I just meant that for human-written text, the model’s prediction diverges from how the text was actually produced — so the compression is formally lossless, but not semantically faithful or efficient.
DoctorOetker•3mo ago
This is from 2023 (not a complaint, just observing that the result might be stale and even lower upper bounds may have been achieved).

It's quite curious to consider the connection between compression and intelligence. It's hard to quantify comprehension, i.e. how do you see if a system effectively comprehends some data? Lossless compression rates are very attractive, since the task is to not lose data but squeeze it as close as possible to its information content.

It does raise other questions though: which corpus is considered representative? A base model without finetuning might be more vulgar but also more effective at compressing the comparatively vulgar human corpus. An RLHF/whatever-reinforced and pretty-prompted chatbot, however, will be very good at compressing its own outputs but less good at compressing the actual vile human corpus. Both the base model and the aligned model will be relatively good at compressing each other's output as well, but each will excel at compressing its own implicit corpus.

Another question: as the bits-per-character upper bound falls monotonically, it will suffer diminishing returns. How does one square that with the proposal that lossless compression corresponds to intelligence? The correspondence would clearly not be linear, and it suggests that one would need an exponentially larger and larger corpus to beat the prior compression rates.

How long can it write before repeating itself?

====

It also raises lots of societal questions: at less than 1 bit per character, how many characters are in Library Genesis / Anna's Archive, etc.?