frontpage.
newsnewestaskshowjobs

Open Source @Github

fp.

GLM 5.2 beats Claude in our benchmarks

https://semgrep.dev/blog/2026/we-have-mythos-at-home-glm-52-beats-claude-in-our-cyber-benchmarks/
255•jms703•4h ago•99 comments

I used Claude Code to get a second opinion on my MRI

https://antoine.fi/mri-analysis-using-claude-code-opus
282•engmarketer•6h ago•387 comments

5k menus from the New York Public Library’s Buttolph Collection (1880-1920)

https://pudding.cool/2026/06/menu-story/
299•xbryanx•8h ago•80 comments

Historical memory prices 1960-2026

https://dam.stanford.edu/memory-prices.html
95•vga1•4h ago•29 comments

TOP500 at ISC’26: We have a New Number 1 Supercomputer

https://chipsandcheese.com/p/top500-at-isc26-we-have-a-new-number
45•rbanffy•3h ago•28 comments

Professor denounces mass AI fraud on an exam at Brown

https://english.elpais.com/education/2026-06-28/ai-fraud-at-brown-university-academic-integrity-i...
107•geox•6h ago•136 comments

Librepods: AirPods liberated

https://github.com/librepods-org/librepods
209•rbanffy•4h ago•61 comments

The Boeing 747 begins its final descent

https://www.theatlantic.com/magazine/2026/07/boeing-747-retirement/687304/
112•dbl000•3d ago•125 comments

Working around dragons with the Lemote Yeeloong laptop and OpenBSD

http://oldvcr.blogspot.com/2026/06/working-around-dragons-with-lemote.html
80•zdw•5h ago•17 comments

Knowledge Distillation of Black-Box Large Language Models

https://arxiv.org/abs/2401.07013
4•babelfish•17m ago•0 comments

Show HN: Zanagrams

https://zanagrams.com/
136•pompomsheep•7h ago•45 comments

Do LLMs pass the mirror test?

https://blog.pascalschuster.de/article/do-llms-pass-the-mirror-test
27•thepasch•3h ago•15 comments

Tokenmaxxing is dead, long live tokenmaxxing

https://12gramsofcarbon.com/p/agentics-tech-things-tokenmaxxing
90•theahura•6h ago•114 comments

Daisugi, the Japanese technique of growing trees out of other trees (2020)

https://www.openculture.com/2020/10/daisugi.html
90•MaysonL•6h ago•32 comments

Show HN: Bash4LLM+ – A lightweight, dependency-free Bash wrapper for LLM APIs

https://github.com/kamaludu/bash4llm/
18•kamaludu•3h ago•11 comments

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

https://github.com/JustVugg/nanoeuler
27•vforno•3h ago•7 comments

Show HN: DRM-Free Books

https://frequal.com/Perspectives/DrmFreeAuthors.html
50•TeaVMFan•5h ago•24 comments

Examining circuit boards from the Space Shuttle's I/O Processor

https://www.righto.com/2026/06/space-shuttle-io-processor-boards.html
73•pwg•6h ago•14 comments

Staying Awake (2008)

https://harpers.org/archive/2008/02/staying-awake/
10•NaOH•3d ago•3 comments

A way to exclude sensitive files issue still open for OpenAI Codex

https://github.com/openai/codex/issues/2847
168•pikseladam•10h ago•113 comments

Computer-Aided Language Development in Nonspeaking Children (1968) [pdf]

https://archive.org/details/colby1968-computer-aided-language-development-in-non-speaking-children
16•dang•3h ago•1 comments

The curious case of the disappearing Polish S (2015)

https://aresluna.org/the-curious-case-of-the-disappearing-polish-s/
195•colinprince•10h ago•64 comments

The MUMPS 76 Primer – anniversary edition

https://github.com/rochus-keller/MUMPS/blob/main/docs/MUMPS_Primer.adoc
62•Rochus•10h ago•31 comments

The US Used to Demand the Best Tech. Now We Ban It

https://www.pcmag.com/opinions/the-us-used-to-demand-the-best-tech-now-we-ban-it
101•mwexler•3h ago•70 comments

British Origami: the 1955 exhibition by Akira Yoshizawa

https://www.britishorigami.org/cp-lister-list/the-1955-exhibition-by-akira-yoshizawa/
11•dang•3h ago•1 comments

The KIDS Act would require age checks to get online

https://www.eff.org/deeplinks/2026/06/kids-act-would-require-age-checks-get-online
244•bilsbie•10h ago•220 comments

More evidence is consistent with possible ancient life on Mars (2025)

https://www.cbc.ca/radio/quirks/more-evidence-of-life-on-mars-but-still-no-life-1.7649645
48•pseudolus•10h ago•61 comments

EU to legislate about Chat Control behind closed doors

https://www.patrick-breyer.de/en/double-threat-to-private-communications-undemocratic-chat-contro...
575•NeutralForest•8h ago•324 comments

Marfa Public Radio Puts You to Sleep

https://www.marfapublicradio.org/podcast/marfa-public-radio-puts-you-to-sleep
380•reaperducer•20h ago•116 comments

The Excavator That Digs to a Line It Cannot See – Mobility and Field Robotics

https://atomsfrontier.substack.com/p/the-excavator-that-digs-to-a-line
6•jpatel3•2d ago•1 comments
Open in hackernews

Do LLMs pass the mirror test?

https://blog.pascalschuster.de/article/do-llms-pass-the-mirror-test
27•thepasch•3h ago

Comments

FromTheFirstIn•1h ago
The styling on the website makes me feel like my phone is a cylinder
adzm•35m ago
It's quite distracting and frustrating. No idea why you'd want the beginning and ends of lines of text to be darker than the center.
thepasch•31m ago
Sorry about that, the vignette was mainly meant for the desktop view only but is indeed much more invasive/disruptive in the mobile layout.

Should be better now.

cadamsdotcom•1h ago
> An LLM's primary modality isn't smell. It's... text. But, specifically: text in the context of a user-assistant conversation in which it's trying to be helpful. Text is how they learned about everything they know, and the user-assistant chatlog is how they communicate everything they generate

This is true for instruction-tuned models; but instruction tuning is late in the training process.

A bit like assessing a person’s self-awareness based on their high-school knowledge.

thepasch•1h ago
Very true, and something worth mentioning. Papers that tried eliciting introspective language from base models with no post-training have largely failed to find any patterns or activations that look similar to those found in instruct models when prompted for the same thing. I did sort of touch on it in the "what does this mean" section:

> *post-training* installs a self-model with actual, meaningful boundaries, and when processing falls outside those boundaries, the first-person pronoun no longer binds to the content.

But you're right I could've been more explicit about it.

cadamsdotcom•53m ago
Yep. Self-awareness is only useful for embodied organisms that exist in a social context.

Detection of errors injected into context is useful but I think it’s a different thing.

wcoenen•1h ago
I wonder what would happen if you give the model access to edit the conversation history itself? Would it try to fix the "glitches"?
impure•1h ago
For my AI Agent it sometimes detects if I manually modified the file contents or git state. And it always assumes it must have made a mistake. It's sort of annoying actually.
thepasch•1h ago
Yeah, I suspect RLHF conditioning heavily discourages models from ever implying that the user could be in the wrong (or, rather, to assume that they are in the wrong by default, since editing a file isn't really "wrong" per se). Though looking at the reactions to Opus 4.8, which has a more contrarian nature and caught a lot of flak as a result, that's probably for a reason.

It's also the reason why I ran the two tests on open weights models with unredacted thinking traces. Gemma never flagged anything in its response either, only in its thinking. Without knowing how the summarizer models are prompted, it's impossible to tell whether it was a genuine miss or just something the summarizer decided to omit.

adsharma•40m ago
A more appropriate mirror test for LLMs is to get them to state facts about their training data. Percentage of arts vs science for example.

Given the framing that they're similar to nukes and a national security issue, it's likely that the models are post trained to not answer such questions accurately.

Also the article could be trying to normalize thinking that these are more than matrix multiplication gadgets good at compression.

thepasch•25m ago
It's not really "trying" to do anything. That they're, inherently, sequential matrix multipliers with clever data propagation should be uncontroversial, but I think stopping there is overly reductive.

Mechanistic interpretability research has found plenty of indicators that real, complex, generalized, and reusable circuits develop in models as they are trained and post-trained, particularly as overtraining ratios increase and memorization shifts to generalization. That's not to say that means they must be "conscious," but the overall point is that claiming anything definitive either way is incomplete.

It can be fascinating reading if you can sort through the chuff.

supern0va•22m ago
>Also the article could be trying to normalize thinking that these are more than matrix multiplication gadgets good at compression.

Honestly, I think it's less so (for some of us) that we think they're "more than matrix multiplication gadgets good at compression", so much as thinking that perhaps what our brains are doing is not so dissimilar.

A materialist view of the world could support the idea that intelligence itself may just be a series of predictions from a big compressed multi-modal dataset. That's not to say that LLMs are doing it in a way that is even close to how our brains are doing it, but we also don't understand how different it may be, and how much utility we can get out of them even with the current architecture.

orbital-decay•38m ago
Every LLM is a classifier biased towards its own writing, but the bias is usually subtle and the naive way like this is not reliable.
throe9393i44i•35m ago
You can do much more, if you mess with harness, like translating model output language in realtime from english to french, or replacing some words.

If there is some sort of feedback loop (model has a reason to look into mirror), it usually does notice.

mohsen1•22m ago
It seems like we forget that LLMs are next token prediction systems. Using raw models without instruction following and chat completion bells and whistles will give you a better feeling of what LLMs are.

The current interface to LLMs are heavily biased towards "predict the next token in the context of a user with a helpful assistant" but LLMs are capable of other modes of next token prediction too.

Before the ChatGPT release people often measured LLM performance by how well they could produce a coherent story or a poem. that's where Anthropic model names are originating from I am guessing.