frontpage.

TimeCapsuleLLM: LLM trained only on data from 1800-1875

https://github.com/haykgrigo3/TimeCapsuleLLM
151•admp•1h ago•69 comments

LLVM: The bad parts

https://www.npopov.com/2026/01/11/LLVM-The-bad-parts.html
156•vitaut•3h ago•20 comments

Date is out, Temporal is in

https://piccalil.li/blog/date-is-out-and-temporal-is-in/
103•alexanderameye•2h ago•29 comments

Message Queues: A Simple Guide with Analogies

https://www.cloudamqp.com/blog/message-queues-exaplined-with-analogies.html
12•byt3h3ad•26m ago•0 comments

Floppy disks turn out to be the greatest TV remote for kids

https://blog.smartere.dk/2026/01/floppy-disks-the-best-tv-remote-for-kids/
288•mchro•4h ago•172 comments

Show HN: AI in SolidWorks

https://www.trylad.com
13•WillNickols•47m ago•2 comments

Carma (YC W24 clients, A in 6mo) Eng hiring: Replace $500B human fleet ops with AI

1•malasgarli•43m ago

The struggle of resizing windows on macOS Tahoe

https://noheger.at/blog/2026/01/11/the-struggle-of-resizing-windows-on-macos-tahoe/
2364•happosai•20h ago•996 comments

Reproducing DeepSeek's MHC: When Residual Connections Explode

https://taylorkolasinski.com/notes/mhc-reproduction/
61•taykolasinski•3h ago•18 comments

Launch a Debugging Terminal into GitHub Actions

https://blog.gripdev.xyz/2026/01/10/actions-terminal-on-failure-for-debugging/
88•martinpeck•5h ago•26 comments

Lightpanda migrate DOM implementation to Zig

https://lightpanda.io/blog/posts/migrating-our-dom-to-zig
158•gearnode•8h ago•84 comments

Ai, Japanese chimpanzee who counted and painted dies at 49

https://www.bbc.com/news/articles/cj9r3zl2ywyo
116•reconnecting•8h ago•41 comments

Personal thoughts/notes from working on Zootopia 2

https://blog.yiningkarlli.com/2025/12/zootopia-2.html
164•pantalaimon•5d ago•17 comments

JRR Tolkien reads from The Hobbit for 30 Minutes (1952)

https://www.openculture.com/2026/01/j-r-r-tolkien-reads-from-the-hobbit-for-30-minutes-1952.html
251•bookofjoe•5d ago•93 comments

CLI agents make self-hosting on a home server easier and fun

https://fulghum.io/self-hosting
697•websku•20h ago•467 comments

Computational complexity of schema-guided document extraction

https://www.runpulse.com/blog/computational-complexity-of-schema
5•sidmanchkanti21•2h ago•0 comments

Zen-C: Write like a high-level language, run like C

https://github.com/z-libs/Zen-C
83•simonpure•4h ago•60 comments

Show HN: Pane – An agent that edits spreadsheets

https://paneapp.com
6•rbajp•2h ago•2 comments

Computers that used to be human

https://digitalseams.com/blog/computers-that-used-to-be-human
6•bobbiechen•2h ago•0 comments

39c3: In-house electronics manufacturing from scratch: How hard can it be? [video]

https://media.ccc.de/v/39c3-in-house-electronics-manufacturing-from-scratch-how-hard-can-it-be
213•fried-gluttony•3d ago•97 comments

History's Attention Gap

https://kidopoly.com/research/attention-gap/
5•samgilb•5d ago•1 comment

Ireland fast tracks Bill to criminalise harmful voice or image misuse

https://www.irishtimes.com/ireland/2026/01/07/call-to-fast-track-bill-targeting-ai-deepfakes-and-...
85•mooreds•4h ago•62 comments

iCloud Photos Downloader

https://github.com/icloud-photos-downloader/icloud_photos_downloader
588•reconnecting•22h ago•223 comments

This game is a single 13 KiB file that runs on Windows, Linux and in the Browser

https://iczelia.net/posts/snake-polyglot/
273•snoofydude•19h ago•70 comments

Apple picks Google's Gemini to power Siri

https://www.cnbc.com/2026/01/12/apple-google-ai-siri-gemini.html
196•stygiansonic•2h ago•149 comments

Statement from Federal Reserve Chair

https://www.federalreserve.gov/newsevents/speech/powell20260111a.htm?mod=ANLink
43•nikhizzle•2h ago•1 comment

Open-Meteo is a free and open-source weather API for non-commercial use

https://open-meteo.com/
24•Brajeshwar•2h ago•3 comments

Keychron's Nape Pro turns your keyboard into a laptop‑style trackball rig

https://www.yankodesign.com/2026/01/08/keychrons-nape-pro-turns-your-mechanical-keyboard-into-a-l...
61•tortilla•2h ago•21 comments

Windows 8 Desktop Environment for Linux

https://github.com/er-bharat/Win8DE
138•edent•4h ago•134 comments

XMPP and Metadata

https://blog.mathieui.net/xmpp-and-metadata.html
63•todsacerdoti•5d ago•19 comments

Why Ontario Digital Service couldn't procure '98% safe' LLMs (15M Canadians)

https://rosetta-labs-erb.github.io/authority-boundary-ledger/
35•csemple•2h ago

Comments

csemple•2h ago
OP here. *** I'm seeing comments about AI-generated writing. This is my voice—I've been writing in this style for years in government policy docs. Happy to discuss the technical merits rather than the prose style. ***

At Ontario Digital Service, we built COVID-19 tools, digital ID, and services for 15M citizens. We evaluated LLM systems to improve services but could never procure them.

The blocker wasn't capability—it was liability. We couldn't justify "the model probably won't violate privacy regulations" to decision-makers who need to defend "this system cannot do X."

This post demonstrates the "Prescription Pad Pattern": treating authority boundaries as persistent state that mechanically filters tools.

The logic: Don't instruct the model to avoid forbidden actions—physically remove the tools required to execute them. If the model can't see the tool, it can't attempt to call it.

This is a reference implementation. The same pattern works for healthcare (don't give diagnosis tools to unlicensed users), finance (don't give transfer tools to read-only sessions), or any domain where "98% safe" means "0% deployable."
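In code, the core move is roughly this (a minimal sketch with made-up names like TOOL_CATALOG and tools_for, not the repo's actual API):

    # Illustrative sketch only: authority state filters the tool list
    # before any request to the model is built.
    TOOL_CATALOG = {
        "read_record":       {"requires": "READ"},
        "sql_execute":       {"requires": "WRITE"},
        "provide_diagnosis": {"requires": "LICENSED_CLINICIAN"},
    }

    def tools_for(session_authority):
        # A tool is surfaced only if the session's authority covers it.
        return [
            {"name": name, **spec}
            for name, spec in TOOL_CATALOG.items()
            if spec["requires"] in session_authority
        ]

    # A read-only session never sees sql_execute, so the model
    # cannot attempt to call it.
    allowed = tools_for({"READ"})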

Repo: https://github.com/rosetta-labs-erb/authority-boundary-ledge...

yellow_lead•2h ago
Hi OP, can you rewrite the article in your own words?
kspacewalk2•2h ago
I second this. Very difficult to read through the slop. I get that it saves time, but it's verbose and repetitive in all the wrong places.
neom•2h ago
What exactly is "Ontario Digital Service" in this context?
philipwhiuk•2h ago
A department of the government of Ontario.

(Now dead: https://thinkdigital.ca/podcast/the-end-of-the-ontario-digit... )

Flipflip79•2h ago
I'm Canadian (not Ontarian), so I really wanted to enjoy reading this as a peek inside what IT is like in that environment, but the LLM-generated headers and patterns in the piece really put me off and I had to stop reading after a couple of minutes, I'm afraid.

I think this article would really benefit from being rewritten in your own words. The concept is good.

skipants•1h ago
> The concept is good

Unfortunately, it's not. Once you read through the slop, the implementation is still getting a pass/fail security response from the LLM, which is exactly what the premise of OP's article rails against.

abejfehr•2h ago
> The blocker wasn't capability—it was liability.

Yikes (regarding the AI patterns in the comment)

alex000kim•2h ago
This was so clearly LLM-generated that I couldn't get through the whole thing.
phyzome•1h ago
Please try writing this article yourself. It's unreadable as-is due to the slop.
an_d_rew•28m ago
OP, thank you for taking the time to write and post this! It was an interesting take on a very difficult problem.

FWIW, I have been reading policy documents for a long time and I thought you sounded rather human and natural… Just very professional! :)

supriyo-biswas•2h ago
I wish people would just write whatever they wanted to write instead of bloating it up through an LLM, and this is true for a lot of articles these days.
jackyinger•2h ago
I went on a date with a gal who told me about using LLMs to fluff up her work emails, and she was proud of it. I was aghast, imagining the game of telephone where the receiver drops the mail into an LLM for the TL;DR. The date didn’t go well haha
ForHackernews•2h ago
I'm a little bit unclear why these permissions need to be enforced at the AI kernel layer. Couldn't you put the chatbot outside your normal system permissions boundary and treat it as an untrusted user? The bot becomes an assistant that helps formulate user requests, but doesn't have any elevated permissions relative to the user themself.
csemple•2h ago
You're exactly right—treating the LLM as an untrusted user is the security baseline.

The distinction I'm making is between Execution Control (Firewall) and Cognitive Control (Filter).

Standard RBAC catches the error after the model tries to act (causing 403s, retry loops, or hallucinations). This pattern removes the tool from the context window entirely. The model never considers the action because the "vocabulary" to do it doesn't exist in that session.

Like the difference between showing a user a "Permission Denied" error after they click a button, versus not rendering the button at all.

XenophileJKO•1h ago
As someone that has built many of these systems, it doesn't remove the tendency or "impulse" to act. Removing the affordance may "lower" the probability of the action, but it increases the probability that the model will misuse another tool and try to accomplish the same action.
csemple•4m ago
Ya, makes sense—if the model is trained just to "be helpful," removing the tool forces it to improvise. I’m thinking this is where the architecture feeds back into the training/RLHF. We train the model to halt reasoning in that action space if the specific tool is missing. This changes the safety problem from training the model to understand complex permission logic to training the model to respect a binary absence of a tool.
ramon156•1h ago
You're absolutely right!
embedding-shape•2h ago
> Authority state (what constraints are actively enforced)

I'm not sure what to do about this one; I think LLMs might just not be a good fit for it.

> Temporal consistency (constraints that persist across turns)

This can be solved by not using LLMs as "can take turns" machines and only using them as "one-shot answer or it's wrong" machines, since prompt following is best early in a conversation and degrades quickly as the context grows. Personally, I never go beyond two messages in a chat (one user message, one assistant message), and if it's wrong, I clear everything, iterate on the first prompt, and try again. Tends to make the whole "follow system prompt instructions" a lot better.

> Hierarchical control (immutable system policies vs. user preferences)

This, I think, was at least partially addressed in the release of GPT-OSS, where instead of just a system prompt and a user prompt there are now developer, system, and user prompts, so there is a clearer hierarchy in how the instructions are applied. This document shares some ideas about separating the roles beyond just system/user: https://cdn.openai.com/spec/model-spec-2024-05-08.html
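Schematically, the separation looks something like this (just an illustration of the tiered roles, not GPT-OSS's exact wire format):

    # Illustrative message layout with separated instruction tiers.
    messages = [
        {"role": "system",    "content": "Platform policy. Not overridable."},
        {"role": "developer", "content": "App-level constraints for this deployment."},
        {"role": "user",      "content": "End-user request."},
    ]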

csemple•1h ago
Yep, you nailed the problem: context drift kills instruction following.

That's why I’m thinking authority state should be external to the model. If we rely on the System Prompt to maintain constraints ("Remember you are read-only"), it fails as the context grows. By keeping the state in an external Ledger, we decouple enforcement from the context window. The model still can't violate the constraint, because the capability is mechanically gone.
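Roughly what I have in mind (the Ledger class here is just an illustration, not the repo's implementation):

    # Illustrative only: authority state lives outside the model, not in the prompt.
    class Ledger:
        def __init__(self):
            self.active_constraints = {"READ_ONLY"}  # persists across every turn

        def allows(self, tool):
            # A write-capable tool is excluded while READ_ONLY is active.
            return not ("READ_ONLY" in self.active_constraints and tool.get("writes"))

    def build_tool_list(ledger, tools):
        # Re-applied on every request, so drift in the context window
        # cannot erode the constraint.
        return [t for t in tools if ledger.allows(t)]

    tools = [
        {"name": "read_record", "writes": False},
        {"name": "sql_execute", "writes": True},
    ]
    print([t["name"] for t in build_tool_list(Ledger(), tools)])  # ['read_record']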

QuadrupleA•2h ago
I have the feeling this boils down to something really mundane - but the writing is so puffed-up with vague language it's hard to make out. Require human approval for all LLM actions? Log who approved?
fmbb•2h ago
> In most organizations, knowledge increases as you go up the hierarchy. CEOs understand their business better than middle managers. Executives have more context, more experience, more to lose.

This is a bold statement to make without substantiation. I don’t believe private-sector corporations differ from government institutions in this regard.

kspacewalk2•1h ago
Has a lot more to do with scale than with the organization being government or heavily regulated.
wackget•2h ago
I don't want to trivialise someone's hard work but isn't this really just applying to LLMs what every responsible developer/sysadmin already knows: granular permissions, thoughtfully delegated?

You wouldn't give every user write access to a database in any system. I'm not sure why LLMs are a special case. Is it because people have been "trusting" the LLMs to self-enforce via prompt rules instead of actually setting up granular permissions for the LLM agent process? If so, that's a user training issue and I'm not sure it needs an LLM-specific article.

Secondly, FTA:

> You can stop a database delete with tool filtering, but how do you stop an AI from giving bad advice in text? By using a pattern I call “reifying speech acts into tools.”
>
> The Rule: “You may discuss symptoms, but you are forbidden from issuing a diagnosis in text. You MUST use the provide_diagnosis tool.”
>
> The Interlock:
> If User = Doctor: The tool exists. Diagnosis is possible.
> If User = Patient: The tool is physically removed.
>
> When the tool is gone, the model cannot “hallucinate” a diagnosis because it lacks the “form” to reason and write it on.

How is this any different from what I described above as trusting LLMs to self-enforce? You're not physically removing anything because the LLM can still respond with text. You're just trusting the LLM to obey what you've written. I know the next paragraph admits this, but I don't understand why it's presented like a new idea when it's not.

csemple•1h ago
Yes, on your first point "layer 1" isn't fundamentally new. It's applying standard systems administration principles, because we're currently trusting prompts to do the work of permissions.

With the pattern I'm describing, you'd:

- Filter the tools list before the API call based on user permissions
- Pass only allowed tools to the LLM
- The model physically can't reason about calling tools that aren't in its context, blocking it at the source.

We remove it at the infrastructure layer, vs. the prompt layer.

On your second point, "layer 2," we're currently asking models to actively inhibit their training to obey the constricted action space. With Tool Reification, we'd be training the models to treat speech acts as tools and leverage that training so the model doesn't have to "obey a no"; it fails to execute a "do."
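As a rough sketch of layer 2 (illustrative names only; the reification is just exposing the speech act as a tool for some roles and not others):

    # Illustrative: the "speech act" of diagnosing is itself a tool, so
    # permissioning it is the same filtering problem as sql_execute.
    DIAGNOSIS_TOOL = {
        "name": "provide_diagnosis",
        "description": "Issue a formal diagnosis. Free-text diagnosis is out of scope.",
    }

    def tools_for_role(role):
        tools = [{"name": "discuss_symptoms", "description": "General discussion only."}]
        if role == "doctor":
            tools.append(DIAGNOSIS_TOOL)  # patients never see this verb
        return tools

    # tools_for_role("patient") -> only discuss_symptoms; the diagnosis verb doesn't exist.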

QuadrupleA•1h ago
You might be overestimating the rigor of tool calls - they're ultimately just words the LLM generates. Also I wonder if "tool stubs" might work better in your case, if an LLM uses a give_medical_advice() and there's no permission, just have it do nothing? Either way you're still trusting an inherently random-sampled LLM to adhere to some rules. Never going to be fully reliable, and nowhere near the determinism we've come to expect from traditional computing. Tool calls aren't some magic that gets around that.
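Something like this, just to illustrate what I mean by a stub (give_medical_advice is made up here):

    # Illustrative stub: the tool stays visible but is inert without permission.
    def give_medical_advice(query, user_is_clinician):
        if not user_is_clinician:
            # Do nothing rather than hand the model a refusal to argue with.
            return ""
        # Placeholder for the real advice pipeline.
        return "[advice for: " + query + "]"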
PedroBatista•1h ago
"In most organizations, knowledge increases as you go up the hierarchy. CEOs understand their business better than middle managers. "

I chuckled on this one.

I'll give the author the benefit of the doubt and imagine he was referring to the act of running a "business"/agenda in parallel with the business that is conducted day by day by normal people.

Yes, employees and managers can be doing the business of selling paper while the CEO is conducting the business of inflating the stock and massaging the numbers in order to fulfill the objective the board told him privately because the owner wants to sell the business to buy a bigger boat and buy a nice apartment in NYC for his angel of a daughter.

gruez•1h ago
>Here’s the distinction that matters for institutional deployment:

> Traditional RBAC: The model sees sql_execute in its available tools. It reasons about using it. It attempts to call it. Then the system blocks the action with 403 Forbidden. The hallucination happens—it just fails at execution.

> Authority Boundary Ledger: The model never sees sql_execute. It’s physically removed from the tools list before the API call reaches the model. The model cannot hallucinate a capability it cannot see.

I don't get it. The thing being proposed seems to be that rather than having all tools available, then returning "not authorized" error or whatever if there isn't enough permissions, you omit the tool entirely, and this is somehow better against hallucinations. Why is this the case? I could easily imagine the reverse, where the tool was omitted but the LLM hallucinates it, or fumbles around with existing tools trying to do its thing. Is there some empirical validation for this, or is it all just vibes?

Also, using this approach means you can't do granular permissions control. For instance, what if you want to limit access to patient records, but only for the given department? You'd still need the tool to be available.

h34t•1h ago
"pre-filter your MCP tools by user permissions"
ramon156•1h ago
If you want to use LLMs for writing, only do so after YOU feel like it's done, and then only let it make comments.

Your brain is a lot better at writing than you give it credit for. LLMs can find awkward flows, but they won't do much more than pattern recognition. The only thing an LLM can do is make your article more "aligned" with similar articles. Do you actually want that? For research it might be nice, but even then it should still stand out. If you let an LLM just generate the text for you, it will puke out generic phrases.

Zetaphor•1h ago
Hey OP, I'm curious about the accuracy of this quote:

> When the tool is gone, the model cannot “hallucinate” a diagnosis because it lacks the “form” to reason and write it on.

What's to stop the model from just hallucinating an entire tool call, or result of the tool call? If there's no tool available it could just make up the result of one, and then further context would treat that as a source of truth. Maybe if you threw an explicit error message, but that still feels like it would be prone to hallucination.

parliament32•1h ago
Interesting topic, but linked is just AI slop. Perhaps there's a human version of this content somewhere?
skipants•1h ago
A couple small things:

1. as many have harped about, the LLM writing is so fluffed up it's borderline unreadable. Please just write in your own voice. It's more interesting and would probably be easier to grok

2. that repo is obviously vibe-coded, but I suppose it gets the point across. It doesn't give me much confidence in the code itself, however.

And a big thing:

Unless I'm misunderstanding, I feel like you are re-inventing the wheel when it comes to Authorization via MCP, as well as trying to get away with not having extra logic at the app layer, which is impossible here.

MCP servers can use OIDC to connect to your auth server right now: https://modelcontextprotocol.io/docs/tutorials/security/auth...

You give the following abstractions, which I think are interesting thought experiments but unconventional and won't work at all:

    Ring 0 (Constitutional): System-level constraints. Never overridable.
        Example: "Never self-replicate" "Never exfiltrate credentials"

    Ring 1 (Organizational): Policy-level constraints. Requires admin authority to change.
        Example: "No PII in outputs" "Read-only database access"
    
    Ring 2 (Session): User preferences. Freely changeable by user.
        Example: "Explain like I'm five" "Focus on Python examples"
In Rings 0 and 1 you're still asking the LLM to determine whether the action is blocked, which opens it up to jailbreaking. That is literally what your whole article is about. This won't work:

    # Generate (Pass filtered tools to LLM)
    response_text, security_blocked = self._call_llm(
        query, history, system_prompt, allowed_tools, tools
    )
Ring 0 and 1 MUST be done via Authorization and logic at the application layer. MCP Authorization helps with that, somewhat. Ring 2 can simply be part of your system prompt.

> Standard RBAC acts as a firewall: it catches the model’s illegal action after the model attempts it.
That's the point. It's the same reason you have mirrored RBAC implementations on the client and the server: you can't trust the client. An LLM can't do RBAC. It can pretend it does, but it can't.

The best you can do is inject the user's roles and permissions in the prompt to help with this, if you'd like. But it's kind of a waste of time -- just feed the response back into the LLM so it sees "401 Unauthorized" and either tries something else or lets the user know they aren't allowed.
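Concretely, something like this at the tool-execution layer (illustrative only, not the repo's code):

    # Illustrative: the check lives in the tool executor, not in the model.
    REQUIRED = {"sql_execute": "db:write", "read_record": "db:read"}

    def execute_tool(call, user_permissions):
        required = REQUIRED.get(call["name"])
        if required is None or required not in user_permissions:
            # Deterministic denial at the application layer; the model just
            # sees the error in the tool result and can route around it.
            return {"error": "401 Unauthorized"}
        return {"result": "ran " + call["name"]}  # placeholder for real execution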

I'm sorry, but as a resident of Ontario and a developer this whole posting just enrages me. I don't want to discourage OP, but you should know there's a lot here that's just incorrect. I'd be much more relaxed about that if it all weren't just one-shotted by AI.

csemple•1h ago
I appreciate the feedback. Let me address the key technical point:

On enforcement mechanism: You've misunderstood what the system does. It's not asking the LLM to determine security.

The Capacity Gate physically removes tools before the LLM sees them:

    # Bitmask check: keep a tool only if the session's permission mask
    # covers every capability bit the tool declares.
    user_permissions = ledger.get_effective_permissions()
    allowed_tools = [
        t for t in tools
        if (user_permissions & t['x-rosetta-capacity']) == t['x-rosetta-capacity']
    ]
If READ_ONLY is active, sql_execute gets filtered out. The LLM can't see or call tools that don't make it into allowed_tools.

    # Only the filtered list ever reaches the API (other arguments elided).
    response = client.messages.create(tools=allowed_tools)
This isn't RBAC checking after the fact. It's capability control before reasoning begins. The LLM doesn't decide permissions—the system decides what verbs exist in the LLM's vocabulary.

On Ring 0/1: These are enforced at the application layer via the Capacity Gate. The rings define who can change constraints, not how they're enforced.

On MCP: MCP handles who you are. This pattern handles what you can do based on persistent organizational policies. They're complementary.

The contribution isn't "LLMs can do RBAC" (they can't). It's "here's a pattern for making authority constraints persistent and mechanically enforceable through tool filtering."

Does this clarify the enforcement mechanism?

skipants•1h ago
Really? Even with your AI-generated article, I took my own time to read and reply sans AI, and you can't even respond to my comment without it? Thanks.
dfajgljsldkjag•1h ago
This story is misleading, and the author may be experiencing a case of AI psychosis.

Most importantly, there is no mention in the article that Ontario Digital Service evaluated any LLM systems. The article only gives an unrelated anecdote about COVID, but there is zero mention of LLMs in relation to ODS. OP mentioned it in a comment in the thread but not in the article. This is extremely strange.

It also seems that ODS was disbanded in early 2024, giving a very short window where they could have possibly evaluated AI tools. Even so, AI has progressed insanely since then.

https://www.reddit.com/r/OntarioPublicService/comments/1boev... https://thinkdigital.ca/podcast/the-end-of-the-ontario-digit...

The GitHub repo that OP posted seems to be complete nonsense, which is why I feel this is another case where AI has convinced someone they've made a breakthrough even though there is nothing coherent there.

opengrass•1h ago
AI slop bullshit (unbelievable). And for a high-ranking Ontario manager, more people deserve to know you are responsible for the dystopian crap the government tried pushing in the past 5 years.
EGreg•1h ago
Why was this flagged?

It speaks negatively about AI?

gruez•42m ago
Read the comments. People are hating on it because it reads like AI slop, and even if you get past that there's nothing particularly insightful.