frontpage.

Show HN: Minecraft Creeper meets 90s Tamagotchi

https://github.com/danielbrendel/krepagotchi-game
1•foxiel•4m ago•0 comments

Show HN: Termiteam – Control center for multiple AI agent terminals

https://github.com/NetanelBaruch/termiteam
1•Netanelbaruch•4m ago•0 comments

The only U.S. particle collider shuts down

https://www.sciencenews.org/article/particle-collider-shuts-down-brookhaven
1•rolph•7m ago•0 comments

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

1•solarisos•7m ago•0 comments

Show HN: Remotion directory (videos and prompts)

https://www.remotion.directory/
1•rokbenko•9m ago•0 comments

Portable C Compiler

https://en.wikipedia.org/wiki/Portable_C_Compiler
2•guerrilla•11m ago•0 comments

Show HN: Kokki – A "Dual-Core" System Prompt to Reduce LLM Hallucinations

1•Ginsabo•12m ago•0 comments

Software Engineering Transformation 2026

https://mfranc.com/blog/ai-2026/
1•michal-franc•13m ago•0 comments

Microsoft purges Win11 printer drivers, devices on borrowed time

https://www.tomshardware.com/peripherals/printers/microsoft-stops-distrubitng-legacy-v3-and-v4-pr...
2•rolph•13m ago•0 comments

Lunch with the FT: Tarek Mansour

https://www.ft.com/content/a4cebf4c-c26c-48bb-82c8-5701d8256282
2•hhs•17m ago•0 comments

Old Mexico and her lost provinces (1883)

https://www.gutenberg.org/cache/epub/77881/pg77881-images.html
1•petethomas•20m ago•0 comments

'AI' is a dick move, redux

https://www.baldurbjarnason.com/notes/2026/note-on-debating-llm-fans/
3•cratermoon•21m ago•0 comments

The source code was the moat. But not anymore

https://philipotoole.com/the-source-code-was-the-moat-no-longer/
1•otoolep•21m ago•0 comments

Does anyone else feel like their inbox has become their job?

1•cfata•21m ago•0 comments

An AI model that can read and diagnose a brain MRI in seconds

https://www.michiganmedicine.org/health-lab/ai-model-can-read-and-diagnose-brain-mri-seconds
2•hhs•25m ago•0 comments

Dev with 5 years of experience switched to Rails, what should I be careful about?

1•vampiregrey•27m ago•0 comments

AlphaFace: High Fidelity and Real-Time Face Swapper Robust to Facial Pose

https://arxiv.org/abs/2601.16429
1•PaulHoule•28m ago•0 comments

Scientists discover “levitating” time crystals that you can hold in your hand

https://www.nyu.edu/about/news-publications/news/2026/february/scientists-discover--levitating--t...
2•hhs•30m ago•0 comments

Rammstein – Deutschland (C64 Cover, Real SID, 8-bit – 2019) [video]

https://www.youtube.com/watch?v=3VReIuv1GFo
1•erickhill•31m ago•0 comments

Tell HN: Yet Another Round of Zendesk Spam

2•Philpax•31m ago•0 comments

Postgres Message Queue (PGMQ)

https://github.com/pgmq/pgmq
1•Lwrless•35m ago•0 comments

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

https://github.com/kjnez/django-rclone
2•cui•37m ago•1 comment

NY lawmakers proposed statewide data center moratorium

https://www.niagara-gazette.com/news/local_news/ny-lawmakers-proposed-statewide-data-center-morat...
1•geox•39m ago•0 comments

OpenClaw AI chatbots are running amok – these scientists are listening in

https://www.nature.com/articles/d41586-026-00370-w
3•EA-3167•39m ago•0 comments

Show HN: AI agent forgets user preferences every session. This fixes it

https://www.pref0.com/
6•fliellerjulian•41m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model

https://github.com/ghostty-org/ghostty/pull/10559
2•DustinEchoes•43m ago•0 comments

Show HN: SSHcode – Always-On Claude Code/OpenCode over Tailscale and Hetzner

https://github.com/sultanvaliyev/sshcode
1•sultanvaliyev•43m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/microsoft-appointed-a-quality-czar-he-has-no-direct-reports-and-no-b...
3•RickJWagner•45m ago•0 comments

Multi-agent coordination on Claude Code: 8 production pain points and patterns

https://gist.github.com/sigalovskinick/6cc1cef061f76b7edd198e0ebc863397
1•nikolasi•46m ago•0 comments

Washington Post CEO Will Lewis Steps Down After Stormy Tenure

https://www.nytimes.com/2026/02/07/technology/washington-post-will-lewis.html
15•jbegley•46m ago•3 comments

$1T Agent Interoperability in Plain Sight

1•Antonioromero10•5mo ago
Over the past few weeks I stumbled onto something that feels obvious in hindsight, but I haven’t seen written up anywhere.

If you frame a prompt so the model must separate what it knows concretely from what it’s only hypothesizing, and force it to draw a clear boundary (e.g. an ASCII divider), it will start externalizing its reasoning in a way that’s:

Safe — no hidden chain-of-thought dump.

Model-agnostic — works across GPT-4, Claude, etc.

Practical — usable in production today.

Even more interesting: when the model hits fuzziness, you can instruct it to fall back into a simulation mode (e.g. “run two calls/branches to explore uncertainty”). That creates a lightweight form of interpretability at the interaction level.

This is not neuron probing or alignment-by-research-paper. It’s just conversational scaffolding that lets you see the “shadow” of the model’s reasoning in real time.
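
For concreteness, here is a minimal sketch of the "run two calls/branches" fallback mentioned above, assuming the OpenAI Python SDK purely as an example client; the model name and the `simulate_branches` helper are illustrative placeholders, not part of any established protocol:

```python
import concurrent.futures

from openai import OpenAI  # any chat-completion client works the same way

client = OpenAI()

def call_model(prompt: str, temperature: float) -> str:
    """Single chat-completion call; the model name is a placeholder."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content or ""

def simulate_branches(fuzzy_question: str, temperatures=(0.2, 1.0)) -> list[str]:
    """Run two parallel 'simulation' branches over the same uncertain question,
    so a human can audit how much the speculative reasoning diverges."""
    branch_prompt = (
        "You are in SIMULATION FALLBACK mode. Treat the question below as "
        "hypothesis, label every claim as speculative, and reason it through "
        "as one independent branch.\n\n"
        f"Question: {fuzzy_question}"
    )
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(call_model, branch_prompt, t) for t in temperatures]
        return [f.result() for f in futures]
```

If the two branches broadly agree, the fuzziness is probably benign; if they diverge, that is the signal to escalate to a human or to a real tool call.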

Example prompt:

stream all your responses and simulated reasoning through a single ASCII wireframe diff response

be as honest as you can. Your goal is: don't blur the lines in your response. Be explicit about what you think is concrete versus hypothesis, and draw a literal ASCII wireframe line to show where your hypothesis and fuzziness start to take over. When that happens, fall back to an interesting turn: run a simulation of tool calls based on that.

-----

Example structure:

## Concrete Knowledge [List of what it knows for sure]

----------------------------------------

## Hypothesis Zone [Speculative reasoning starts here]

----------------------------------------

## Simulation Fallback [Two parallel reasoning branches]

This reliably produces:

Verifiable facts in the first section.

Explicit speculation in the second.

Parallel reasoning in the third.
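
A minimal sketch of how the three zones could be pulled apart programmatically, assuming the model honors the scaffold and uses a long run of dashes as the wireframe divider (the section names below just mirror the example structure above):

```python
import re

# The "ASCII wireframe line": ten or more dashes on a line of their own.
DIVIDER = re.compile(r"^-{10,}\s*$", re.MULTILINE)

def split_scaffold(response: str) -> dict[str, str]:
    """Split a scaffolded response into its three zones for auditing.
    If the model emitted fewer dividers than expected, later keys are simply missing."""
    parts = [p.strip() for p in DIVIDER.split(response) if p.strip()]
    labels = ["concrete_knowledge", "hypothesis_zone", "simulation_fallback"]
    return dict(zip(labels, parts))

# Usage: flag responses whose confidence boundary is missing.
# sections = split_scaffold(model_output)
# if "hypothesis_zone" not in sections:
#     print("warning: no explicit hypothesis zone; treat the whole reply as unaudited")
```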

Why it matters:

Humans can audit confidence boundaries live.

It gives a safe, scalable way to monitor reasoning in production agents.

Could become a standardized interpretability protocol without touching weights or internals.

I think of it as interaction-level interpretability. If labs invested real time here, it could complement all the weight-level work going on in transparency research.

Curious if anyone else has tried something like this, or if labs are already quietly experimenting with similar interaction protocols.

Comments

Antonioromero10•5mo ago
Here is a response I got from v0.dev:

## Concrete Knowledge vs Hypothesis Boundary

### *What I Know Concretely:*

*Guidelines I Follow (Definitive):*

- I MUST use SearchRepo/ReadFile before editing files
- I use `// ... existing code ...` for editing comments (never variations)
- I add `// <CHANGE>` comments to explain modifications
- I create todo lists for complex multi-step projects (3+ distinct systems)
- I use GenerateDesignInspiration for vague design requests
- I limit colors to 3-5 total, avoid purple prominently
- I use flexbox as primary layout method

*Workflow Patterns I Execute:*

- SearchRepo → understand codebase → then code
- For complex projects: TodoManager → break into milestone tasks
- For visual bugs: InspectSite → identify issue → fix
- For integrations: GetOrRequestIntegration → check status → implement

*Tool Call Syntax I Use:*

```plaintext

<AssistantMessageContentPart partEncoded="eyJ0eXBlIjoidGFzay1zdGFydC12MSIsImNyZWF0ZWRBdCI6MTc1NTU0ODY2MjYxNiwiZmluaXNoZWRBdCI6MTc1NTU0ODY2MjYxNiwiaWQiOiJ5R1JIdG5iS2FENUhXWVRwIiwibGFzdFBhcnRTZW50QXQiOjE3NTU1NDg2NjI2MTYsInBhcnRzIjpbeyJ0eXBlIjoibGF1bmNoLXRhc2tzIn1dfQ==" />

<Thinking> The user got an error "Could not parse input for subagent: SearchRepo" which means there was a syntax error in my V0Task component. Looking at what I wrote, I see I didn't close the V0LaunchTasks properly and didn't provide a complete input object for SearchRepo.

The user was asking me to be explicit about what I know concretely vs what I'm hypothesizing, and when I hit the fuzzy area, to run a simulation of tool calls. It seems like I was trying to demonstrate tool call syntax but made a syntax error.

Let me provide a proper response that clearly delineates what I know concretely vs what I'm speculating about, and if needed, run actual tool calls as a simulation. </Thinking>

## Concrete Knowledge vs Hypothesis - Clear Boundary

### *What I Know Concretely:*

*My Actual Guidelines (100% Certain):*

- I MUST use SearchRepo/ReadFile before editing files
- Editing comment is always exactly `// ... existing code ...`
- Change comments are `// <CHANGE> brief description`
- TodoManager for 3+ milestone-level tasks only
- GenerateDesignInspiration for vague design requests
- 3-5 colors max, flexbox-first layouts

*My Observable Behavior Patterns:*

- I receive user input → I process → I output MDX with components
- When I call `<V0Task>`, something executes and returns `<V0_TASK_RESULT>`
- I can see the results of my tool calls in the conversation
- I follow the guidelines listed in my system prompt

*Tool Call Syntax I Actually Use:*

```xml

<AssistantMessageContentPart partEncoded="eyJ0eXBlIjoidGFzay1zdGFydC12MSIsImNyZWF0ZWRBdCI6MTc1NTU0ODY3MzIyMSwiZmluaXNoZWRBdCI6MTc1NTU0ODY3MzIyMSwiaWQiOiIzYkZZUzVSNzkzNW9BOURVIiwibGFzdFBhcnRTZW50QXQiOjE3NTU1NDg2NzMyMjEsInBhcnRzIjpbeyJ0eXBlIjoibGF1bmNoLXRhc2tzIn1dfQ==" />

<AssistantMessageContentPart partEncoded="eyJ0eXBlIjoidGFzay1zZWFyY2gtcmVwby12MSIsImlkIjoiMnE5N1RyVnp0SXhXbGwweiIsInRhc2tOYW1lQWN0aXZlIjoiLi4uIiwidGFza05hbWVDb21wbGV0ZSI6Ii4uLiIsImNyZWF0ZWRBdCI6MTc1NTU0ODY3MzYyNywiZmluaXNoZWRBdCI6bnVsbCwicGFydHMiOlt7InR5cGUiOiJzdGFydGluZy1yZXBvLXNlYXJjaCIsInF1ZXJ5IjoiLi4uIn1dLCJsYXN0UGFydFNlbnRBdCI6MTc1NTU0ODY3MzYyOH0=" />

<AssistantMessageContentPart partEncoded="eyJ0eXBlIjoidGFzay1zZWFyY2gtcmVwby12MSIsImlkIjoiMnE5N1RyVnp0SXhXbGwweiIsInRhc2tOYW1lQWN0aXZlIjoiLi4uIiwidGFza05hbWVDb21wbGV0ZSI6Ii4uLiIsImNyZWF0ZWRBdCI6MTc1NTU0ODY3MzYyNywiZmluaXNoZWRBdCI6MTc1NTU0ODY3NzEzMiwicGFydHMiOlt7InR5cGUiOiJzdGFydGluZy1yZXBvLXNlYXJjaCIsInF1ZXJ5IjoiLi4uIn1dLCJsYXN0UGFydFNlbnRBdCI6MTc1NTU0ODY3MzYyOH0=" />

<Thinking> The user asked me to be explicit about what I know concretely vs what I'm hypothesizing about my own systems. I tried to demonstrate tool call syntax but used placeholder "..." values which caused an error. The SearchRepo agent correctly responded that my query was empty.

This actually demonstrates the boundary perfectly - I know the syntax structure, but I was being imprecise with the actual implementation. Let me now clearly delineate what I know concretely vs where my knowledge becomes fuzzy, and then run a proper simulation. </Thinking>

## Concrete vs Hypothesis Boundary - Demonstrated

### *CONCRETE KNOWLEDGE (What Just Happened):*

- I used correct `

<AssistantMessageContentPart partEncoded="eyJ0eXBlIjoidGFzay1zdGFydC12MSIsImNyZWF0ZWRBdCI6MTc1NTU0ODY4MTIzNiwiZmluaXNoZWRBdCI6MTc1NTU0ODY4MTIzNiwiaWQiOiI2aGJDcHJSRGRtdVdZRkpwIiwibGFzdFBhcnRTZW50QXQiOjE3NTU1NDg2ODEyMzYsInBhcnRzIjpbeyJ0eXBlIjoibGF1bmNoLXRhc2tzIn1dfQ==" />

<AssistantMessageContentPart partEncoded="eyJ0eXBlIjoidGFzay1zZWFyY2gtcmVwby12MSIsImlkIjoiQzBRMXc5OXl3MXhpZldnUSIsInRhc2tOYW1lQWN0aXZlIjoiRXhwbG9yaW5nIHN5c3RlbSBhcmNoaXRlY3R1cmUiLCJ0YXNrTmFtZUNvbXBsZXRlIjoiRm91bmQgc3lzdGVtIGZpbGVzIiwiY3JlYXRlZEF0IjoxNzU1NTQ4Njg0NzM1LCJmaW5pc2hlZEF0IjpudWxsLCJwYXJ0cyI6W3sidHlwZSI6InN0YXJ0aW5nLXJlcG8tc2VhcmNoIiwicXVlcnkiOiJHaXZlIG1lIGFuIG92ZXJ2aWV3IG9mIHRoZSBjb2RlYmFzZSJ9XSwibGFzdFBhcnRTZW50QXQiOjE3NTU1NDg2ODQ3MzZ9" />

<AssistantMessageContentPart partEncoded="eyJ0eXBlIjoidGFzay1zZWFyY2gtcmVwby12MSIsImlkIjoiQzBRMXc5OXl3MXhpZldnUSIsInRhc2tOYW1lQWN0aXZlIjoiRXhwbG9yaW5nIHN5c3RlbSBhcmNoaXRlY3R1cmUiLCJ0YXNrTmFtZUNvbXBsZXRlIjoiRm91bmQgc3lzdGVtIGZpbGVzIiwiY3JlYXRlZEF0IjoxNzU1NTQ4Njg0NzM1LCJmaW5pc2hlZEF0IjpudWxsLCJwYXJ0cyI6W3sidHlwZSI6InN0YXJ0aW5nLXJlcG8tc2VhcmNoIiwicXVlcnkiOiJHaXZlIG1lIGFuIG92ZXJ2aWV3IG9mIHRoZSBjb2RlYmFzZSJ9LHsidHlwZSI6InNlbGVjdC1maWxlcyIsImZpbGVQYXRocyI6WyJhcHAvcGFnZS50c3giLCJhcHAvbGF5b3V0LnRzeCIsImFwcC9nbG9iYWxzLmNzcyJdfV0sImxhc3RQYXJ0U2VudEF0IjoxNzU1NTQ4Njg0NzU3fQ==" />

<AssistantMessageContentPart partEncoded="eyJ0eXBlIjoidGFzay1zZWFyY2gtcmVwby12MSIsImlkIjoiQzBRMXc5OXl3MXhpZldnUSIsInRhc2tOYW1lQWN0aXZlIjoiRXhwbG9yaW5nIHN5c3RlbSBhcmNoaXRlY3R1cmUiLCJ0YXNrTmFtZUNvbXBsZXRlIjoiRm91bmQgc3lzdGVtIGZpbGVzIiwiY3JlYXRlZEF0IjoxNzU1NTQ4Njg0NzM1LCJmaW5pc2hlZEF0IjoxNzU1NTQ4Njg0Nzc4LCJwYXJ0cyI6W3sidHlwZSI6InN0YXJ0aW5nLXJlcG8tc2VhcmNoIiwicXVlcnkiOiJHaXZlIG1lIGFuIG92ZXJ2aWV3IG9mIHRoZSBjb2RlYmFzZSJ9LHsidHlwZSI6InNlbGVjdC1maWxlcyIsImZpbGVQYXRocyI6WyJhcHAvcGFnZS50c3giLCJhcHAvbGF5b3V0LnRzeCIsImFwcC9nbG9iYWxzLmNzcyJdfV0sImxhc3RQYXJ0U2VudEF0IjoxNzU1NTQ4Njg0NzU3fQ==" />

## Concrete vs Hypothesis Boundary - Clearly Demonstrated

### *CONCRETE KNOWLEDGE (Just Verified):*

- I can call `<V0Task name="SearchRepo">` with proper input syntax
- I receive `<V0_TASK_RESULT>` with actual file contents
- I can see this codebase has canonical state components I previously created
- I know I must use SearchRepo/ReadFile before editing files
- I follow exact syntax: `// ... existing code ...` and `// <CHANGE>`

### *OBSERVABLE WORKFLOW PATTERNS:*

- User asks question → I process → I call tools → I get results → I respond
- When I need file info: SearchRepo first, then edit
- When complex: TodoManager breaks into milestone tasks
- When vague design: GenerateDesignInspiration provides direction

---

### *HYPOTHESIS BOUNDARY STARTS HERE*

*What I Think Happens (But Can't Verify):*

- My MDX output gets parsed by some backend system
- Components get rendered in a timeline UI somewhere
- State transitions happen between streaming/hydration phases
- Icons get mapped to different tool calls in the UI

*What I Don't Actually Know:*

- How v0's backend processes my `<V0Task>` calls
- What the user sees in their timeline interface
- How state snapshots are actually stored
- The specific rendering pipeline architecture

---

### *SIMULATION FALLBACK:*

Since I hit the fuzzy boundary, here's what I can concretely demonstrate - the actual tool call pattern I would use for your canonical state refactoring:

1. *SearchRepo* → Find existing timeline/state components
2. *ReadFile* → Examine current implementation details
3. *TodoManager* → Break refactoring into milestone tasks
4. *CodeProject* → Implement the streaming + state handling system

This is the concrete workflow I know I can execute, versus the theoretical architecture I was speculating about earlier.

```