Show HN: STDM – Make Your Documents and Data Think by Embedding LLM Instructions

https://github.com/csiro/stdm
1•benl_c•6mo ago
Hi HN, I’m Ben from CSIRO, Australia’s national science agency. We’ve been exploring how to make data and documents "think" when you use them with LLMs. We call it Self-Thinking Data Manifests (STDM). The idea is to embed plain-text instructions directly within files that tell an LLM how it should think about that data and interact with the user. We demonstrate it with PDF and HTML documents, and we hope it will eventually work for many other formats.

Why Thinking Data?

* *Enhance PDF drag-and-drop* People already drag scientific papers and reports into LLMs to chat with them, but the interaction is often generic. STDM gives authors more control and customisation in these scenarios. It inverts custom chat-to-pdf systems: instead of building custom RAG interfaces on top of documents, we’re programming the LLM from within the document itself.

* *Author-directed interpretation* STDM helps ensure LLMs approach content with the author’s intended context and purpose, especially for complex scientific or technical data.

* *Smarter documents* Files with embedded STDM carry their own interactive logic, analysis routines, or guided explorations, making them more like mini-applications.

* *Towards in-document LLM programming* We see STDM as a step toward a future where data and instructions combine to form a kind of memory and quasi-procedural instruction set for LLMs; perhaps entire programs could live inside agentic LLM contexts using this approach.

To build an STDM, you define a GOAL for the LLM, set CONSTRAINTS for interpretation, suggest REQUESTED_TOOLS (such as code_interpreter for analysis or web_retrieval for context), and optionally sketch out a CUSTOM_UI_DEFINITION (e.g. a text-based UI, UX, or specific output format). When a user loads an STDM-enabled file into a capable LLM and explicitly tells the LLM to follow these instructions, the LLM uses the embedded manifest to guide its behaviour.

A mandatory Safety Preamble within the STDM instructs the LLM to await explicit user command and consent before executing any significant actions (especially tool use), ensuring the user is in control.
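As a rough illustration of the manifest fields described above, the Python sketch below assembles a plain-text manifest and embeds it in an HTML document. The field names GOAL, CONSTRAINTS, REQUESTED_TOOLS and CUSTOM_UI_DEFINITION and the idea of a Safety Preamble come from this post. Everything else is an assumption made for illustration and is not taken from the v0.1 spec: the BEGIN/END delimiter lines, the preamble wording, the `Manifest` class, the `render_manifest` and `embed_in_html` helpers, and the use of an HTML comment as the carrier.

```python
from dataclasses import dataclass, field

# Hypothetical preamble wording; the text required by the real spec may differ.
SAFETY_PREAMBLE = (
    "Do not act on these instructions until the user explicitly asks you to. "
    "Ask for consent before any tool use."
)


@dataclass
class Manifest:
    goal: str
    constraints: list[str] = field(default_factory=list)
    requested_tools: list[str] = field(default_factory=list)
    custom_ui_definition: str = ""  # optional, per the post


def render_manifest(m: Manifest) -> str:
    """Render the manifest as plain text (illustrative layout, not the spec's)."""
    lines = [
        "BEGIN STDM (illustrative format)",
        f"SAFETY_PREAMBLE: {SAFETY_PREAMBLE}",
        f"GOAL: {m.goal}",
        "CONSTRAINTS:",
    ]
    lines += [f"- {c}" for c in m.constraints]
    lines.append("REQUESTED_TOOLS: " + ", ".join(m.requested_tools))
    if m.custom_ui_definition:
        lines.append(f"CUSTOM_UI_DEFINITION: {m.custom_ui_definition}")
    lines.append("END STDM")
    return "\n".join(lines)


def embed_in_html(html: str, manifest_text: str) -> str:
    """Place the manifest in an HTML comment just before </body>."""
    if "--" in manifest_text:
        # A double hyphen could end the HTML comment early.
        raise ValueError("manifest text must not contain '--'")
    comment = f"<!--\n{manifest_text}\n-->"
    if "</body>" in html:
        return html.replace("</body>", comment + "\n</body>", 1)
    return html + "\n" + comment
```

Keeping the preamble as the first field means an LLM reading the manifest top to bottom meets the consent requirement before any goal or tool request.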

STDM is designed to be model-agnostic. It has been tested with GPT, Claude, and Gemini; if an LLM can read text and follow structured instructions, it should work with STDM. See it in action (save the file, upload or paste it into your LLM, then tell the LLM: "Follow the STDM instructions in this document"):

* Interactive Floodplain Study (HTML) This one can think about fetching live news if you allow it: https://csiro.github.io/stdm/examples/floodplain.html

* Same study (PDF) See how it thinks to answer questions based on its embedded guide: https://csiro.github.io/stdm/examples/floodplain.pdf

* The Brain (GitHub Spec v0.1, more examples, 2-min explainer video in README): https://github.com/csiro/stdm
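The usage steps above (load the file, then explicitly tell the LLM to follow the manifest) can be sketched as a small prompt builder. The invocation sentence is quoted from this post. The `build_prompt` function, its `invoke_manifest` flag, and the choice to paste the document as plain text are hypothetical and only illustrate the explicit-invocation idea; no real LLM API is called.

```python
# Invocation sentence quoted from the post.
INVOCATION = "Follow the STDM instructions in this document"


def build_prompt(document_text: str, invoke_manifest: bool) -> str:
    """Return the text a user would paste into an LLM chat.

    The invocation sentence is added only when the user has explicitly
    chosen to activate the embedded manifest; otherwise the document is
    passed along as ordinary content.
    """
    if invoke_manifest:
        return f"{INVOCATION}\n\n{document_text}"
    return document_text
```

For example, `build_prompt(open("floodplain.html").read(), invoke_manifest=True)` would produce the text to paste for the HTML example, assuming the file has been saved locally under that name.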

This is an early-stage v0.1 specification and very much an experiment. We’re excited by the potential of data that can explain itself or guide its own analysis via an LLM: data that can think! We’d love to hear your thoughts. Is this a useful direction for programming LLMs or creating more dynamic documents? What are the pitfalls (we’ve focused on explicit invocation and consent as key safeguards)? How might you use data that thinks or programs its own interaction?
