frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: STDM – Make Your Documents and Data Think by Embedding LLM Instructions

https://github.com/csiro/stdm
1•benl_c•3mo ago
Hi HN, I’m Ben from CSIRO, Australia’s national science agency. We’ve been exploring how to make data and documents "think" when you use them with LLMs. We call it Self-Thinking Data Manifests (STDM). The idea is to embed plain-text instructions directly within files that tell an LLM how it should think about that data and interact with the user. We demonstrate it with PDF and HTML documents but in the future hope it might be possible for lots of formats.

Why Thinking Data?

* *Enhance PDF drag-and-drop* People already drag scientific papers and reports into LLMs to chat with them, but the interaction is often generic. STDM gives authors more control and customisation in these scenarios. It inverts custom chat-to-pdf systems: instead of building custom RAG interfaces on top of documents, we’re programming the LLM from within the document itself.

* *Author-directed interpretation* STDM helps ensure LLMs approach content with the author’s intended context and purpose, especially for complex scientific or technical data.

* *Smarter documents* Files with embedded STDM carry their own interactive logic, analysis routines, or guided explorations, making them more like mini-applications.

* *Towards in-document LLM programming* We see STDM as a step toward a future where data and instructions combine to form a kind of memory and quasi-procedural instruction set for LLMs; perhaps entire programs could live inside agentic LLM contexts using this approach.

To build an STDM you define a GOAL for the LLM, set CONSTRAINTS for interpretation, suggest REQUESTED_TOOLS (such as code_interpreter for analysis or web_retrieval for context), and optionally sketch out a CUSTOM_UI_DEFINITION (e.g a text-based UI, UX, or specific output format). When a user loads an STDM-enabled file into a capable LLM and explicitly tells the LLM to follow these instructions, the LLM uses the embedded manifest to guide its behaviour.

A mandatory Safety Preamble within the STDM instructs the LLM to await explicit user command and consent before executing any significant actions (especially tool use), ensuring the user is in control.

STDM is designed to be model-agnostic, STDM has been tested with GPT, Claude, and Gemini, if an LLM can read text and follow structured instructions, it should work with STDM. See it in action (save the file, upload/paste it into your LLM, then tell the LLM: Follow the STDM instructions in this document):

* Interactive Floodplain Study (HTML) This one can think about fetching live news if you allow it: https://csiro.github.io/stdm/examples/floodplain.html

* Same study (PDF) See how it thinks to answer questions based on its embedded guide: https://csiro.github.io/stdm/examples/floodplain.pdf

* The Brain (GitHub Spec v0.1, more examples, 2-min explainer video in README): https://github.com/csiro/stdm

This is an early-stage v0.1 specification and very much an experiment. We’re excited by the potential of data that can explain itself or guide its own analysis via an LLM, data that can think! We’d love to hear your thoughts. Is this a useful direction for programming LLMs or creating more dynamic documents? What are the pitfalls (we’ve focused on explicit invocation and consent as key safeguards)? How might you use data that thinks or programs its own interaction?

The Quiet Triumph of King Charles III

https://www.nytimes.com/2025/09/17/opinion/king-charles-trump-britain-visit.html
1•whack•15s ago•0 comments

Albania appoints an AI-generated 'minister' to tackle corruption

https://apnews.com/article/albania-new-cabinet-parliament-ai-minister-diella-corruption-5e53c5d59...
2•pseudolus•6m ago•0 comments

Nvidia boss 'disappointed' by reported China chip ban

https://www.bbc.com/news/articles/cqxz29pe1v0o
2•aussieguy1234•7m ago•0 comments

From OnlyFans to IPO – Why creators will be the next publicly traded companies

https://michaeldcurry1.medium.com/from-onlyfans-to-ipo-why-creators-will-be-the-next-publicly-tra...
1•cursorial•8m ago•0 comments

ABC yanks Jimmy Kimmel's show 'indefinitely' after remarks about Charlie Kirk

https://www.cnn.com/2025/09/17/media/jimmy-kimmel-charlie-kirk-trump-fcc-brendan-carro
8•rubyfan•9m ago•1 comments

Updates to Discover in Search: More content from creators and publishers

https://blog.google/products/search/discover-updates-september-2025/
3•corvad•9m ago•0 comments

Is news distorting reality and tearing society apart?

2•akitatanomoshi•14m ago•1 comments

Rounding Randomly – Reasonably Right?

https://vinayuck.com/articles/rounding
1•vinayuck•17m ago•0 comments

Sub 9kHz Amateur Radio (2012)

https://sites.google.com/site/sub9khz/vlf-using-earth-mode/g3xbm-earth-mode-blog
1•nickt•21m ago•0 comments

Show HN: Bloom – an open-source alternative to Loom

https://www.thepublic.dev/posts/bloom
1•vaneyckseme•22m ago•0 comments

Cough sound artifact on Musesounds Alto Sax Staccato concert E4 note

https://github.com/musescore/MuseScore/issues/21426
2•stevage•23m ago•0 comments

TSMC Arizona: chipmaking is the art of killing variables [video]

https://www.youtube.com/watch?v=1VX3jNJmbcI
1•SkyMarshal•28m ago•1 comments

Edith Allonby: The Writer Who Courted Death for Her Novel

https://www.amusingplanet.com/2025/09/edith-allonby-writer-who-courted-death.html
2•freediver•29m ago•0 comments

Fundamental Concepts in Programming Languages (1967) [pdf]

https://fpl.cs.depaul.edu/jriely/447/assets/articles/strachey-fundamental-concepts-in-programming...
2•swatson741•30m ago•0 comments

Virtual Agent Economies

https://arxiv.org/abs/2509.10147
1•fcpguru•38m ago•0 comments

Tonemaps

https://mini.gmshaders.com/p/tonemaps
1•bpierre•40m ago•0 comments

A Membraneless Electrochemically Mediated Amine Regeneration for Carbon Capture

https://www.nature.com/articles/s41467-025-61525-3
2•PaulHoule•40m ago•0 comments

One Token to rule them all – obtaining Global Admin in every Entra ID tenant

https://dirkjanm.io/obtaining-global-admin-in-every-entra-id-tenant-with-actor-tokens/
3•colinprince•41m ago•0 comments

ABC Pulls Jimmy Kimmel Live from the Air 'Indefinitely'

https://www.vulture.com/article/abc-pulls-jimmy-kimmel-live-from-the-air-indefinitely.html
81•pulisse•43m ago•68 comments

ABC yanks Jimmy Kimmel's show 'indefinitely' after remarks about Charlie Kirk

https://www.cnn.com/2025/09/17/media/jimmy-kimmel-charlie-kirk-trump-fcc-brendan-carr
57•VikingCoder•43m ago•19 comments

Thanks for Subscribing

https://www.fsf.org/free-software-supporter/success
2•okcead•46m ago•0 comments

A Cheaper Way to Test Ventilation Rates?

https://chillphysicsenjoyer.substack.com/p/i-made-a-cheaper-way-to-test-ventilation
1•crescit_eundo•47m ago•0 comments

D port of meta tic-tac-toe game written for the GNU assembler

https://github.com/dkorpel/tictac
1•teleforce•49m ago•0 comments

Scientists' 'pivotal step' in bringing back the dodo for first time in 300 years

https://www.theguardian.com/science/2025/sep/17/dodo-birds-gene-editing-advance
3•bookofjoe•50m ago•0 comments

The Customer Is Always Right (but not always human)

https://sergey.substack.com/p/ai-agent-economy
1•neural_thing•50m ago•1 comments

Morse code beyond the solar system

https://www.johndcook.com/blog/2025/09/17/morse-code-beyond-the-solar-system/
2•ibobev•50m ago•0 comments

The Gentrification of Videogame History

https://felipepepe.medium.com/the-gentrification-of-video-game-history-dfe11f1e08ae
1•akkartik•51m ago•0 comments

Collection of mental math posts, basic and more advanced

https://www.johndcook.com/blog/2025/09/17/mental-math-posts/
1•ibobev•51m ago•0 comments

An Engineer's Perspective on the Texas Floods [video]

https://www.youtube.com/watch?v=3FfMzWa6LKg
1•bjourne•53m ago•0 comments

Anthropic admits they nerfed their Claude model in August

https://twitter.com/aiflux/status/1968443609470091277
3•tensorlibb•54m ago•3 comments