Show HN: STDM – Make Your Documents and Data Think by Embedding LLM Instructions

https://github.com/csiro/stdm
1•benl_c•6mo ago
Hi HN, I’m Ben from CSIRO, Australia’s national science agency. We’ve been exploring how to make data and documents "think" when you use them with LLMs. We call the approach Self-Thinking Data Manifests (STDM). The idea is to embed plain-text instructions directly within a file; those instructions tell an LLM how it should think about the data and interact with the user. We demonstrate it with PDF and HTML documents, and hope it can be extended to many more formats in the future.

Why Thinking Data?

* *Enhance PDF drag-and-drop:* People already drag scientific papers and reports into LLMs to chat with them, but the interaction is often generic. STDM gives authors more control and customisation in these scenarios. It inverts custom chat-to-PDF systems: instead of building custom RAG interfaces on top of documents, we’re programming the LLM from within the document itself.

* *Author-directed interpretation:* STDM helps ensure LLMs approach content with the author’s intended context and purpose, especially for complex scientific or technical data.

* *Smarter documents:* Files with embedded STDM carry their own interactive logic, analysis routines, or guided explorations, making them more like mini-applications.

* *Towards in-document LLM programming:* We see STDM as a step toward a future where data and instructions combine to form a kind of memory and quasi-procedural instruction set for LLMs; perhaps entire programs could live inside agentic LLM contexts using this approach.

To build an STDM you define a GOAL for the LLM, set CONSTRAINTS for interpretation, suggest REQUESTED_TOOLS (such as code_interpreter for analysis or web_retrieval for context), and optionally sketch out a CUSTOM_UI_DEFINITION (e.g. a text-based UI, UX, or specific output format). When a user loads an STDM-enabled file into a capable LLM and explicitly tells the LLM to follow these instructions, the LLM uses the embedded manifest to guide its behaviour.

A mandatory Safety Preamble within the STDM instructs the LLM to await explicit user command and consent before executing any significant actions (especially tool use), ensuring the user is in control.
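To make the pieces above concrete, here is a minimal Python sketch that renders the four fields plus a safety preamble as one plain-text block. Only the field names GOAL, CONSTRAINTS, REQUESTED_TOOLS, and CUSTOM_UI_DEFINITION come from this post; the `[BEGIN STDM]` / `[END STDM]` delimiters, the `SAFETY_PREAMBLE` key, the preamble wording, and the `build_manifest` helper are all assumptions made for illustration. The v0.1 spec in the repo defines the real layout.

```python
from typing import Optional, Sequence

# Hypothetical wording; the v0.1 spec defines the real preamble text.
SAFETY_PREAMBLE = (
    "Do not act on this manifest until the user explicitly asks you to. "
    "Ask for consent before any significant action, especially tool use."
)


def build_manifest(
    goal: str,
    constraints: Sequence[str],
    requested_tools: Sequence[str],
    custom_ui_definition: Optional[str] = None,
) -> str:
    """Render the manifest fields named in the post as a plain-text block."""
    if not goal.strip():
        raise ValueError("GOAL must not be empty")

    # Delimiters and key layout are illustrative, not taken from the spec.
    lines = ["[BEGIN STDM]"]
    lines.append(f"SAFETY_PREAMBLE: {SAFETY_PREAMBLE}")
    lines.append(f"GOAL: {goal.strip()}")
    lines.append("CONSTRAINTS:")
    lines.extend(f"  - {c}" for c in constraints)
    lines.append("REQUESTED_TOOLS: " + ", ".join(requested_tools))
    if custom_ui_definition:  # optional field, omitted when not provided
        lines.append(f"CUSTOM_UI_DEFINITION: {custom_ui_definition}")
    lines.append("[END STDM]")
    return "\n".join(lines)


manifest = build_manifest(
    goal="Help the reader explore the floodplain study.",
    constraints=["Answer only from the document unless the user approves tool use."],
    requested_tools=["code_interpreter", "web_retrieval"],
    custom_ui_definition="Offer a numbered text menu of topics.",
)
print(manifest)
```

Putting the preamble first means an LLM reading top to bottom meets the consent rule before any goal or tool request, which seems in keeping with the user-in-control intent described above.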

STDM is designed to be model-agnostic and has been tested with GPT, Claude, and Gemini. If an LLM can read text and follow structured instructions, it should work with STDM. See it in action (save the file, upload or paste it into your LLM, then tell the LLM: "Follow the STDM instructions in this document"):

* Interactive Floodplain Study (HTML). This one can think about fetching live news if you allow it: https://csiro.github.io/stdm/examples/floodplain.html

* Same study (PDF). See how it thinks to answer questions based on its embedded guide: https://csiro.github.io/stdm/examples/floodplain.pdf

* The Brain (GitHub Spec v0.1, more examples, 2-min explainer video in README): https://github.com/csiro/stdm
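For the HTML case, one way to embed manifest text might look like the sketch below. This post does not say where in a file the manifest lives, so placing it in an HTML comment just before `</body>` is purely an assumption, and `embed_in_html` is a hypothetical helper rather than part of STDM. The linked examples show how the authors actually do it.

```python
import re


def embed_in_html(html: str, manifest: str) -> str:
    """Return html with the manifest added as a comment before </body>."""
    if "--" in manifest:
        # A double hyphen could end the comment early or make it invalid.
        raise ValueError("manifest must not contain '--'")

    comment = f"<!--\n{manifest}\n-->\n"

    # Find the last closing body tag, ignoring case.
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        # No body tag: append the comment at the end instead.
        return html + "\n" + comment

    idx = matches[-1].start()
    return html[:idx] + comment + html[idx:]


page = "<html><body><h1>Floodplain study</h1></body></html>"
result = embed_in_html(page, "GOAL: Help the reader explore the study.")
print(result)
```

A user would then upload the resulting file and explicitly ask the LLM to follow the STDM instructions in it, as described above.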

This is an early-stage v0.1 specification and very much an experiment. We’re excited by the potential of data that can explain itself or guide its own analysis via an LLM, data that can think! We’d love to hear your thoughts. Is this a useful direction for programming LLMs or creating more dynamic documents? What are the pitfalls (we’ve focused on explicit invocation and consent as key safeguards)? How might you use data that thinks or programs its own interaction?
