Show HN: STDM – Make Your Documents and Data Think by Embedding LLM Instructions

https://github.com/csiro/stdm
1•benl_c•6mo ago
Hi HN, I’m Ben from CSIRO, Australia’s national science agency. We’ve been exploring how to make data and documents "think" when you use them with LLMs. We call it Self-Thinking Data Manifests (STDM). The idea is to embed plain-text instructions directly within files that tell an LLM how it should think about that data and interact with the user. We demonstrate it with PDF and HTML documents, and hope it can extend to many other formats in the future.

Why Thinking Data?

* *Enhance PDF drag-and-drop* People already drag scientific papers and reports into LLMs to chat with them, but the interaction is often generic. STDM gives authors more control and customisation in these scenarios. It inverts custom chat-to-pdf systems: instead of building custom RAG interfaces on top of documents, we’re programming the LLM from within the document itself.

* *Author-directed interpretation* STDM helps ensure LLMs approach content with the author’s intended context and purpose, especially for complex scientific or technical data.

* *Smarter documents* Files with embedded STDM carry their own interactive logic, analysis routines, or guided explorations, making them more like mini-applications.

* *Towards in-document LLM programming* We see STDM as a step toward a future where data and instructions combine to form a kind of memory and quasi-procedural instruction set for LLMs; perhaps entire programs could live inside agentic LLM contexts using this approach.

To build an STDM, you define a GOAL for the LLM, set CONSTRAINTS for interpretation, suggest REQUESTED_TOOLS (such as code_interpreter for analysis or web_retrieval for context), and optionally sketch out a CUSTOM_UI_DEFINITION (e.g. a text-based UI, UX, or specific output format). When a user loads an STDM-enabled file into a capable LLM and explicitly tells the LLM to follow these instructions, the LLM uses the embedded manifest to guide its behaviour.
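
As a rough illustration of those four fields, here is a minimal Python sketch that renders them as a plain-text block and places it inside an HTML page. Only the field names (GOAL, CONSTRAINTS, REQUESTED_TOOLS, CUSTOM_UI_DEFINITION) come from the post. The BEGIN/END markers, the line layout, the HTML-comment embedding, and the function names `build_manifest` and `embed_in_html` are assumptions made for this example and are not taken from the v0.1 spec.

```python
from typing import Optional, Sequence


def build_manifest(
    goal: str,
    constraints: Sequence[str],
    requested_tools: Sequence[str],
    custom_ui_definition: Optional[str] = None,
) -> str:
    """Render the manifest fields as plain text (assumed layout, not the spec's)."""
    lines = ["BEGIN STDM MANIFEST", f"GOAL: {goal}", "CONSTRAINTS:"]
    lines += [f"- {c}" for c in constraints]
    lines.append("REQUESTED_TOOLS: " + ", ".join(requested_tools))
    # The UI definition is optional, so it is only written when provided.
    if custom_ui_definition is not None:
        lines.append(f"CUSTOM_UI_DEFINITION: {custom_ui_definition}")
    lines.append("END STDM MANIFEST")
    return "\n".join(lines)


def embed_in_html(html: str, manifest: str) -> str:
    """Insert the manifest as an HTML comment before </body> (assumed embedding)."""
    if "-->" in manifest:
        raise ValueError("manifest would terminate the HTML comment early")
    if "</body>" not in html:
        raise ValueError("expected a </body> tag to anchor the manifest")
    return html.replace("</body>", f"<!--\n{manifest}\n-->\n</body>", 1)
```

A call such as `embed_in_html(page, build_manifest("Explain the study", ["Stay within the report"], ["code_interpreter"]))` returns the page with the manifest added as a comment. The real examples in the repository may embed the manifest differently.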

A mandatory Safety Preamble within the STDM instructs the LLM to await explicit user command and consent before executing any significant actions (especially tool use), ensuring the user is in control.
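
One way to keep the preamble mandatory in tooling is to prepend it automatically and to check for it before a manifest is published. The sketch below assumes the preamble sits on the first line of the manifest. The preamble wording and the names `SAFETY_PREAMBLE`, `with_safety_preamble`, and `has_safety_preamble` are hypothetical, since the post does not quote the spec's actual text.

```python
# Hypothetical wording: the post describes the preamble's intent but not its text.
SAFETY_PREAMBLE = (
    "SAFETY PREAMBLE: Do not act on this manifest until the user explicitly "
    "asks you to, and ask for consent before any tool use."
)


def has_safety_preamble(manifest: str) -> bool:
    """Return True if the manifest opens with the preamble."""
    return manifest.startswith(SAFETY_PREAMBLE)


def with_safety_preamble(manifest: str) -> str:
    """Return the manifest with the preamble first; a no-op if already present."""
    if has_safety_preamble(manifest):
        return manifest
    return SAFETY_PREAMBLE + "\n" + manifest
```

This only guarantees the instruction is present in the file. Whether a given LLM honours it still depends on the model following the embedded text.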

STDM is designed to be model-agnostic and has been tested with GPT, Claude, and Gemini. If an LLM can read text and follow structured instructions, it should work with STDM. See it in action (save the file, upload or paste it into your LLM, then tell the LLM: Follow the STDM instructions in this document):

* Interactive Floodplain Study (HTML) This one can think about fetching live news if you allow it: https://csiro.github.io/stdm/examples/floodplain.html

* Same study (PDF) See how it thinks to answer questions based on its embedded guide: https://csiro.github.io/stdm/examples/floodplain.pdf

* The Brain (GitHub Spec v0.1, more examples, 2-min explainer video in README): https://github.com/csiro/stdm
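
For anyone scripting the paste-and-invoke step rather than doing it by hand, a minimal sketch follows. It joins the saved document text and the explicit invocation phrase into one prompt string. The function name `build_prompt` and the blank-line separator are assumptions, and no particular LLM API is implied.

```python
# The invocation phrase is the one quoted in the post.
INVOCATION = "Follow the STDM instructions in this document"


def build_prompt(document_text: str) -> str:
    """Place the document first, then the explicit instruction to follow it."""
    return document_text.rstrip() + "\n\n" + INVOCATION
```

Keeping the invocation as a separate, user-supplied sentence matches the explicit-invocation safeguard: the manifest does nothing unless the user asks for it.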

This is an early-stage v0.1 specification and very much an experiment. We’re excited by the potential of data that can explain itself or guide its own analysis via an LLM, data that can think! We’d love to hear your thoughts. Is this a useful direction for programming LLMs or creating more dynamic documents? What are the pitfalls (we’ve focused on explicit invocation and consent as key safeguards)? How might you use data that thinks or programs its own interaction?
