Data Activation Thoughts

https://galsapir.github.io/sparse-thoughts/2026/01/17/data_activation/
8 points • galsapir • 9h ago
i've been working with healthcare/biobank data and keep thinking about what "data moats" mean now that llms can ingest anything. an a16z piece from 2019 argued that data moats were already eroding; now the question seems to be whether you can actually make your data useful to these systems, not just whether you have it. some recent work (tables2traces, ehr-r1) shows that you can convert structured medical data into reasoning traces that improve llm performance, but the approaches are still rough, and the synthetic traces don't fully hold up to scrutiny. (i'm writing this to think it through, not because i have answers.)
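To make "converting structured data into reasoning traces" concrete, here is a minimal template-based sketch. It is not the method used by tables2traces or ehr-r1; the `Record` schema, its field names, the `RULES` table and the `record_to_trace` helper are all hypothetical. The cutoffs (HbA1c ≥ 6.5%, fasting glucose ≥ 126 mg/dL) only serve to template the trace and are not meant as clinical logic. The final consistency flag is one crude way to discard synthetic traces whose conclusion disagrees with the table's own label.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical biobank row; field names are illustrative, not a real schema.
    patient_id: str
    age: int
    hba1c_percent: float
    fasting_glucose_mg_dl: float
    diagnosis: str  # ground-truth label taken from the structured table


# Hypothetical rules: (field name, cutoff, readable name). Used only to
# template the trace, not as a clinical decision procedure.
RULES = [
    ("hba1c_percent", 6.5, "HbA1c"),
    ("fasting_glucose_mg_dl", 126.0, "fasting glucose"),
]


def record_to_trace(rec: Record) -> dict:
    """Turn one structured row into a prompt / reasoning-trace / answer example."""
    prompt = (
        f"Patient {rec.patient_id}: age {rec.age}, "
        f"HbA1c {rec.hba1c_percent}%, fasting glucose {rec.fasting_glucose_mg_dl} mg/dL. "
        "Is this consistent with diabetes?"
    )
    steps = []
    hits = 0
    for field, cutoff, name in RULES:
        value = getattr(rec, field)
        above = value >= cutoff
        hits += int(above)
        relation = "at or above" if above else "below"
        steps.append(f"{name} is {value}, {relation} the {cutoff} cutoff.")
    predicted = "diabetes" if hits > 0 else "no diabetes"
    steps.append(f"{hits} of {len(RULES)} criteria met, so the answer is: {predicted}.")
    return {
        "prompt": prompt,
        "trace": steps,
        "answer": rec.diagnosis,
        # Flag traces whose templated conclusion contradicts the table's label.
        "consistent": predicted == rec.diagnosis,
    }


rows = [
    Record("p1", 54, 7.1, 140.0, "diabetes"),
    Record("p2", 41, 5.4, 92.0, "no diabetes"),
    Record("p3", 60, 5.9, 101.0, "diabetes"),  # label disagrees with the rules
]
examples = [record_to_trace(r) for r in rows]
kept = [e for e in examples if e["consistent"]]
print(len(kept))  # 2
```

Templating keeps the example deterministic. An alternative is to have an llm draft each trace and then check its conclusion against the table in the same way, which is closer to the "synthetic traces" problem described above.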

Comments

sgt101 • 34m ago
How do you decide whether to fine-tune/pretrain or to do RL/reasoning training on a given dataset?