frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

AfterImage – Generate synthetic multi-turn chat data from documents

https://github.com/altaidevorg/afterimage
5•monatis•2h ago

Comments

monatis•2h ago
We kept running into the same exact bottleneck with fine-tuning and evals: You have the source documents, and you have the base model, but you usually don’t have the actual conversations.

If you’re working with internal docs, regulatory text, or technical manuals, there’s plenty of material but zero multi-turn chat logs. And flattening this into standard instruction/response pairs creates models that sound like templates, failing to capture how users actually ask for clarification or push back.

So we open-sourced a small, opinionated library called AfterImage.

It generates synthetic multi-turn conversations grounded in a corpus you provide. The architecture is straightforward: - A simulated user ("Correspondent") with optional persona variation - A simulated assistant ("Respondent") - Both strictly grounded via sampled source material - Outputs directly to JSONL for your SFT (Supervised Fine-Tuning) / eval pipelines

*Why build this?* The narrow bet here is that multi-turn dialogue is its own distinct data problem. There are already great general synthetic data tools (distilabel, synthetic-data-kit). We aren't competing with them. AfterImage prioritizes composable design where generation can be customized with callbacks. For example, you can connect it to various data sources such as local files or Qdrant collections, or you can choose retriever strategies for RAG or aggregation methods for composite evaluation.

*A few honest caveats:* - We don’t have a strong published benchmark yet (semantic similarity only so far). - Quality noticeably degrades/loops as conversations get too long (>5+ turns). Luckily, one-to-three turns is more than enough for most SFT cases.

efecnc•2h ago
Simulating a user that actually sounds real is definitely the hardest part of this. Curious how you're handling the chunking and retrieval under the hood here.

Does the 'user' agent get fed a specific chunk of text to formulate its questions, and does the 'assistant' agent get that exact same chunk to reply? If they're both looking at the identical text, have you thought about injecting some noise or unrelated distractor chunks into the assistant's context? Might be a solid way to make the resulting SFT data more robust against hallucinations.

monatis•1h ago
Yeah this is one possible way to generate grounded an"responses" in Afterimage. To accomplish context augmentation when generating a response, it allows to use different RAG strategies where retriever may be chosen for the specific use case at hand. This is where composability comes into play.

What to know about naval blockades as U.S. begins patrols the Strait of Hormuz

https://text.npr.org/nx-s1-5783870
1•mooreds•42s ago•0 comments

Caveman – why use many token when few do trick

https://github.com/JuliusBrussee/caveman/blob/main/README.md
1•kerblang•1m ago•0 comments

Clojure The Documentary, official film [video]

https://www.youtube.com/watch?v=Y24vK_QDLFg
1•bmillare•2m ago•0 comments

Women in Tech: Journeys, Grit, and the Future We're Building

https://www.harness.io/blog/women-in-tech-journeys-grit-and-the-future-were-building
1•mooreds•4m ago•0 comments

We gave an AI a 3 year retail lease and asked it to make a profit

https://andonlabs.com/blog/andon-market-launch
1•lukaspetersson•4m ago•0 comments

How to Outline Text (Badly, at First)

https://kellydornhaus.com/blog/font-outlines.html
1•furyofantares•4m ago•0 comments

Prt-Scan: AI-Powered GitHub Actions Supply Chain Attack

https://www.wiz.io/blog/six-accounts-one-actor-inside-the-prt-scan-supply-chain-campaign
1•speckx•5m ago•0 comments

Short Attention Span Theater

https://randsinrepose.com/archives/short-attention-span-theater/
1•mooreds•6m ago•0 comments

Bullet train upgrade brings 5G windows and noise-cancelling cabins to Japan

https://www.theregister.com/2026/04/16/jr_central_shinkansen_tech/
1•smurda•6m ago•0 comments

Saving Us from Ourselves?

https://leancrew.com/all-this/2026/03/saving-us-from-ourselves/
1•surprisetalk•8m ago•0 comments

Epicycles All the Way Down

https://www.strangeloopcanon.com/p/epicycles-all-the-way-down
1•surprisetalk•8m ago•0 comments

Synchronous Programming for Kids: A Manifesto [video]

https://www.youtube.com/watch?v=ESK695bHI5Q
1•surprisetalk•8m ago•0 comments

The Unix Executable as a Smalltalk Method [video]

https://www.youtube.com/watch?v=sZjPQ7vtLNA
1•surprisetalk•8m ago•0 comments

Dizzying Spiral Staircase with Single Guardrail Once Led to Top of Eiffel Tower

https://www.smithsonianmag.com/smart-news/a-dizzying-spiral-staircase-with-a-single-guardrail-onc...
1•bookofjoe•9m ago•0 comments

Ask HN: Why no insurance is fully transparent about how they handle each case?

1•julienreszka•9m ago•0 comments

Serious Weaknesses in the EU Age Verification App

https://twitter.com/intcyberdigest/status/2044762941019295772
1•gostsamo•10m ago•0 comments

DESI Completes Planned 3D Map of the Universe

https://newscenter.lbl.gov/2026/04/15/desi-completes-planned-3d-map-of-the-universe-and-continues...
1•ohjeez•11m ago•0 comments

What is audio visual entrainment? Science, benefits, and how it works

https://6th.tech/en/blog/audio-visual-entrainment
1•smanuel•11m ago•0 comments

Show HN: I built a free Mac app locker because AppLocker ($18) freezes Macs

https://github.com/dutkiewiczmaciej/MakLock
1•makmakapps•11m ago•0 comments

Making backwards- and forwards-compatible web programs (kbrecordzz)

http://kbrecordzz.com/2024/11/making-backwards--and-forwards-compatible-web-programs/
1•bollkalle•12m ago•0 comments

A Gentle Introduction to Mercury

https://bctnry.github.io/gentle-introduction-to-mercury/
2•PaulHoule•12m ago•1 comments

Cruise industry eyes nuclear to power a sustainable future

https://www.lr.org/en/knowledge/horizons/april-2026/cruise-industry-eyes-nuclear-to-power-a-susta...
1•mpweiher•13m ago•0 comments

Solve by Default

https://theengineersetlist.substack.com/p/solve-by-default
1•speckx•13m ago•0 comments

Warzone will be renamed to War.app

https://www.warzone.com/blog/index.php/2026/04/
2•wmlhwl•14m ago•0 comments

The Secret Art of Elicitation

https://www.generalist.com/p/confidential
1•jbredeche•15m ago•0 comments

Lead Full-Stack Engineer Marker Learning – Remote

1•lucie22sc32•15m ago•0 comments

Show HN: Yutu – A modern Lua linter written in Rust

https://github.com/0x2a-42/yutu
1•0x2a-42•16m ago•0 comments

How Missions Work

https://factory.ai/news/missions-architecture
1•gmays•16m ago•0 comments

Gemma 4-written, small cc0 encyclopedia of some core science content

https://stateofutopia.com/encyclopedia/
1•logicallee•16m ago•1 comments

It's the End of the Internet as We Know It

https://www.nytimes.com/2026/04/15/opinion/mythos-open-souce-internet.html
1•gpi•16m ago•1 comments