Problem:
High-quality datasets are scarce, expensive, and often sensitive. When teams turn to synthetic data, the hard part isn't writing individual prompts; it's the end-to-end system: designing branching/looping workflows, coordinating multiple inference backends/APIs and tool calls, enforcing validation, schema compliance, and quality tagging at scale, and running fault-tolerant jobs with resumability, sharding, and streaming. Ad-hoc notebooks and scripts don't capture that lifecycle.
What SyGra is:
A graph-oriented framework: you define nodes (LLM calls, samplers, transforms, agents, subgraphs) and connect them with edges that support conditional branching, parallel fan-out, and loops. Author pipelines in low-code YAML (runnable from the CLI) or compose them in Python. Emphasis on structured outputs and reproducibility.
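To make the node/edge idea concrete, here's a rough sketch of what a branching pipeline could look like in YAML. The key names below are illustrative only, not SyGra's actual schema; see the README for the real format:

```yaml
# Hypothetical sketch of a graph pipeline. Key names are illustrative,
# not SyGra's actual YAML schema.
graph:
  nodes:
    sample_topic:        # sampler node: draws a topic from a seed list
      type: sampler
      source: topics.json
    generate:            # LLM node: produces a candidate record
      type: llm
      model: my-vllm-endpoint
      prompt: "Write a Q&A pair about {topic}"
    validate:            # transform node: checks schema compliance
      type: transform
      fn: validate_schema
  edges:
    - from: sample_topic
      to: generate
    - from: generate
      to: validate
    - from: validate     # conditional edge: loop back on failure
      to: generate
      condition: "not valid"
```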
Key capabilities:
- Graph model: reusable subgraphs; conditional/parallel edges; loops
- Quality: dual-stage quality tagging (heuristics + LLM-based scoring); OASST-style conversation formatting
- Backends: vLLM, Hugging Face TGI, Azure OpenAI, Ollama (Triton-compatible)
- Data I/O: Hugging Face datasets (read/write, streaming) + local files; schema + metadata tracking
- Execution: async runtime; checkpointing/resume; sharding support; multimodal inputs (image/audio/text); agent/tool nodes via LangGraph
- Reproducibility: deterministic configs, seeds, artifact paths, and provenance logs
- Modes: CLI (execute YAML graphs) or Python APIs (embed in notebooks/apps)
- License: Apache-2.0
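Independent of SyGra's own API (which the sketch below does not use), the core execution model behind the capabilities above can be shown in plain Python: nodes are callables over a shared state, and conditional edges pick the next node, which naturally expresses branching and validation/retry loops:

```python
# Minimal sketch of the graph-execution idea in plain Python (NOT SyGra's API):
# nodes transform a shared state dict; conditional edge functions choose the
# next node, which expresses both branching and retry loops.
from typing import Callable, Optional

Node = Callable[[dict], dict]
Edge = Callable[[dict], Optional[str]]  # returns next node name, or None to stop

def run_graph(nodes: dict[str, Node], edges: dict[str, Edge],
              start: str, state: dict, max_steps: int = 20) -> dict:
    """Execute nodes until an edge returns None or the step budget runs out."""
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            break
        state = nodes[current](state)
        current = edges[current](state)
    return state

# Toy pipeline: generate -> validate, looping back until the record is valid.
def generate(state: dict) -> dict:
    state["attempts"] = state.get("attempts", 0) + 1
    # Stand-in for an LLM call; succeeds on the second attempt.
    state["record"] = {"text": "sample", "ok": state["attempts"] >= 2}
    return state

def validate(state: dict) -> dict:
    state["valid"] = state["record"]["ok"]
    return state

nodes = {"generate": generate, "validate": validate}
edges = {
    "generate": lambda s: "validate",
    "validate": lambda s: None if s["valid"] else "generate",  # loop on failure
}

final = run_graph(nodes, edges, "generate", {})
```

This is only the control-flow skeleton; the real framework layers the backends, checkpointing, and quality tagging listed above on top of it.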
Links:
- Repo & README: https://github.com/ServiceNow/SyGra
- PyPI: https://pypi.org/project/sygra/
- Paper (design rationale): https://arxiv.org/abs/2508.15432
Disclosure: I'm part of the team behind SyGra.