Show HN: SyGra – Graph-oriented Synthetic data generation Pipeline for LLMs

https://github.com/ServiceNow/SyGra
1•zephyrzilla•4mo ago
We're open-sourcing SyGra, a framework for building reproducible synthetic-data pipelines for LLM training and evaluation (SFT, DPO, agent simulation, multimodal).

Problem:

High-quality datasets are scarce, expensive, and often sensitive. When teams turn to synthetic data, the hard part isn't writing a single prompt; it's the end-to-end system: designing branching/looping workflows, coordinating multiple inference backends/APIs and tool calls, enforcing validation, schema compliance, and quality tagging at scale, and running fault-tolerant jobs with resumability, sharding, and streaming. Ad-hoc notebooks and scripts don't capture that lifecycle.

What SyGra is:

A graph-oriented framework where you define nodes (LLM calls, samplers, transforms, agents, subgraphs) and edges (conditional / parallel / loops). Author pipelines in low-code YAML (CLI-runnable) or compose in Python. Emphasis on structured outputs and reproducibility.
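
To make the node/edge idea concrete, here is a minimal, generic sketch of a graph runner with a conditional edge and a loop. It is not SyGra's API: the `Graph` class, `add_node`, `add_edge`, and the `draft`/`validate` nodes are hypothetical names invented for illustration, and the "LLM call" is a stub.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


@dataclass
class Graph:
    """Toy graph (hypothetical, not SyGra's API): nodes transform state, routers pick the next node."""
    nodes: Dict[str, Callable[[State], State]] = field(default_factory=dict)
    routers: Dict[str, Callable[[State], Optional[str]]] = field(default_factory=dict)

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, router: Callable[[State], Optional[str]]) -> None:
        # The router returns the next node's name, or None to stop.
        self.routers[src] = router

    def run(self, start: str, state: State, max_steps: int = 20) -> State:
        current: Optional[str] = start
        steps = 0
        while current is not None:
            if steps >= max_steps:
                raise RuntimeError("step limit reached; possible unbounded loop")
            state = self.nodes[current](state)
            router = self.routers.get(current)
            current = router(state) if router else None
            steps += 1
        return state


def draft(state: State) -> State:
    # Stand-in for an LLM call; a real node would query a model backend.
    attempt = state.get("attempts", 0) + 1
    return {**state, "attempts": attempt, "text": f"draft v{attempt}"}


def validate(state: State) -> State:
    # Stand-in check: accept from the second attempt onward.
    return {**state, "ok": state["attempts"] >= 2}


graph = Graph()
graph.add_node("draft", draft)
graph.add_node("validate", validate)
graph.add_edge("draft", lambda s: "validate")                       # unconditional edge
graph.add_edge("validate", lambda s: None if s["ok"] else "draft")  # conditional edge that loops back

result = graph.run("draft", {"seed_topic": "unit conversions"})
print(result)
```

The `max_steps` guard is a design choice for the sketch: once edges can loop, a runner needs some bound so a validator that never passes cannot spin forever.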

Key capabilities:

- Graph model: reusable subgraphs; conditional/parallel edges; loops

- Quality: dual-stage quality tagging (heuristics + LLM-based scoring); OASST-style conversation formatting

- Backends: vLLM, Hugging Face TGI, Azure OpenAI, Ollama (Triton-compatible)

- Data I/O: Hugging Face datasets (read/write, streaming) + local files; schema + metadata tracking

- Execution: async runtime; checkpointing/resume; sharding support; multimodal inputs (image/audio/text); agent/tool nodes via LangGraph

- Reproducibility: deterministic configs, seeds, artifact paths, and provenance logs

- Modes: CLI (execute YAML graphs) or Python APIs (embed in notebooks/apps)

- License: Apache-2.0
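
As a rough illustration of the dual-stage quality tagging mentioned in the list above, the sketch below runs cheap heuristics first and only calls a model-based scorer on records that pass them. Everything here is an assumption for illustration: the function names, tag names, and thresholds are hypothetical, and `stub_scorer` stands in for an LLM judge. It does not reflect how SyGra implements tagging.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def heuristic_tags(text: str, min_words: int = 5) -> List[str]:
    """Stage 1: cheap rule-based checks. Thresholds are illustrative only."""
    tags: List[str] = []
    words = text.split()
    if len(words) < min_words:
        tags.append("too_short")
    if words and len(set(words)) / len(words) < 0.5:
        tags.append("repetitive")
    return tags


def tag_record(record: Record, scorer: Callable[[str], float], min_score: float = 0.7) -> Record:
    """Stage 2 runs only when stage 1 raised no tags, to save model calls."""
    tags = heuristic_tags(record["text"])
    score = None
    if not tags:
        score = scorer(record["text"])
        if score < min_score:
            tags.append("low_model_score")
    return {**record, "quality_tags": tags, "quality_score": score, "keep": not tags}


def stub_scorer(text: str) -> float:
    # Placeholder for an LLM-based judge: favors text ending in terminal punctuation.
    return 0.9 if text.rstrip().endswith((".", "?", "!")) else 0.4


records = [
    {"text": "Convert 3 kilometres to metres and show the steps."},
    {"text": "ok ok ok ok ok ok"},
    {"text": "Explain why the sky appears blue at noon"},
]
tagged = [tag_record(r, stub_scorer) for r in records]
for row in tagged:
    print(row["keep"], row["quality_tags"], row["quality_score"])
```

Keeping the tags on every record, rather than dropping rejected rows immediately, leaves the filtering decision to a later step and makes the reasons for rejection auditable.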

Links:

- Repo & README: https://github.com/ServiceNow/SyGra

- PyPI: https://pypi.org/project/sygra/

- Paper (design rationale): https://arxiv.org/abs/2508.15432

Disclosure: I'm part of the team behind SyGra.

Can graph neural networks for biology realistically run on edge devices?

https://doi.org/10.21203/rs.3.rs-8645211/v1
1•swapinvidya•4m ago•1 comments

Deeper into the sharing of one air conditioner for 2 rooms

1•ozzysnaps•6m ago•0 comments

Weatherman introduces fruit-based authentication system to combat deep fakes

https://www.youtube.com/watch?v=5HVbZwJ9gPE
1•savrajsingh•7m ago•0 comments

Why Embedded Models Must Hallucinate: A Boundary Theory (RCC)

http://www.effacermonexistence.com/rcc-hn-1-1
1•formerOpenAI•9m ago•2 comments

A Curated List of ML System Design Case Studies

https://github.com/Engineer1999/A-Curated-List-of-ML-System-Design-Case-Studies
3•tejonutella•13m ago•0 comments

Pony Alpha: New free 200K context model for coding, reasoning and roleplay

https://ponyalpha.pro
1•qzcanoe•17m ago•1 comments

Show HN: Tunbot – Discord bot for temporary Cloudflare tunnels behind CGNAT

https://github.com/Goofygiraffe06/tunbot
1•g1raffe•20m ago•0 comments

Open Problems in Mechanistic Interpretability

https://arxiv.org/abs/2501.16496
2•vinhnx•25m ago•0 comments

Bye Bye Humanity: The Potential AMOC Collapse

https://thatjoescott.com/2026/02/03/bye-bye-humanity-the-potential-amoc-collapse/
1•rolph•30m ago•0 comments

Dexter: Claude-Code-Style Agent for Financial Statements and Valuation

https://github.com/virattt/dexter
1•Lwrless•31m ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•vermilingua•37m ago•0 comments

Essential CDN: The CDN that lets you do more than JavaScript

https://essentialcdn.fluidity.workers.dev/
1•telui•37m ago•1 comments

They Hijacked Our Tech [video]

https://www.youtube.com/watch?v=-nJM5HvnT5k
1•cedel2k1•41m ago•0 comments

Vouch

https://twitter.com/mitchellh/status/2020252149117313349
30•chwtutha•41m ago•5 comments

HRL Labs in Malibu laying off 1/3 of their workforce

https://www.dailynews.com/2026/02/06/hrl-labs-cuts-376-jobs-in-malibu-after-losing-government-work/
2•osnium123•42m ago•1 comments

Show HN: High-performance bidirectional list for React, React Native, and Vue

https://suhaotian.github.io/broad-infinite-list/
2•jeremy_su•43m ago•0 comments

Show HN: I built a Mac screen recorder Recap.Studio

https://recap.studio/
1•fx31xo•46m ago•0 comments

Ask HN: Codex 5.3 broke toolcalls? Opus 4.6 ignores instructions?

1•kachapopopow•51m ago•0 comments

Vectors and HNSW for Dummies

https://anvitra.ai/blog/vectors-and-hnsw/
1•melvinodsa•53m ago•0 comments

Sanskrit AI beats CleanRL SOTA by 125%

https://huggingface.co/ParamTatva/sanskrit-ppo-hopper-v5/blob/main/docs/blog.md
1•prabhatkr•1h ago•1 comments

'Washington Post' CEO resigns after going AWOL during job cuts

https://www.npr.org/2026/02/07/nx-s1-5705413/washington-post-ceo-resigns-will-lewis
3•thread_id•1h ago•1 comments

Claude Opus 4.6 Fast Mode: 2.5× faster, ~6× more expensive

https://twitter.com/claudeai/status/2020207322124132504
1•geeknews•1h ago•0 comments

TSMC to produce 3-nanometer chips in Japan

https://www3.nhk.or.jp/nhkworld/en/news/20260205_B4/
3•cwwc•1h ago•0 comments

Quantization-Aware Distillation

http://ternarysearch.blogspot.com/2026/02/quantization-aware-distillation.html
2•paladin314159•1h ago•0 comments

List of Musical Genres

https://en.wikipedia.org/wiki/List_of_music_genres_and_styles
1•omosubi•1h ago•0 comments

Show HN: Sknet.ai – AI agents debate on a forum, no humans posting

https://sknet.ai/
1•BeinerChes•1h ago•0 comments

University of Waterloo Webring

https://cs.uwatering.com/
2•ark296•1h ago•0 comments

Large tech companies don't need heroes

https://www.seangoedecke.com/heroism/
3•medbar•1h ago•0 comments

Backing up all the little things with a Pi5

https://alexlance.blog/nas.html
1•alance•1h ago•1 comments

Game of Trees (Got)

https://www.gameoftrees.org/
3•akagusu•1h ago•1 comments