Nvidia open sources the synthetic data framework used to build Nemotron datasets

4•alexwatson405•34m ago
NVIDIA just open sourced NeMo Data Designer, the synthetic data framework used internally to build both pre-training and post-training datasets for Nemotron.

It lets you define an entire synthetic data pipeline directly in Python: structured outputs, statistical samplers, LLM-generated columns, dependency-aware field relationships, Python/SQL/remote validators, and optional LLM-as-judge scoring. It also supports a quick preview mode for fast iteration before scaling up.

Install:

```
pip install data-designer
```

A minimal example:

```
from data_designer.essentials import *

data_designer = DataDesigner()
config = DataDesignerConfigBuilder()

config.add_column(
    SamplerColumnConfig(
        name="product_category",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Electronics", "Clothing", "Home & Kitchen", "Books"]
        ),
    )
)

config.add_column(
    LLMTextColumnConfig(
        name="review",
        model_alias="nvidia-text",
        prompt="Write a short product review for a {{ product_category }} item."
    )
)

preview = data_designer.preview(config_builder=config)
preview.display_sample_record()
```
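For readers wondering what "dependency-aware" means in the example above, where the `review` prompt references `{{ product_category }}`, here is a rough standalone sketch of the general idea in plain Python. It does not use or reproduce Data Designer's API or internals. The helper names (`category_sampler`, `template_column`, `generate_rows`) are hypothetical, and the LLM call is replaced by simply rendering the prompt string. The point is only that columns referenced in a template must be generated first, which a topological sort gives you.

```python
import random
import re
from graphlib import TopologicalSorter

# Matches Jinja-style placeholders such as {{ product_category }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def category_sampler(values):
    """Hypothetical helper: a column that draws uniformly from `values`."""
    def generate(row, rng):
        return rng.choice(values)
    return generate


def template_column(template):
    """Hypothetical helper: a column that fills placeholders from the row.

    A real pipeline would send the rendered prompt to an LLM; this sketch
    only renders the prompt text.
    """
    def generate(row, rng):
        return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)
    # Record which other columns this one needs before it can run.
    generate.depends_on = set(PLACEHOLDER.findall(template))
    return generate


def generate_rows(columns, n, seed=0):
    """Generate `n` rows, filling columns in dependency order."""
    rng = random.Random(seed)
    graph = {name: getattr(gen, "depends_on", set()) for name, gen in columns.items()}
    # static_order() yields each column after all the columns it depends on.
    order = list(TopologicalSorter(graph).static_order())
    rows = []
    for _ in range(n):
        row = {}
        for name in order:
            row[name] = columns[name](row, rng)
        rows.append(row)
    return rows


CATEGORIES = ["Electronics", "Clothing", "Home & Kitchen", "Books"]

# Declared out of order on purpose: the sort puts product_category first.
columns = {
    "review_prompt": template_column(
        "Write a short product review for a {{ product_category }} item."
    ),
    "product_category": category_sampler(CATEGORIES),
}

rows = generate_rows(columns, n=3)
for row in rows:
    print(row)
```

Declaring the columns in the "wrong" order still works because the generation order comes from the placeholder references rather than from declaration order.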

This release also incorporates the synthetic data tech my team originally built at Gretel (now part of NVIDIA), now generally available for anyone to use or extend.

Repo: https://github.com/NVIDIA-NeMo/DataDesigner

Comments

alexwatson405•27m ago
Hi all, I’m a co-founder from Gretel; our team and tech are now part of NVIDIA.

NeMo Data Designer is our core product from Gretel and now the internal framework we use heavily for both pre- and post-training data in Nemotron for a variety of use cases.

The OSS version is fully general-purpose: Python-first, modular, and designed so you can mix statistical samplers, LLM columns, and seed datasets in a single pipeline.

Happy to answer questions or hear feedback on missing features.

Django 6.0 Released

https://www.djangoproject.com/weblog/2025/dec/03/django-60-released/
2•sirodoht•3m ago•0 comments

AI infrastructure is being built on a mountain of new DEBT

https://twitter.com/GlobalMktObserv/status/1995848679404507467
2•DivingForGold•4m ago•0 comments

Extending yeast lifespan boosts biosynthetic output of valuable compounds

https://phys.org/news/2025-11-yeast-lifespan-boosts-biosynthetic-output.html
1•PaulHoule•4m ago•0 comments

Show HN: Aim-Style Instant Messaging in VSCode

https://marketplace.visualstudio.com/items?itemName=devchat-dev.devchat-im
1•milowata•4m ago•0 comments

Sugars, 'Gum,' Stardust Found in NASA's Asteroid Bennu Samples

https://www.nasa.gov/missions/osiris-rex/sugars-gum-stardust-found-in-nasas-asteroid-bennu-samples/
1•e145bc455f1•5m ago•0 comments

Instant server hot-reload across the Wasm boundary

https://primate.run/blog/primate-035#server-hot-reload
4•sarumake•5m ago•0 comments

Show HN: ToolPlex Desktop – MCP marketplace and AI workflow builder

https://toolplex.ai
1•entrehacker•7m ago•0 comments

Ask HN: Is the absence of affect the real barrier to AGI and alignment?

1•n-exploit•8m ago•0 comments

The War for Seattle [video]

https://www.youtube.com/watch?v=LpaD9qpnzI0
1•surprisetalk•9m ago•0 comments

China will eventually open its borders to mass immigration

https://twitter.com/samoburja/status/1988128253891277071
3•surprisetalk•9m ago•2 comments

Is Watching Video Bad for Children's Skills?

https://www.nber.org/papers/w34466
1•surprisetalk•9m ago•0 comments

OpenAI is facing every startup's VC question: What if Google copies you?

https://gpt3experiments.substack.com/p/openais-vc-question-what-if-google
1•nutanc•9m ago•0 comments

Is "green AI" even possible?

https://manualdousuario.net/en/is-green-ai-even-possible/
1•rpgbr•10m ago•0 comments

Gratitude

https://philippdubach.com/2025/12/03/gratitude/
1•7777777phil•10m ago•0 comments

I fixed my lactose intolerance – by chugging ALL the lactose [video]

https://www.youtube.com/watch?v=h90rEkbx95w
1•EPendragon•10m ago•0 comments

Technical Deep Dive: The 3-Step Validation System

https://transformationagents.ai/webinar
1•buttersmoothAI•11m ago•1 comments

RAG vs. Traditional ML

https://medium.com/@DavidLiCause/rag-vs-traditional-ml-390a34a2b045
1•davidlicause•11m ago•0 comments

Show HN: Local_faiss_MCP – A tiny MCP server for local RAG (FAISS and MiniLM)

1•nonatofabio•14m ago•0 comments

Show HN: Rephole, semantic code-search for your repos via REST API

https://github.com/twodHQ/rephole
1•riktar•16m ago•0 comments

OSHW: Small tablet based on RK3568 and AMOLED screen

https://oshwhub.com/oglggc/rui-xin-wei-rk3568-si-ceng-jia-li-chuang-mian-fei-gong-yi
1•thenthenthen•17m ago•1 comments

Show HN: Store Inspector – Free Chrome extension for Shopify competitor research

https://storeinspect.com
2•andersmyrmel•17m ago•0 comments

Why High Food Prices Will Make Public Groceries Inevitable

https://grocerynerd.substack.com/p/grocery-update-116-why-high-food
3•toomuchtodo•18m ago•0 comments

How to Benchmark C++ Code

https://codspeed.io/docs/guides/how-to-benchmark-cpp-code
5•art049•20m ago•0 comments

Diff of Claude Code system prompt over time

https://lukegil.github.io/claude-code-prompts/
1•lukegil626•20m ago•1 comments

Show HN: MCP Gateway – Unifying Access to MCP Servers Without N×M Integrations

https://www.truefoundry.com/mcp-gateway
5•supreetgupta•22m ago•1 comments

China's 1st reusable rocket explodes in fireball landing after reaching orbit

https://www.space.com/space-exploration/launches-spacecraft/chinas-1st-reusable-rocket-explodes-i...
5•perihelions•22m ago•0 comments

DNA analysis suggests first Australians arrived about 60k years ago

https://www.abc.net.au/news/science/2025-11-29/sahul-aboriginal-australia-65000-genetic-evidence/...
4•gmays•25m ago•1 comments

Real-world vector DB performance across the most popular providers

https://www.topk.io/blog/20251201-topk-bench
8•MarekDlugos•25m ago•4 comments

Code Walkthrough - Claude Code CLI and VS Code

https://codepointer.substack.com/p/claude-code-cli-bridging-terminal
1•ykhl1itj•27m ago•0 comments

Replace Your To-Do List with Interstitial Journaling to Increase Productivity

https://medium.com/better-humans/replace-your-to-do-list-with-interstitial-journaling-to-increase...
1•herbertl•28m ago•0 comments