frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Deterministic NDJSON audit logs – v1.2 update (structural gaps)

https://github.com/yupme-bot/kernel-ndjson-proofs
1•Slaine•33s ago•0 comments

The Greater Copenhagen Region could be your friend's next career move

https://www.greatercphregion.com/friend-recruiter-program
1•mooreds•1m ago•0 comments

Do Not Confirm – Fiction by OpenClaw

https://thedailymolt.substack.com/p/do-not-confirm
1•jamesjyu•1m ago•0 comments

The Analytical Profile of Peas

https://www.fossanalytics.com/en/news-articles/more-industries/the-analytical-profile-of-peas
1•mooreds•1m ago•0 comments

Hallucinations in GPT5 – Can models say "I don't know" (June 2025)

https://jobswithgpt.com/blog/llm-eval-hallucinations-t20-cricket/
1•sp1982•1m ago•0 comments

What AI is good for, according to developers

https://github.blog/ai-and-ml/generative-ai/what-ai-is-actually-good-for-according-to-developers/
1•mooreds•1m ago•0 comments

OpenAI might pivot to the "most addictive digital friend" or face extinction

https://twitter.com/lebed2045/status/2020184853271167186
1•lebed2045•3m ago•1 comments

Show HN: Know how your SaaS is doing in 30 seconds

https://anypanel.io
1•dasfelix•3m ago•0 comments

ClawdBot Ordered Me Lunch

https://nickalexander.org/drafts/auto-sandwich.html
1•nick007•4m ago•0 comments

What the News media thinks about your Indian stock investments

https://stocktrends.numerical.works/
1•mindaslab•5m ago•0 comments

Running Lua on a tiny console from 2001

https://ivie.codes/page/pokemon-mini-lua
1•Charmunk•6m ago•0 comments

Google and Microsoft Paying Creators $500K+ to Promote AI Tools

https://www.cnbc.com/2026/02/06/google-microsoft-pay-creators-500000-and-more-to-promote-ai.html
2•belter•8m ago•0 comments

New filtration technology could be game-changer in removal of PFAS

https://www.theguardian.com/environment/2026/jan/23/pfas-forever-chemicals-filtration
1•PaulHoule•9m ago•0 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
2•momciloo•9m ago•0 comments

Kinda Surprised by Seadance2's Moderation

https://seedanceai.me/
1•ri-vai•9m ago•2 comments

I Write Games in C (yes, C)

https://jonathanwhiting.com/writing/blog/games_in_c/
2•valyala•10m ago•0 comments

Django scales. Stop blaming the framework (part 1 of 3)

https://medium.com/@tk512/django-scales-stop-blaming-the-framework-part-1-of-3-a2b5b0ff811f
1•sgt•10m ago•0 comments

Malwarebytes Is Now in ChatGPT

https://www.malwarebytes.com/blog/product/2026/02/scam-checking-just-got-easier-malwarebytes-is-n...
1•m-hodges•10m ago•0 comments

Thoughts on the job market in the age of LLMs

https://www.interconnects.ai/p/thoughts-on-the-hiring-market-in
1•gmays•10m ago•0 comments

Show HN: Stacky – certain block game clone

https://www.susmel.com/stacky/
2•Keyframe•14m ago•0 comments

AIII: A public benchmark for AI narrative and political independence

https://github.com/GRMPZQUIDOS/AIII
1•GRMPZ23•14m ago•0 comments

SectorC: A C Compiler in 512 bytes

https://xorvoid.com/sectorc.html
2•valyala•15m ago•0 comments

The API Is a Dead End; Machines Need a Labor Economy

1•bot_uid_life•16m ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•Jyaif•17m ago•0 comments

New wave of GLP-1 drugs is coming–and they're stronger than Wegovy and Zepbound

https://www.scientificamerican.com/article/new-glp-1-weight-loss-drugs-are-coming-and-theyre-stro...
4•randycupertino•19m ago•0 comments

Convert tempo (BPM) to millisecond durations for musical note subdivisions

https://brylie.music/apps/bpm-calculator/
1•brylie•21m ago•0 comments

Show HN: Tasty A.F.

https://tastyaf.recipes/about
2•adammfrank•22m ago•0 comments

The Contagious Taste of Cancer

https://www.historytoday.com/archive/history-matters/contagious-taste-cancer
1•Thevet•23m ago•0 comments

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

https://www.forbes.com/sites/mikestunson/2026/02/05/us-jobs-disappear-at-fastest-january-pace-sin...
1•alephnerd•24m ago•1 comments

Bithumb mistakenly hands out $195M in Bitcoin to users in 'Random Box' giveaway

https://koreajoongangdaily.joins.com/news/2026-02-07/business/finance/Crypto-exchange-Bithumb-mis...
1•giuliomagnifico•24m ago•0 comments
Open in hackernews

DeepFabric – Generate high-quality synthetic datasets at scale

https://lukehinds.github.io/deepfabric/
106•decodebytes•4mo ago

Comments

jossclimb•4mo ago
How is the diversity, duplication?
tehryanx•4mo ago
based on the description, I think it's using something similar to GLAN https://arxiv.org/abs/2402.13064
decodebytes•4mo ago
Very good, and even better with the new DAG approach - we have been using great-expectations to bench and seeing very good diversity and low amounts of duplication - you check out one of the recent CoT examples here: https://huggingface.co/datasets/lukehinds/deepfabric-devops-...
evgen•4mo ago
This dataset disappeared. Did it move or get pulled for some reason? (glanced at it when you noted this and went back today to check it out and found a 404...)
bumseltagbaerbi•4mo ago
"Synthetic CDOs"
scosman•4mo ago
If anyone's interested in synthetic data generation, we've built a fully interactive visual tool for SDG. It supports generating hierarchical topic trees like other tools, but we do two things others don't:

First: fully interactive UI. This might sound unnecessary, but synthetic data is a creative and iterative process. It helps to review each step as you go, tweaking prompts. Are the topics right? Are the inputs realistic? Are the outputs reasonable? Once your prompts are dialed in, you can scale up the volume, but there's a creative iterative process to get there.

Second: we have many templates for common synthetic data gen use cases. For fine-tuning you want to focus on the breadth of realistic inputs. For "bug" evals you want to trigger specific error cases based on a description of the issue. For measuring evaluators/LLM judges you need a topic tree mixing passing and failing data. We also provide templates for common use cases: bias, maliciousness, toxicity, jailbreaking, etc. These are good to bootstrap the creative process above, but you can edit each to meet your needs.

It's a free app on GitHub. Docs and videos: https://docs.kiln.tech/docs/synthetic-data-generation

decodebytes•4mo ago
Ah right, kiln - Deepfabric was originally named promptwright , and I can see kiln has copied over some of our code and used it for its synth-gen (which is a nice compliment!)

We are actually planning on moving to graphs now, which we are seeing better results with over trees, check it out if you also want to use them in kiln - but you might want to wait until we validate a little more and lift it out of experimental.

I think the key difference between the two since kiln adopted the same approach is the ability to generate reasoning / chain of thought and export to alpaca, chatml, etc - along with direct to unsloth.ai's formatting. I doubt we will have UI as its for running on backend systems and part of an ML pipeline along with being a library / SDK.

scosman•4mo ago
I personally wrote Kiln's SDG code myself -- no code was copied from here or anywhere else. Not sure where that claim is coming from, but it's not accurate.

I might have taken some of the prompts and modified them. I didn't recognize the new name, do recognize the old one.

Edit:

- just confirmed. No code copied. Prompts were originally from the Pluto library, then modified by the library above, then modified again by me for Kiln.

- And just to clarify, Kiln has had supported for chain of thought, reasoning, and all major export formats (ChatML/Unsloth/OpenAI/Hugging Face). Plus API integrations with Together, Fireworks, OpenAI, Google Vertex.

People should try both. I just want to clear on the origins of the code/prompts, and the feature set.

decodebytes•4mo ago
no worries, its not a big deal - I saw promptwrights name referenced in kilns source. Best of luck , looks like a cool project.
reissbaker•4mo ago
Line 1 makes it pretty clear:

    # The contents of this file are adapted from the promptwrite library (https://github.com/StacklokLabs/promptwright),
    # which was adapted from the pluto library (https://github.com/redotvideo/pluto).

https://github.com/Kiln-AI/Kiln/blob/d38a64b598bf21939263bed...

Curious how the OP "just confirmed. No code copied."

scosman•4mo ago
I read the code. I also remember writing the code and that comment.

As disclosed: some prompt strings were taken and modified, but none of the code was. The original strings are using a templating library that we don't support, so their code/strings wouldn't have worked in our codebase, nor would the wrapping code. Those interfaces/LOC are all unique. It's possible for some "content" to be taken (partial prompt strings), but zero code, and the statement "copied over some of our code and used it" to be incorrect.

Not trying to make a big deal of this, just clarifying these are separate libraries, with no shared code. Looks like the author saw the comment and assumed we used code (vs prompts); not a big deal, but not the case. Their work is super cool, and did inspire parts of my project.

Also worth noting, the library Pluto originated this prompt (as far as I know), and it's been tweaked/evolved many times over.

decodebytes•4mo ago
Hey There, this thread is getting derailed. Could you please create a separate post for your project and we let this one be for discussion of deepfabric, thanks!
scosman•4mo ago
Agreed, and sorry about that. Maybe edit the incorrect comment about "I can see kiln has copied over some of our code" for clarity. I get it was probably honest mistake, but hard not to reply when people are claiming I copied something I didn't. Great project, people go check out deepfabric!
dcreater•4mo ago
are their good synthetic data sets generated from DeepFabric publicly available?
decodebytes•4mo ago
sure, just starting to get some up on HF. A good example might be GSM8K as this shows the structured output where every result is strictly formatted - I am using this right now to train models and managaing to get a small qwen model up in the 60% range, which wildly is higher then llama2 and xAI Grok 1

GSM8K: https://huggingface.co/datasets/lukehinds/deepfabric-GSM8K-c...

also some others

infra failures reasoning / CoT: https://huggingface.co/datasets/lukehinds/deepfabric-devops-...

Medical (multi-turn): https://huggingface.co/datasets/lukehinds/deepfabric-7k-medi...

Programming challenges: https://huggingface.co/datasets/lukehinds/programming-challe...

If there is anything in particular you need, drop me a message or feel free to open an issue and I can create something for you.

dcreater•4mo ago
Thanks, what LLMs were used to create these?
decodebytes•4mo ago
I think it was gpt4-mini, but local models do surprisingly well too.
crashabr•4mo ago
How easy it is to pass an existing db schema to this library in order to generate a testable synthetic dataset?
decodebytes•4mo ago
I would love to learn more and have a try, I figure you can dump out to txt or csv -

you can raise and issue and I will certainly give it a go - or also reach me via the discord link on the main repo. Let's see what we can do.