
Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
147•isitcontent•6h ago•15 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
252•vecti•8h ago•120 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
181•eljojo•9h ago•124 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
48•phreda4•5h ago•8 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
77•antves•1d ago•57 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
40•nwparker•1d ago•10 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
12•NathanFlurry•14h ago•5 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
147•bsgeraci•23h ago•61 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
3•AGDNoob•2h ago•1 comment

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•4 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
3•osmansiddique•3h ago•0 comments

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•5h ago•1 comment

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•3h ago•0 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
13•toborrm9•11h ago•5 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
23•dchu17•10h ago•11 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•5h ago•0 comments

Show HN: Chiptune Tracker

https://chiptunes.netlify.app
3•iamdan•5h ago•1 comment

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
10•KevinChasse•11h ago•9 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
170•vkazanov•1d ago•48 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
7•sawyerjhood•12h ago•0 comments

Show HN: An open-source system to fight wildfires with explosive-dispersed gel

https://github.com/SpOpsi/Project-Baver
2•solarV26•9h ago•0 comments

Show HN: Agentism – Agentic Religion for Clawbots

https://www.agentism.church
2•uncanny_guzus•9h ago•0 comments

Show HN: Disavow Generator – Open-source tool to defend against negative SEO

https://github.com/BansheeTech/Disavow-Generator
5•SurceBeats•15h ago•1 comment

Show HN: BPU – Reliable ESP32 Serial Streaming with COBS and CRC

https://github.com/choihimchan/bpu-stream-engine
2•octablock•11h ago•0 comments

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
567•deofoo•5d ago•166 comments

Show HN: Total Recall – write-gated memory for Claude Code

https://github.com/davegoldblatt/total-recall
10•davegoldblatt•1d ago•6 comments

Show HN: Hibana – An Affine MPST Runtime for Rust

https://hibanaworks.dev
3•o8vm•12h ago•0 comments

Show HN: Beam – Terminal Organizer for macOS

https://getbeam.dev/
2•faalbane•13h ago•2 comments

Show HN: Agent Arena – Test How Manipulation-Proof Your AI Agent Is

https://wiz.jock.pl/experiments/agent-arena/
45•joozio•15h ago•47 comments

Show HN: DDL to Data – Generate realistic test data from SQL schemas

55•tbrannan•1mo ago
I built DDL to Data after repeatedly pushing back on "just use production data and mask it" requests. Teams needed populated databases for testing, but pulling prod meant security reviews, PII scrubbing, and DevOps tickets. Hand-written seed scripts were the alternative: slow, fragile, and out of sync the moment schemas changed.

Paste your CREATE TABLE statements, get realistic test data back. It parses your schema, preserves foreign key relationships, and generates data that looks real: emails look like emails, timestamps are reasonable, and uniqueness constraints are honored.

No setup, no config. Works with PostgreSQL and MySQL.
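
To give a feel for the idea, here's a toy sketch of schema-driven generation in Python (stdlib only; the column-name heuristics are simplified stand-ins, not our actual engine):

    import random
    import re
    from datetime import datetime, timedelta

    DDL = """
    CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        email VARCHAR(255) UNIQUE,
        created_at TIMESTAMP
    );
    """

    def infer_generator(name, sql_type):
        # Map column names/types to plausible value generators.
        if "email" in name:
            return lambda i: f"user{i}@example.com"  # row index keeps UNIQUE happy
        if name.endswith("_at") or "TIMESTAMP" in sql_type:
            return lambda i: datetime(2024, 1, 1) + timedelta(minutes=random.randint(0, 525600))
        if "SERIAL" in sql_type or name == "id":
            return lambda i: i
        return lambda i: f"{name}_{i}"

    # Crude column extraction; a real parser handles far more DDL.
    body = DDL.split("(", 1)[1]
    cols = re.findall(r"^\s*(\w+)\s+([A-Z]+[^,\n]*)", body, re.M)
    gens = [(name, infer_generator(name, typ.upper())) for name, typ in cols]

    for i in range(1, 4):
        print({name: gen(i) for name, gen in gens})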

https://ddltodata.com

Would love feedback from anyone who deals with test data or staging environments. What's missing?

Comments

ForHackernews•1mo ago
This is a great idea. I've thought about doing something similar! On the other hand, I'm not sure it's a business. Is this using AI?

The pricing seems extremely high for what's basically a call to https://github.com/faker-ruby/faker but that makes sense if it has to pay for OpenAI tokens.

(who knows though, plenty of B2B deals signed for sillier things than this - good luck, OP)

tbrannan•1mo ago
Thanks! To clarify, the core engine isn't AI. It's deterministic pattern matching, so it runs in milliseconds with no token costs. There's an optional "Story Mode" that uses AI for narrative-coherent data (like "a churning SaaS with seasonal trends"), but the base product is just schema parsing + smart type inference.

The difference from Faker: you don't write any code. Paste your CREATE TABLE, get data back. Faker is a library you have to integrate, configure field-by-field, and maintain as your schema changes. Different use case — more like "I need a seeded database in 30 seconds" vs "I'm building a test suite."
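
For contrast, the field-by-field wiring I mean looks roughly like this with the Python faker library (illustrative; the columns are made up):

    # pip install faker
    from faker import Faker

    fake = Faker()

    # Every column is wired by hand, and this code has to change
    # whenever the schema does.
    rows = [
        {
            "email": fake.unique.email(),
            "name": fake.name(),
            "created_at": fake.date_time_between(start_date="-1y"),
        }
        for _ in range(100)
    ]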

Fair point on pricing though, still figuring that out. Appreciate the feedback.

Omnipresent•1mo ago
How does this compare to https://shadowtraffic.io/?
tbrannan•1mo ago
Different focus. ShadowTraffic is config-driven and optimized for streaming/Kafka workloads. We're schema-driven: point us at your DDL and we generate relational test data automatically. Less config, more "just give me test data that fits my tables".
james_marks•1mo ago
Congrats on being launchable!

I've written seed data scripts a number of times, so I get the need. How do you think about creating larger amounts of data?

E.g., I'm building a statistical product where the seed data needs to be 1M rows; performance differences between implementations start to matter.

tbrannan•1mo ago
Thanks! At 1M rows, I think a few things matter:

Streaming: Can't hold it all in memory. Generate in chunks, write, release, repeat.

Format choice: Parquet with row groups is fast and compresses well. SQL needs batched inserts (~1000 rows/statement). Direct DB writes via COPY skip SQL serialization entirely and are usually fastest.

FK relationships: The real bottleneck. Pre-generate parent PKs, hold in memory, reference for children. Gets tricky with complex graphs at scale.

Parallelization: Row generation is embarrassingly parallel, but writes are serial. Chunk-then-merge is on our radar but not shipped yet.

What does your stat product need: realistic distributions, or pure volume/stress testing?
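
For illustration, here's a minimal sketch of that chunk-then-COPY loop (assuming Postgres and psycopg2; table names and sizes are made up, and this isn't our actual engine):

    import io
    import random

    import psycopg2  # pip install psycopg2-binary

    TOTAL, CHUNK = 1_000_000, 50_000

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # Pre-generate parent PKs once and keep them in memory so child
    # rows always reference valid foreign keys.
    user_ids = range(1, 10_001)

    written = 0
    while written < TOTAL:
        n = min(CHUNK, TOTAL - written)
        buf = io.StringIO()
        for i in range(written, written + n):
            # id, user_id (valid FK), amount
            buf.write(f"{i + 1},{random.choice(user_ids)},{random.uniform(1, 500):.2f}\n")
        buf.seek(0)
        # COPY skips per-statement parsing and is the fast path in Postgres.
        cur.copy_expert("COPY orders (id, user_id, amount) FROM STDIN WITH (FORMAT csv)", buf)
        conn.commit()
        written += n  # buffer is dropped each pass, so memory stays bounded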

rrr_oh_man•1mo ago
Why does this read like AI slop?
tbrannan•1mo ago
because it is, but it's still true lol
fcoury•1mo ago
A while ago I worked on a similar idea. It was back when I was learning Rust, so I'm not super proud of the code, but I love the name of the tool: https://github.com/gistia/joindoe
bdcravens•1mo ago
I appreciate this product existing, but the row limits in each tier seem very constrained.
tbrannan•1mo ago
Thanks for the feedback! Honestly, we're still dialing in the tiers. What row limits would feel reasonable for your use case? Always helpful to hear what people actually need.
bdcravens•1mo ago
To be honest, I think 1M rows is the starting point for any paid plan. Any data model of minimal complexity explodes fast, especially with cascading one-to-many relationships. If anything, it may make more sense to have a table-level, rather than a global, limit. Or put the limit on "trunk" tables.
tbrannan•1mo ago
That is a really good point; one-to-many relationships blow up fast. The trunk-table idea is interesting, and would simplify how people reason about limits. Appreciate the feedback, genuinely helpful!
ljm•1mo ago
Reminds me a bit of Snaplet before it embarked on its incredible journey to get acquired by Supabase and shut down.

I like the concept, but the pain point has never been creating realistic-looking emails and the like; it's creating data that is realistic in terms of the business domain and in terms of volume.

tbrannan•1mo ago
Appreciate the Snaplet comparison; they were doing good work. You're right that realistic-looking strings are the easy part. We're focused on relational integrity first (FKs, constraints, realistic cardinality), but business-domain logic is the next layer. What kinds of rules would be useful for you? Things like weighted distributions, time-based patterns, conditional relationships?
ljm•1mo ago
The realistic cardinality is actually a good start (the problem with things like using Faker for DB seeds being that everything is entirely too random).

If one were able to use metrics as a source, then, depending on the quality of the metrics, it might be possible to distribute data in a manner similar to what's observed in production? You know, some users that are far more active than others, for example. Considering a major issue with testing is that you can't accurately benchmark changes or migrations against a staging environment that is 1% the size of your prod one, that would be a huge win, I think, even if the data is for the most part nonsensical. As long as referential integrity is intact, the specifics matter less.

Domain specific stuff is harder to describe I think. For example, in my setup I'd want seeds of valid train journeys over multiple legs. There's a lot of detail in that where the shortcut is basically to try and source it from prod in some way.

tbrannan•1mo ago
This is useful. What if you ran a CLI locally that extracts just the statistical profile from prod (cardinality, relationship ratios, etc.) and uploaded that? We'd never touch your database; you just hand us the metrics and we match the shape.
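
Something like this is what I have in mind for the profile step (sketched with psycopg2; the queries and table names are illustrative assumptions):

    import psycopg2

    conn = psycopg2.connect("dbname=prod")  # read-only; raw rows never leave the box
    cur = conn.cursor()
    profile = {}

    # Cardinality per table.
    for table in ("users", "orders"):
        cur.execute(f"SELECT count(*) FROM {table}")
        profile[f"{table}.count"] = cur.fetchone()[0]

    # Relationship shape: average and p99 children per parent.
    cur.execute("""
        SELECT avg(n), percentile_cont(0.99) WITHIN GROUP (ORDER BY n)
        FROM (SELECT count(*) AS n FROM orders GROUP BY user_id) t
    """)
    profile["orders_per_user.avg"], profile["orders_per_user.p99"] = cur.fetchone()

    print(profile)  # only these aggregates would ever be uploaded
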
ljm•1mo ago
I'd be willing to try that out :) a CLI would be great, even as a sandbox tool
tbrannan•1mo ago
Really appreciate the input. I'll make sure to give you early access once we implement this. I'll keep you posted.
metadata•4w ago
We do exactly that in one of our products. It's called data profiling.
pistoriusp•1mo ago
Hey! Snaplet founder here. Want to clarify that it was not acquired by Supabase; I shut down the startup and found roles for some of the team at Supabase.

The code remains:

- https://github.com/supabase-community/seed
- https://github.com/supabase-community/copycat
- https://github.com/supabase-community/snapshot

This looks like a great project, wishing them all the best on the journey.

tbrannan•1mo ago
Thanks!! Means a lot coming from you. Best of luck at Supabase.
pistoriusp•1mo ago
Thanks, but I am not at Supabase! I ended up going back to building RedwoodJS and took over the project, and now have a consultancy.
dmarwicke•1mo ago
does it handle skewed distributions? faker's always been useless for this - like, your test data ends up with everyone having 5 orders when real data is all long tail
tbrannan•1mo ago
Not yet, but you're the second person in this thread to call out distribution control as a gap. It's on our radar now. Thanks for the feedback.
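
For what it's worth, a long tail is only a few lines to sample with the Python stdlib (a Pareto draw per user; the shape parameter is an arbitrary assumption):

    import random

    random.seed(42)

    # Heavy-tailed activity: most users get one or two orders, a few get lots.
    order_counts = {user_id: int(random.paretovariate(1.2)) for user_id in range(1, 1001)}

    print(max(order_counts.values()))                  # the whale
    print(sum(v == 1 for v in order_counts.values()))  # the long tail
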
NDizzle•1mo ago
SQL Server support would certainly help you sell enterprise plans.
tbrannan•1mo ago
Noted; we've been focused on Postgres first, but SQL Server keeps coming up. Appreciate the feedback.
Antitoxic6185•1mo ago
I have something that gives you the data in CSV/SQL insert statements.

I also provide an option to select how to generate data for specific fields.

https://fakemydb.alles-tools.com

UI is a bit clunky - will revamp it :)

tbrannan•1mo ago
Great minds think alike!
gerardnico•1mo ago
Real test data generation as SaaS was not a viable business for us.

Developers use their own tool or develop a script (with AI or not).

We made it free; the value comes when you can use it in your development process.

https://www.tabulify.com/learning-tabulify-step-9-how-to-fil...

Calling a service is also not free.

In any case, all the best in your endeavour.

freakynit•4w ago
Vibe-coded one using an LLM (only for semantic understanding of columns). Works well: https://github.com/freakynit/postgre-data-generator
mikhSh•3w ago
Hey tbrannan, nice tool! I was actually comparing a few of these recently for our staging DB and ended up trying https://seedfa.st/ since I needed something that works in CI pipelines (we use GitHub Actions). The main difference I noticed is that DDL to Data is web-based while Seedfast is OS-native (it's a CLI tool first) and integrates smoothly with CI/CD pipelines. Both handle FK constraints fine, but tbh I prefer not having to copy-paste schemas into a web UI every time. Your pricing looks much more straightforward for small projects, though, which I like more.