Show HN: I built an AI dataset generator

https://github.com/metabase/dataset-generator
169•matthewhefferon•7mo ago

Comments

matthewhefferon•7mo ago
I was tired of digging through Kaggle and writing prompts over and over just to get fake data for dashboards and demos. So I built a little tool to help me out.

It uses GPT-4o to generate a detailed schema and business rules based on a few dropdowns (like business type, schema structure, and row count). Then Faker fills in the rows using those rules, which keeps it fast and cheap.
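As a rough illustration of the split described above (a model produces the schema and rules once, then a cheap local generator fills the rows), here is a minimal sketch. It is not the tool's actual code: the `SCHEMA` dict, the rule format, and the "free plans have zero spend" business rule are all hypothetical, and the standard-library `random` module stands in for Faker so the example has no dependencies.

```python
import random

# Hypothetical schema of the kind an LLM might return:
# column name -> generation rule. Not the tool's real format.
SCHEMA = {
    "customer_id": {"type": "int", "min": 1, "max": 500},
    "plan": {"type": "choice", "values": ["free", "pro", "enterprise"]},
    "monthly_spend": {"type": "float", "min": 0.0, "max": 999.0},
}


def generate_value(rule, rng):
    """Produce one value for a column rule (stand-in for a Faker provider)."""
    kind = rule["type"]
    if kind == "int":
        return rng.randint(rule["min"], rule["max"])
    if kind == "float":
        return round(rng.uniform(rule["min"], rule["max"]), 2)
    if kind == "choice":
        return rng.choice(rule["values"])
    raise ValueError(f"unknown column type: {kind}")


def apply_business_rules(row):
    """Assumed example rule: customers on the free plan never have spend."""
    if row["plan"] == "free":
        row["monthly_spend"] = 0.0
    return row


def generate_rows(schema, count, seed=0):
    """Fill `count` rows locally; no model call is needed per row."""
    rng = random.Random(seed)
    rows = []
    for _ in range(count):
        row = {name: generate_value(rule, rng) for name, rule in schema.items()}
        rows.append(apply_business_rules(row))
    return rows


if __name__ == "__main__":
    for row in generate_rows(SCHEMA, 3):
        print(row)
```

Because the expensive model call happens only when the schema is created, the row count can grow without growing the API bill, which is presumably the "fast and cheap" part.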

You can preview the data, export as CSV or SQL, or spin up Metabase with one click to explore the data. It’s open-source, still in early stages, but wanted to share, get feedback and see how you'd improve it.

thenaturalist•7mo ago
Congrats, thanks for shipping and open sourcing this!

Cool to see Metabase is enabling contributions to the ecosystem this way! :)

matthewhefferon•7mo ago
No problem, thanks for taking a look!
margotli•7mo ago
Feels like a useful tool for anyone learning analytics or just needing sample data to test with.
hiatus•7mo ago
Are you affiliated with metabase? https://news.ycombinator.com/item?id=44107584
mritchie712•7mo ago
I use this prompt to spin up demos for customers at https://www.definite.app/:

    @Web Do some research on https://somecompany.com and write up a detailed overview of what the company does. What might their database schema look like?

    I need you to build a mock database for them in duckdb for a demo

Then:

    Create a uv project and write a python script to add demo data. Use Faker.

    @Web research how many customers they have. Make the database to appropriate scale.

Only takes a few minutes in Cursor, should work just as well in Claude Code. It works really well for the company's core business, but I still need to create one to populate 3rd party sources (e.g. Stripe, Salesforce, Hubspot, etc.).
matthewhefferon•7mo ago
Cool, I don’t do customer-specific demos, but I like this idea. I might add this use case as an option. Thanks for sharing!
b0a04gl•7mo ago
seen this pattern before too. faker holds shape without flow. real tables come from actions: retry, decline, manual review, all that. you just set col types, you might miss why the row even happened. gen needs to simulate behavior, not format
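One way to read "simulate behavior, not format" is to generate rows as the side effect of a small state machine, so that every row exists because an action happened. The sketch below is only an illustration of that idea: the payment states, the transition probabilities, and the `simulate_payment` helper are all made up for the example.

```python
import random

# Hypothetical transition table: state -> [(next_state, probability), ...].
TRANSITIONS = {
    "submitted": [("approved", 0.7), ("declined", 0.15), ("manual_review", 0.15)],
    "declined": [("retry", 0.5), ("abandoned", 0.5)],
    "retry": [("approved", 0.6), ("declined", 0.4)],
    "manual_review": [("approved", 0.8), ("declined", 0.2)],
}
TERMINAL = {"approved", "abandoned"}


def simulate_payment(payment_id, rng, max_steps=10):
    """Walk one payment through the state machine, logging an event per step.

    The walk stops at a terminal state or after `max_steps` events, so a
    payment stuck in a decline/retry loop may end in a non-terminal state.
    """
    events = []
    state = "submitted"
    for step in range(max_steps):
        events.append({"payment_id": payment_id, "step": step, "state": state})
        if state in TERMINAL:
            break
        next_states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(next_states, weights=weights, k=1)[0]
    return events


if __name__ == "__main__":
    rng = random.Random(0)
    for payment_id in range(3):
        for event in simulate_payment(payment_id, rng):
            print(event)
```

The resulting event table contains retries and manual reviews only where a decline or a review actually preceded them, which independently sampled columns cannot guarantee.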
matthewhefferon•7mo ago
That’s a solid callout, appreciate you pointing it out. I’ll definitely dig into that more.
ajd555•7mo ago
Was looking for this exact comment. I completely agree with this method, especially if you're testing an entire flow, and not just a UI tool. You want to test the service that interfaces between the API and the database.

I've been writing custom simulation agents (just simple go programs) that simulate different users of my system. I can scale appropriately and see test data flow in. If metabase could generate these simulation agents based on a schema and some instructions, now that would be quite neat! Good job on this first version of the tool, though!

tomrod•7mo ago
The best synthetic data are those that capture ingestion and action, instead of just relationship.

Relationship is important, but your data structure might capture a virtually infinite number of unexpected behaviors that you would preferably call errors or bugs.

zikani_03•7mo ago
This is well put. I once built a tool called [zefaker] (github.com/creditdatamw/zefaker) to test some data pipelines but never managed to get a good pattern or method for generating data that simulates actions or scenarios that didn't involve too much extra work.

Was hoping this AI dataset generator solves that issue, but I guess it is still early days. Looks good though and using Faker to generate the data locally sounds good as a cost-cutting measure, but also potentially opens room for human-in-the-loop adjustments of the generated data.

jasonthorsness•7mo ago
AI is really good at this sort of thing; I've been using an LLM with Faker for some time to load data for demos into SingleStore: https://github.com/jasonthorsness/loadit
matthewhefferon•7mo ago
Nice, I like the challenge video!
jasonthorsness•7mo ago
Ha thanks, appreciate that, I regret the video a little as I was going through a short "a more exciting blog with videos is what the people want" phase.
paxys•7mo ago
Feature request - make the URL for the OpenAI API configurable. That way one can swap it out with Anthropic or any other LLM provider of their choice that provides an OpenAI-compatible API.
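A minimal sketch of what such a setting could look like, assuming the base URL is read from an environment variable. The variable name `LLM_BASE_URL` and both helper functions are hypothetical and not taken from the project; only the default base URL and the `/chat/completions` path follow the OpenAI API.

```python
import os

# Default OpenAI API base URL, used when no override is configured.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(env=None):
    """Return the configured base URL without a trailing slash.

    `LLM_BASE_URL` is a made-up variable name for this sketch. An unset or
    empty value falls back to the OpenAI default.
    """
    env = os.environ if env is None else env
    base = env.get("LLM_BASE_URL") or DEFAULT_BASE_URL
    return base.rstrip("/")


def chat_completions_url(env=None):
    """Build the chat completions endpoint for whichever provider is set."""
    return f"{resolve_base_url(env)}/chat/completions"


if __name__ == "__main__":
    print(chat_completions_url())
    print(chat_completions_url({"LLM_BASE_URL": "https://llm.example.com/v1/"}))
```

Any provider that exposes an OpenAI-compatible endpoint could then be selected by changing one setting, with no code changes.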
matthewhefferon•7mo ago
I was actually thinking about this very feature in the shower this morning :)
wiradikusuma•7mo ago
"Stack: OpenAI API (GPT-4o for data generation)" -- I wonder if someday we'll have a generic API like how it's done in Java (e.g., Servlet API implemented by Tomcat, JBoss etc), so everyone can use their favorite LLM instead of having to register each provider like streaming services e.g. Disney+, Netflix, etc.
matthewhefferon•7mo ago
I hope so. I'm already subscribed to every streaming service, and my wallet can't handle all these LLMs too.
zild3d•7mo ago
isn't this essentially https://openrouter.ai/
MattSayar•7mo ago
I used Anthropic's new Claude API integration with artifacts to make a probably-worse version that you can play with (after logging in of course).

https://claude.ai/public/artifacts/eb7d8256-6d21-4c85-af9b-c...

I used this GitHub repo as context and Claude Opus 4 to create this artifact

NitpickLawyer•7mo ago
Haha, I find this kind of exercise telling for what's coming to the one-size-fits-all SaaS companies out there. I see a future where small teams can in-house the set of features they actually need, and a big drop in SaaS usage. Avoids the big vendor lock-in problems, unwanted features and bypasses all the accenture-style consulting fees.
MattSayar•7mo ago
Optimistically, this will allow smaller teams to do more, hopefully incentivizing the consulting places to help out with harder problems.
jmsdnns•7mo ago
depending on what you're using the synthetic data for, it is sometimes called distillation. here is a robust example from some upenn students: https://datadreamer.dev/
reedlaw•7mo ago
"Dataset" connotes training data, but this seems to generate sample data, maybe for testing an application. Is there any use for synthetic datasets in ML?
dankwizard•7mo ago
words can have multiple meanings <:- )
DiscourseFan•7mo ago
They could.
Mamawho•7mo ago
Yes, check out Synthbyte.ai, we make training data and work with all sorts of datasets, including NIH data
smcleod•7mo ago
This is a bit confusing, I sort of expected it to be a bit like Kiln https://github.com/Kiln-AI/Kiln to generate datasets for AI, but it looks like the outputs are more just data / files than datasets?
ajar8087•7mo ago
I was thinking more synthetic data to fit models like https://whitelightning.ai/
ChrisMarshallNY•7mo ago
I wrote a Swift CLI app to generate dummy user profiles for an app we wrote (I needed many more than we’ll actually get, and I needed screenshots for the App Store that didn’t have real user data).

It was pretty “dumb,” and used thispersondoesnotexist.com for profile pics.

klntsky•7mo ago
You absolutely do not need docker as a requirement here
alienbaby•7mo ago
Good for the shape of data, but what about the actual data? If it's entirely random then it's more of a UI demo tool than a tool to generate useful data.