Show HN: Misata – synthetic data engine using LLM and Vectorized NumPy

https://github.com/rasinmuhammed/misata
24•rasinmuhammed•1mo ago
Hey HN, I’m the author.

I built Misata because existing tools (Faker, Mimesis) are great for random rows but terrible for relational or temporal integrity. I needed to generate data for a dashboard where "Timesheets" must happen after "Project Start Date," and I wanted to define these rules via natural language.
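Here is a minimal sketch of how a rule like "Timesheets must happen after Project Start Date" can be enforced without row loops. Each timesheet gets a sampled foreign key, then picks up its parent's start date by fancy indexing, then gets a strictly positive random offset added. It assumes NumPy, and the table sizes, date range, and offsets are made up. It shows the general technique, not Misata's actual implementation.

```python
import numpy as np

# Hypothetical sizes and date range, chosen only for illustration.
rng = np.random.default_rng(seed=42)
n_projects, n_timesheets = 100, 10_000

# Parent table: one start date per project (datetime64 with day precision).
base = np.datetime64("2024-01-01")
project_start = base + rng.integers(0, 365, size=n_projects).astype("timedelta64[D]")

# Child table: every timesheet references an existing project (foreign key).
project_id = rng.integers(0, n_projects, size=n_timesheets)

# Temporal rule: entry date = parent's start date + 1..90 days, so each
# timesheet falls strictly after its project's start. No Python loop needed.
offset_days = rng.integers(1, 91, size=n_timesheets).astype("timedelta64[D]")
timesheet_date = project_start[project_id] + offset_days

# Sanity check: the rule holds for every generated row.
assert (timesheet_date > project_start[project_id]).all()
```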

How it works:

LLM Layer: Uses Groq/Llama-3.3 to parse a "story" into a JSON config of schema and constraints.
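Below is a hypothetical example of the kind of config a story might become, followed by a validation step that runs before simulation. The JSON field names, the `Constraint` dataclass, and `parse_config` are illustrative assumptions, not Misata's real format. The idea is that LLM output should be checked before any data is generated. Here the check requires every constraint to name a table and column that actually exist.

```python
import json
from dataclasses import dataclass

# Hypothetical LLM output for a story like "timesheets happen after each
# project's start date". Field names are illustrative, not Misata's format.
LLM_OUTPUT = """
{
  "tables": {
    "projects": {"rows": 100, "columns": {"start_date": "date"}},
    "timesheets": {"rows": 10000,
                   "columns": {"project_id": "fk:projects", "date": "date"}}
  },
  "constraints": [
    {"type": "after", "column": "timesheets.date",
     "reference": "projects.start_date"}
  ]
}
"""


@dataclass
class Constraint:
    type: str       # e.g. "after" (temporal ordering)
    column: str     # "table.column" the rule applies to
    reference: str  # "table.column" it is compared against


def parse_config(text):
    """Parse the (hypothetical) LLM JSON and reject references to unknown columns."""
    cfg = json.loads(text)
    tables = cfg["tables"]
    constraints = [Constraint(**c) for c in cfg.get("constraints", [])]
    for c in constraints:
        for ref in (c.column, c.reference):
            table, _, column = ref.partition(".")
            if column not in tables.get(table, {}).get("columns", {}):
                raise ValueError(f"constraint references unknown column {ref!r}")
    return tables, constraints


tables, constraints = parse_config(LLM_OUTPUT)
```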

Simulation Layer: Uses Vectorized NumPy (no loops) to generate data. It builds a DAG of tables to ensure parent rows exist before child rows (referential integrity).
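Generating parents before children amounts to a topological sort of the table dependency graph. Here is a minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and NumPy. The four-table schema and the `generate` helper are hypothetical.

```python
from graphlib import TopologicalSorter

import numpy as np

# Hypothetical schema: each table maps to the parent tables it references.
DEPENDENCIES = {
    "clients": set(),
    "employees": set(),
    "projects": {"clients"},
    "timesheets": {"projects", "employees"},
}
ROW_COUNTS = {"clients": 20, "employees": 50, "projects": 100, "timesheets": 10_000}


def generate(dependencies, row_counts, seed=0):
    """Hypothetical helper: build tables parents-first so every FK hits a real row."""
    rng = np.random.default_rng(seed)
    data = {}
    # static_order() yields a table only after all of its parents;
    # it raises graphlib.CycleError if the schema contains a cycle.
    for table in TopologicalSorter(dependencies).static_order():
        n = row_counts[table]
        columns = {"id": np.arange(n)}
        for parent in sorted(dependencies[table]):
            # Vectorized FK sampling from ids that already exist in the parent.
            columns[f"{parent}_id"] = rng.choice(data[parent]["id"], size=n)
        data[table] = columns
    return data


data = generate(DEPENDENCIES, ROW_COUNTS)
```

Foreign keys are drawn only from ids of tables that already exist, so referential integrity holds by construction. A cyclic schema fails immediately with `graphlib.CycleError` instead of producing dangling keys.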

Performance: Generates ~250k rows/sec on my M1 Air.
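Throughput figures like this depend heavily on how wide the table is and on its column types. Here is a rough way to measure rows/sec for a vectorized generator. It assumes NumPy and a toy three-column table, and the `rows_per_second` helper is hypothetical. Numbers from it are only comparable across similar schemas.

```python
import time

import numpy as np


def rows_per_second(n_rows=1_000_000, seed=0):
    """Hypothetical benchmark: time a toy 3-column vectorized table, return rows/sec."""
    rng = np.random.default_rng(seed)
    start = time.perf_counter()
    table = {
        "id": np.arange(n_rows),
        "amount": rng.normal(100.0, 15.0, size=n_rows).round(2),
        "category": rng.choice(np.array(["a", "b", "c"]), size=n_rows),
    }
    elapsed = time.perf_counter() - start
    assert all(len(col) == n_rows for col in table.values())
    return n_rows / elapsed


print(f"{rows_per_second():,.0f} rows/sec")
```

At the quoted ~250k rows/sec, 10M rows would take about 10,000,000 / 250,000 = 40 seconds.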

It’s early alpha. The "Graph Reverse Engineering" (describe a chart -> get data) is experimental but working for simple curves.
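Here is one simple way to turn a described chart into data. It assumes the description has already been reduced to a shape, start and end values, a point count, and a noise level. The `curve_from_spec` helper and its parameters are hypothetical, not Misata's API. The trend is built on a normalized time axis, and multiplicative noise is added optionally.

```python
import numpy as np


def curve_from_spec(shape, start, end, points, noise=0.0, seed=0):
    """Hypothetical helper: generate a series following a simple described curve."""
    t = np.linspace(0.0, 1.0, points)  # normalized time axis, 0 to 1
    if shape == "linear":
        trend = start + (end - start) * t
    elif shape == "exponential":
        # Passes exactly through start (t=0) and end (t=1); needs start, end > 0.
        trend = start * (end / start) ** t
    else:
        raise ValueError(f"unsupported shape: {shape!r}")
    if noise > 0:
        rng = np.random.default_rng(seed)
        # Multiplicative jitter keeps the overall shape of the curve.
        trend = trend * (1.0 + rng.normal(0.0, noise, size=points))
    return trend


# e.g. "monthly revenue grows exponentially from 100 to 1000 over a year"
series = curve_from_spec("exponential", 100, 1000, 12, noise=0.05)
```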

pip install misata

I’d love feedback on the simulator.py architecture. I’m currently keeping data in memory (Pandas), which hits a ceiling at ~10M rows, so I’m thinking of moving to DuckDB for out-of-core generation next. Thoughts?
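The author's stated plan is DuckDB. A simpler alternative (a different technique) that also avoids holding everything in memory is to generate and write fixed-size chunks. Peak memory then depends on the chunk size rather than the total row count. This minimal sketch assumes NumPy and a toy two-column table, and `write_in_chunks` is a hypothetical helper.

```python
import numpy as np


def write_in_chunks(out, total_rows, chunk_rows=1_000_000, seed=0):
    """Hypothetical helper: stream a toy table as CSV to a text stream.

    Only one chunk is held in memory at a time, so peak memory scales with
    chunk_rows, not total_rows.
    """
    rng = np.random.default_rng(seed)
    out.write("id,amount\n")
    for start in range(0, total_rows, chunk_rows):
        n = min(chunk_rows, total_rows - start)
        ids = np.arange(start, start + n)
        amounts = rng.normal(100.0, 15.0, size=n)
        # Write this chunk, then let it be discarded before building the next.
        np.savetxt(out, np.column_stack([ids, amounts]),
                   delimiter=",", fmt=["%d", "%.2f"])
```

For example, `with open("timesheets.csv", "w") as f: write_in_chunks(f, 50_000_000)` would stream 50M rows to disk. If parent ids are `0..n-1`, child chunks can still sample valid foreign keys from the parent's row count alone. The resulting files can then be queried from disk by a tool such as DuckDB. `np.savetxt` formats rows one at a time, so a binary format would be faster in practice.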

Comments

twelvechess•1mo ago
That would be useful for testing MVPs with dummy data to see if they work. However, "synthetic data" usually means new data derived from existing data. From the README I couldn't tell whether that is the case here, but it's still useful.
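This sketch illustrates the comment's point: "derived" synthetic data starts from a real sample and reproduces its statistics. It assumes NumPy, a roughly normal numeric column, and one categorical column, and `synthesize_like` is a hypothetical helper. It preserves only each column's marginal distribution, not correlations between columns.

```python
import numpy as np


def synthesize_like(real_amounts, real_categories, n, seed=0):
    """Hypothetical helper: sample n new rows mimicking each column's marginals."""
    rng = np.random.default_rng(seed)
    # Numeric column: fit mean/std and sample a normal (assumes roughly normal data).
    mu, sigma = real_amounts.mean(), real_amounts.std()
    amounts = rng.normal(mu, sigma, size=n)
    # Categorical column: resample values with their observed frequencies.
    values, counts = np.unique(real_categories, return_counts=True)
    categories = rng.choice(values, size=n, p=counts / counts.sum())
    return amounts, categories
```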
OutOfHere•1mo ago
Is it possible to incrementally update the schema? I might want to develop it over, say, ten iterations, each adding points I missed earlier. After each iteration, I want to examine the schema and say what I want changed.
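One hypothetical way to support this is to keep every schema version and apply each round of feedback as a patch. The patch semantics here are similar to JSON Merge Patch (RFC 7396): nested objects merge and `None` deletes a key. In practice the patch would come from running the user's natural-language feedback through the LLM. Here patches are plain dicts, and `apply_patch` and `refine` are illustrative names, not Misata features.

```python
import copy


def apply_patch(schema, patch):
    """Hypothetical helper: return a new schema with `patch` deep-merged in.

    Nested dicts merge recursively; a value of None deletes that key.
    The input schema is never modified.
    """
    result = copy.deepcopy(schema)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def refine(history, patch):
    """Hypothetical helper: append the next version so earlier ones stay inspectable."""
    history.append(apply_patch(history[-1], patch))
    return history[-1]


history = [{"tables": {"projects": {"rows": 100}}}]
refine(history, {"tables": {"timesheets": {"rows": 1000}}})
```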
