GraFlo is a declarative framework that handles this once and for all. You define your graph structure in a database-agnostic schema - vertices, edges, properties, and how they map to your source data (CSV, SQL, JSON, XML). GraFlo then generates the ingestion code for your target database. The key insight: while Neo4j, ArangoDB, and TigerGraph each have idiosyncratic query languages and loaders, the underlying property graph model is universal. We crystallized that into a single abstraction layer.
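To make the idea concrete, here's a rough sketch of what such a database-agnostic schema might look like, expressed as a Python dict. The key names, mapping syntax, and structure here are my assumptions for illustration, not GraFlo's actual config format:

```python
# Hypothetical, illustrative only: GraFlo's real schema format may differ.
schema = {
    "target": "neo4j",  # retargeting ingestion would mean changing this value
    "vertices": [
        {"name": "Author", "key": "author_id", "properties": ["name", "affiliation"]},
        {"name": "Paper",  "key": "doi",       "properties": ["title", "year"]},
    ],
    "edges": [
        {"name": "WROTE", "from": "Author", "to": "Paper"},
    ],
    "sources": [
        {"file": "papers.csv", "maps_to": ["Paper", "Author", "WROTE"]},
    ],
}

edge_names = [e["name"] for e in schema["edges"]]
```

The point is that nothing in the schema is specific to any one database; the target is a single field, and everything else describes the graph itself.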
What GraFlo handles automatically:
- Consistent ID generation across vertices and edges
- Type coercion (strings to dates, numbers, etc.)
- Vertex and edge deduplication
- Generating database-specific ingestion scripts
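The first three items on that list are generic techniques rather than anything database-specific. A minimal sketch of the usual approach — deterministic IDs from a label plus natural key, schema-declared type coercion, and set-based deduplication — assuming nothing about GraFlo's internals:

```python
import hashlib
from datetime import date

def vertex_id(label: str, natural_key: str) -> str:
    # Deterministic ID: the same source row always maps to the same vertex,
    # across runs and across edge endpoints. (Generic technique; GraFlo's
    # actual ID scheme is not documented here and may differ.)
    return hashlib.sha1(f"{label}:{natural_key}".encode()).hexdigest()[:16]

def coerce(value: str, kind: str):
    # Minimal type coercion: strings to ints/dates as declared in the schema.
    if kind == "int":
        return int(value)
    if kind == "date":
        return date.fromisoformat(value)
    return value

# Deduplication falls out of deterministic IDs: rows sharing a natural key
# collapse to one vertex.
rows = [("Paper", "10.1000/xyz"), ("Paper", "10.1000/xyz"), ("Paper", "10.1000/abc")]
unique_vertices = {vertex_id(label, key) for label, key in rows}
```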
It's plug-and-play in the sense that swapping from Neo4j to ArangoDB requires no code changes: just change the target database type in your config (Docker Compose examples provided).
We've used it to build knowledge graphs from academic publications, financial datasets, and package dependencies. Instead of maintaining N × M scripts (N datasets, M databases), we maintain N schemas.
On the roadmap: SQL/API integration (e.g., automatically generating GraFlo configs from SQL schemas).
Would love feedback from anyone working with graph databases or building knowledge graphs.
acrostoic•1h ago
But the nice thing is: if you have your source data and GraFlo schema, regenerating your graph in a different DB is trivial. GraFlo handles indexes and constraints for each target database. It's like having the recipe instead of trying to reverse-engineer the cake.
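As a sketch of what "handles indexes for each target" means in practice: the statement templates below use real Neo4j (Cypher) and ArangoDB (arangosh) syntax, but the generator function itself is a hypothetical stand-in for whatever GraFlo actually emits:

```python
def index_statement(target: str, collection: str, field: str) -> str:
    # Emit an index-creation statement for the chosen backend.
    # (Illustrative generator, not GraFlo's actual code.)
    if target == "neo4j":
        return (f"CREATE INDEX {collection.lower()}_{field} IF NOT EXISTS "
                f"FOR (n:{collection}) ON (n.{field})")
    if target == "arangodb":
        return (f'db.{collection}.ensureIndex('
                f'{{type: "persistent", fields: ["{field}"]}})')
    raise ValueError(f"unsupported target: {target}")
```

One schema field declared as indexed, two very different DDL dialects generated from it — which is exactly why maintaining the recipe beats maintaining per-database scripts.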