Show HN: Pontoon – Open-source customer data syncs

https://github.com/pontoon-data/Pontoon
36•alexdriedger•14h ago
Hi HN,

We’re Alex and Kalan, the creators of Pontoon (https://github.com/pontoon-data/Pontoon). Pontoon is an open-source data export platform that makes it really easy to create data syncs and send data to your enterprise customers. Check out our demo here: https://app.storylane.io/share/onova7c23ai6 or try it out with docker: https://pontoon-data.github.io/Pontoon/getting-started/quick...

In our prior roles as data engineers, we both felt the pain of data APIs. We either had to spend weeks building out data pipelines in-house or spend a lot on ETL tools like Fivetran (https://www.fivetran.com/). However, there were a few companies that offered data syncs that would sync directly to our data warehouse (e.g. Redshift, Snowflake, etc.), and when that was an option, we always chose it. This led us to wonder, “Why don’t more companies offer data syncs?” It turns out that building reliable cross-cloud data syncs is difficult. That’s why we built Pontoon.

We designed Pontoon to be:

- Easily deployed: we provide a single, self-contained Docker image for easy deployment and Docker Compose for larger workloads (https://pontoon-data.github.io/Pontoon/getting-started/quick...)

- Compatible with modern data warehouses: we support syncing to/from Snowflake, BigQuery, Redshift, and Postgres.

- Cross-cloud: sync from BigQuery to Redshift, Snowflake to BigQuery, Postgres to Redshift, etc.

- Developer-friendly: data syncs can also be built via the API

- Open source: Pontoon is free to use by anyone

Under the hood, we use Apache Arrow (https://arrow.apache.org/) to move data between sources and destinations. Arrow is very performant, and we wanted a library that could handle the scale of moving millions of records per minute.

In the shorter term, there are several improvements we want to make, like:

- Adding support for dbt models to make adding data models easier

- UX improvements like better error messaging and monitoring of data syncs

- More sources and destinations (S3, GCS, Databricks, etc.)

- Improving the API for a more developer-friendly experience (it’s currently tied pretty closely to the front end)

In the longer term, we want to make data sharing as easy as possible. As data engineers, we sometimes felt like second-class citizens with how we were told to get the data we needed: “just loop through this API 1000 times”, “you probably won’t get rate limited” (we did), “we can schedule an email to send you a CSV every day”. We want to change how modern data sharing is done and make it simple for everyone.

Give it a try: https://github.com/pontoon-data/Pontoon. Cheers!

Comments

melson•13h ago
Is it like an offline sync?
kalanm•13h ago
Kalan here. Syncs are batch-based and scheduled, similar to conventional ETL / data pipelines.
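For illustration, here is a minimal sketch of what one batch-based sync run could look like. It is not Pontoon's implementation: the incremental watermark on an `updated_at` column, the `events` table, the `run_batch_sync` function, and the in-memory SQLite databases standing in for a source and a destination are all assumptions made for the example.

```python
import sqlite3


def run_batch_sync(source, dest, last_synced_at, batch_size=2):
    """Copy rows newer than last_synced_at from source to dest in batches.

    Returns the new watermark (the largest updated_at value copied).
    Assumes updated_at values are unique; ties would need a tie-breaker.
    """
    cursor = source.execute(
        "SELECT id, payload, updated_at FROM events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_synced_at,),
    )
    watermark = last_synced_at
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        # Upsert so that re-running a batch does not duplicate rows.
        dest.executemany("INSERT OR REPLACE INTO events VALUES (?, ?, ?)", batch)
        dest.commit()
        watermark = batch[-1][2]
    return watermark


def make_db():
    """Create a hypothetical in-memory database with an events table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER)"
    )
    return db


source, dest = make_db(), make_db()
source.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "a", 1), (2, "b", 2), (3, "c", 3)],
)
source.commit()

# First scheduled run copies everything; the second copies only the new row.
watermark = run_batch_sync(source, dest, last_synced_at=0)
source.execute("INSERT INTO events VALUES (4, 'd', 4)")
source.commit()
watermark = run_batch_sync(source, dest, last_synced_at=watermark)
synced_rows = dest.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

A scheduler (cron, for instance) would call `run_batch_sync` periodically and persist the returned watermark between runs.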
conormccarter•13h ago
Congrats on the launch! I'm one of the cofounders of Prequel (I saw our name in the feature grid - small nit: we do support self-hosting). This is definitely a problem worth solving - the market is still early and I'd bet the rising tide will help all of us convince more teams to support this capability. I'm not a lawyer, but the latest EU Data Act might even make it an obligation for some software vendors?

Maybe I can save you a headache: Snowflake is actively deprecating single-factor username/password auth in favor of key pair auth, so the faster you support that, the fewer mandatory migrations you'll be emailing users about.

kalanm•13h ago
Thanks! Kalan here, I appreciate the nit! PR is already merged. Definitely agreed on the market, it seems like there's a ton of opportunity. And thanks for the heads up re Snowflake auth! We're actively working on that one, and a few other auth modes for Redshift and BQ as well.
hiatus•12h ago
What does the row "First-class Data Products" in the comparison table entail?
alexdriedger•10h ago
Great question. We think of data products as multi-tenant tables that are created with the intention of sending that data to a customer.

To compare with an ETL tool like Airbyte, it's really easy to sync a full table somewhere with Airbyte, but it gets more complicated if you have a multi-tenant table, where you want to sync only a subset of data to a customer.

When you're setting up a data model with Pontoon, you just define which column has the customer id (we call it a tenant id) and it handles sending the right data to the right customer.
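As a rough sketch of the idea described above (not Pontoon's actual code), splitting a multi-tenant table by a tenant id column could look like the following. The `split_by_tenant` helper, the `orders` rows, and the `tenant_id` column name are hypothetical and only serve the example.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def split_by_tenant(rows: list[Row], tenant_column: str) -> dict[Any, list[Row]]:
    """Group the rows of a multi-tenant table by the value in tenant_column."""
    batches: dict[Any, list[Row]] = defaultdict(list)
    for row in rows:
        batches[row[tenant_column]].append(row)
    return dict(batches)


# Hypothetical multi-tenant table: each row belongs to exactly one customer.
orders = [
    {"tenant_id": "acme", "order_id": 1, "total": 40},
    {"tenant_id": "globex", "order_id": 2, "total": 15},
    {"tenant_id": "acme", "order_id": 3, "total": 22},
]

# Each customer's destination would receive only its own batch of rows.
per_customer = split_by_tenant(orders, "tenant_id")
```

In a real warehouse the filtering would more likely be pushed down as a `WHERE tenant_id = ...` clause than done in memory; the in-memory grouping here just makes the routing rule explicit.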

a2128•11h ago
Not to be confused with Pontoon, a self-hostable translation platform made by Mozilla: https://github.com/mozilla/pontoon
alexdriedger•10h ago
Another great self-hostable platform. I'm not sure where they got their name from though, translations don't have a connection to lakes like data does...
mdaniel•3h ago
> Open source: Pontoon is free to use by anyone

And yet, the top-level license file contains two licenses, a goddamn if statement, and finally a default to Elastic License, which is not open source

Then, because of this trickery, lawyers get to make good money: does "2. LICENSE file in the same directory as the work" combined with "4. Defaults to Elastic License 2.0 (ELv2)" mean that this zero-byte file NAMED LICENSE but devoid of any content matches clause 2 or 4?

https://github.com/pontoon-data/Pontoon/blob/v0.2.0/data-tra...

I hate cutesy licenses with all my heart. Just say you want to use Elastic, drop the Open Source pretense, and go back to selling software instead of trying to position yourself as "open"

No educated person is going to give you free commits in the current state, so there's no need to be opaque in hopes of tricking them
