Show HN: CocoIndex – Open-Source Real-time Data transformation framework

https://github.com/cocoindex-io/cocoindex
2•badmonster•1d ago
Hi HN,

I’ve been working on CocoIndex, an open-source, ultra-performant framework for transforming data for AI. It is optimized for data freshness, with incremental processing out of the box.

You can get started with `pip install cocoindex` and declare a data flow that assembles data transformations like LEGO bricks:
- vector embeddings
- knowledge graphs
- extracting and transforming data with LLMs

It is a data processing framework that goes beyond SQL. When you run the data flow in either live mode or batch mode, it processes the data incrementally with minimal recomputation, which makes it very fast to update the target stores when the source changes.

The core engine is written in Rust. I've been a big fan of Rust since before I left my last job. It was my first choice for this open-source data framework because of 1) robustness, 2) performance, and 3) the ability to bind to different languages.

I’ve made a few tutorials and new projects since the last launch, with different use cases:
- https://www.youtube.com/@cocoindex-io
- https://cocoindex.io/blogs/tags/examples

Previously, I’ve worked at Google on projects like search indexing and ETL infra for 8 years. After I left Google last year, I built various projects and went through pivoting hell.

In all the projects I’ve built, data still sits at the center of the problem, and I find myself building data infra rather than the business logic I actually need for data transformation. The current prepackaged RAG-as-a-service offerings don't serve my needs, because I need to choose a different strategy for the context, and I also need deduplication, clustering (grouping related items), and other custom features that are commonly needed. That’s where CocoIndex started.

There is a simple philosophy behind it: data transformation is similar to formulas in spreadsheets. The ground truth is the source data. Every transformation step and the final target store are derived data, and should react to changes in the source. If you use CocoIndex, you only need to worry about defining transformations, like formulas.

The data flow paradigm was an immediate choice. Because the transformations have no side effects, lineage and observability come out of the box.
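To make the lineage point concrete, here is a minimal sketch in plain Python. It is not CocoIndex's API: `Traced`, `run_flow`, and the two step functions are hypothetical names made up for illustration. The idea is only that when every step is a pure function of its input, a runner can record which steps produced a value without any help from the steps themselves.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Traced:
    """A value together with the names of the steps that produced it."""
    value: Any
    lineage: Tuple[str, ...] = ()


def run_flow(source_name: str, value: Any,
             steps: List[Callable[[Any], Any]]) -> Traced:
    """Apply side-effect-free steps in order, recording each step as lineage."""
    traced = Traced(value, (source_name,))
    for step in steps:
        # Each step only sees its input, so the runner can track provenance.
        traced = Traced(step(traced.value), traced.lineage + (step.__name__,))
    return traced


def strip_whitespace(text: str) -> str:
    """Hypothetical step: trim leading and trailing whitespace."""
    return text.strip()


def split_sentences(text: str) -> List[str]:
    """Hypothetical step: naive sentence split on periods."""
    return [s.strip() for s in text.split(".") if s.strip()]


if __name__ == "__main__":
    result = run_flow("notes.txt", "  First point. Second point.  ",
                      [strip_whitespace, split_sentences])
    print(result.value)    # ['First point', 'Second point']
    print(result.lineage)  # ('notes.txt', 'strip_whitespace', 'split_sentences')
```

A real engine would track lineage per field and per row rather than per value, but the principle is the same.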

Incremental processing - If you are a data expert, an analogy would be a materialized view beyond SQL. The framework tracks pipeline state in a database (Postgres) and only re-processes the necessary portions. When data has changed, the framework handles change data capture comprehensively, combining push and pull mechanisms. It then clears stale derived data/versions and re-indexes data based on tracked data/logic changes or data TTL settings. There are lots of edge cases to get right, for example when a row that is referenced in other places changes. These should be handled at the framework level.
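The core of the incremental idea can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not how CocoIndex is implemented: the tracking state lives in a dict rather than Postgres, and `fingerprint`, `incremental_update`, and `LOGIC_VERSION` are hypothetical names. Each source row is fingerprinted together with a version tag for the transformation logic, so a row is recomputed only when its data or the logic changed, and derived rows are cleared once their source disappears.

```python
import hashlib
from typing import Callable, Dict, Tuple

LOGIC_VERSION = "v1"  # hypothetical: bump when the transformation code changes


def fingerprint(content: str, logic_version: str) -> str:
    """Hash the source content together with the transformation version."""
    return hashlib.sha256(f"{logic_version}\x00{content}".encode()).hexdigest()


def incremental_update(
    source: Dict[str, str],
    tracking: Dict[str, str],
    target: Dict[str, str],
    transform: Callable[[str], str],
    logic_version: str = LOGIC_VERSION,
) -> Tuple[int, int, int]:
    """Sync `target` with `source`, recomputing only rows that changed.

    Returns (processed, skipped, deleted) counts.
    """
    processed = skipped = deleted = 0

    for key, content in source.items():
        fp = fingerprint(content, logic_version)
        if tracking.get(key) == fp:
            skipped += 1  # neither data nor logic changed: nothing to do
            continue
        target[key] = transform(content)
        tracking[key] = fp
        processed += 1

    # Clear stale derived rows whose source row no longer exists.
    for key in list(tracking):
        if key not in source:
            del tracking[key]
            target.pop(key, None)
            deleted += 1

    return processed, skipped, deleted


if __name__ == "__main__":
    source = {"a.md": "hello", "b.md": "world"}
    tracking: Dict[str, str] = {}
    target: Dict[str, str] = {}
    print(incremental_update(source, tracking, target, str.upper))  # (2, 0, 0)
    source["a.md"] = "hello again"
    del source["b.md"]
    print(incremental_update(source, tracking, target, str.upper))  # (1, 0, 1)
```

This sketch leaves out the hard parts mentioned above, such as rows referenced from other places, TTLs, and concurrent writers.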

At the compute engine level, the framework should account for multiple processes and concurrent updates, and for resuming from existing state after a terminated execution. In the end, we want a framework that is easy to build with at exceptional velocity, yet scalable and robust in production.

The interface is standardized throughout the data flow, so it is really easy to plug in custom logic like LEGO, alongside a variety of native built-in components. For example, it takes only a few lines to switch among Qdrant, Postgres, and Neo4j.
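As a rough illustration of why a standardized interface makes targets swappable, here is a sketch in plain Python. None of these names come from CocoIndex: `Target`, `InMemoryTableTarget`, `InMemoryGraphTarget`, and `export` are hypothetical, and the two classes are in-memory stand-ins rather than real Postgres or Neo4j connectors.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Row = Tuple[str, str]  # (key, derived value)


class Target(Protocol):
    """Hypothetical minimal interface that every target store implements."""

    def upsert(self, rows: Iterable[Row]) -> None: ...

    def delete(self, keys: Iterable[str]) -> None: ...


class InMemoryTableTarget:
    """Stand-in for a relational target: one row per key."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def upsert(self, rows: Iterable[Row]) -> None:
        for key, value in rows:
            self.rows[key] = value

    def delete(self, keys: Iterable[str]) -> None:
        for key in keys:
            self.rows.pop(key, None)


class InMemoryGraphTarget:
    """Stand-in for a graph target: each row becomes a node with properties."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, str]] = {}

    def upsert(self, rows: Iterable[Row]) -> None:
        for key, value in rows:
            self.nodes[key] = {"id": key, "text": value}

    def delete(self, keys: Iterable[str]) -> None:
        for key in keys:
            self.nodes.pop(key, None)


def export(rows: List[Row], target: Target) -> None:
    """The flow depends only on the interface, not on a concrete store."""
    target.upsert(rows)


if __name__ == "__main__":
    rows = [("doc1", "hello"), ("doc2", "world")]
    target = InMemoryTableTarget()  # swap for InMemoryGraphTarget() here
    export(rows, target)
    print(target.rows)
```

Because `export` only knows about `upsert` and `delete`, changing the store is a one-line change at the call site.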

CocoIndex is licensed under Apache 2.0: https://github.com/cocoindex-io/cocoindex

Getting started: https://cocoindex.io/docs/getting_started/quickstart

I have rolled out over 25 releases since the last HN launch, and it has improved significantly in all aspects, especially support for property graphs (Neo4j, Kuzu), support for queue-based CDC (AWS S3, SQS), and lots of infra updates including the CLI, resilience, and error handling.

Excited to hear your thoughts, and thank you so much!

Linghua

Tesla share plunge amid Trump feud wipes $152B off Elon Musk's company

https://www.theguardian.com/technology/2025/jun/05/tesla-share-drop-trump-musk-feud
1•beardyw•48s ago•0 comments

Australian Navy ship accidentally blocks WiFi across parts of New Zealand

https://www.theguardian.com/australia-news/2025/jun/06/australian-navy-ship-accidentally-blocks-wifi-across-parts-of-new-zealand
1•defrost•7m ago•0 comments

OpenBSD Hackathon Japan 2025

https://rsadowski.de/posts/2025/j2k25-japan-openbsd-hackathon/
1•damir•7m ago•0 comments

MLX-based LLM inference engine for macOS with native Swift implementation

https://github.com/Trans-N-ai/swama
1•jovezhong•12m ago•1 comments

Second ispace craft has probably crash-landed on Moon

https://www.nature.com/articles/d41586-025-01751-3
1•politelemon•14m ago•1 comments

The Automaker Wars No One Talks About

https://www.carsandhorsepower.com/featured/the-automaker-wars-no-one-talks-about-niche-competitions-in-weird-segments
1•Anumbia•15m ago•0 comments

How Anthropic teams use Claude Code [pdf]

https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135ad8871e7658.pdf
1•ChrisArchitect•18m ago•0 comments

I Learned Rust in 24 Hours to Eat Free Pizza Morally

https://medium.com/@sebastiancarlos/i-learned-rust-in-24-hours-to-eat-free-pizza-morally-28ea8312e523
1•todsacerdoti•18m ago•0 comments

OpenAI CEO Sam Altman says AI is ready for entry-level jobs

https://fortune.com/2025/06/05/openai-ceo-sam-altman-ai-as-good-as-interns-entry-level-workers-gen-z-embrace-technology/
2•01-_-•22m ago•1 comments

Google confirms more ads on your paid YouTube Premium Lite soon

https://www.neowin.net/news/google-confirms-more-ads-on-your-paid-youtube-premium-lite-soon/
2•01-_-•24m ago•0 comments

Germany: Digital Minister wants open source etc. as guiding principle

https://www.heise.de/en/news/Digital-Minister-wants-open-standards-and-open-source-as-guiding-principle-10414632.html
2•donutloop•25m ago•0 comments

Musk says SpaceX will retire Dragon spacecraft amid bitter Trump dispute

https://www.theguardian.com/us-news/2025/jun/05/elon-musk-spacex-dragon-trump
1•rene_d•25m ago•0 comments

AI agents are turning Salesforce and SAP into rivals

https://www.economist.com/business/2025/06/05/ai-agents-are-turning-salesforce-and-sap-into-rivals
1•petethomas•26m ago•0 comments

Ask HN: Running AI agents in isolated environments

1•polycaster•28m ago•0 comments

Sir Demis Hassabis on the Future of Knowledge – Institute for Advanced Study [video]

https://www.youtube.com/watch?v=TgS0nFeYul8
1•goplayoutside•32m ago•0 comments

Launching a simple AI Image generator app as a 17 y/o

https://www.imagation.com
1•donvchu•34m ago•1 comments

Who wrote the Bible? A pioneering new algorithm may shatter scholarly certitude

https://www.timesofisrael.com/who-wrote-the-bible-a-pioneering-new-algorithm-may-shatter-scholarly-certitude/
3•names_are_hard•34m ago•1 comments

Copilot Chat now supports attaching references using the symbol

https://github.blog/changelog/2025-06-03-copilot-chat-now-supports-attaching-references-using-the-symbol/
1•e2e4•35m ago•0 comments

Volumetric deformable terrain using three.js/webgl

https://twitter.com/sea3dformat/status/1930493486639235581
1•ToJans•37m ago•0 comments

Twenty Years of TiddlyWiki (2024)

https://tiddlywiki.com/#History%20of%20TiddlyWiki:HelloThere%20%5B%5BQuick%20Start%5D%5D%20%5B%5BFind%20Out%20More%5D%5D%20%5B%5BHistory%20of%20TiddlyWiki%5D%5D%20%5B%5BTiddlyWiki%20on%20the%20Web%5D%5D%20%5B%5BTestimonials%20and%20Reviews%5D%5D%20GettingStarted%20Community
9•Tomte•38m ago•1 comments

Floss/Fund Backs the Future of Internet Security

https://openssl-foundation.org/post/2025-06-04-floss-fund/
1•vishnumohandas•41m ago•0 comments

Using 'Slop Forensics' to Determine Model Ancestry

https://www.dbreunig.com/2025/05/30/using-slop-forensics-to-determine-model-ancestry.html
2•iamflimflam1•43m ago•0 comments

Homeless but self taught full stack developer

3•crlapples•49m ago•6 comments

Crypto's New Bailout Fund: Your Savings Account

https://www.levernews.com/cryptos-new-bailout-fund-your-savings-account/
3•miles•51m ago•0 comments

Switch 2 rooted on day 1

https://bsky.app/profile/retr0.id/post/3lqtwrndzf22w
16•mdtrooper•56m ago•5 comments

Token Visualizer to analyze and optimize your LLM prompts for cost and efficiency

https://github.com/Mattbusel/Token-Visualizer
2•Shmungus•1h ago•1 comments

Destiny – iOS app that works with Magic Wormhole and Wormhole William

https://apps.apple.com/us/app/destiny-secure-file-transfer/id6444721954
3•rahimnathwani•1h ago•3 comments

Founding PM / Co-Founder for FilFlo (AI-Native Fulfilment SaaS)

https://filflo.in/
1•profvyas•1h ago•1 comments

Microsoft backed AI startup pretending to be AI filed for bankruptcy

https://www.windowscentral.com/microsoft/builder-ai-collapse-microsoft-backed-fake-ai-services
1•jayaprabhakar•1h ago•1 comments

Vibe Coding: Where it works and where it doesn't

https://sachin.devicion.com/blog/vibe-coding-where-it-works-and-where-it-does-not
1•sachin_rcz•1h ago•0 comments