Show HN: Quack-Cluster – A serverless distributed SQL engine with DuckDB and Ray

https://github.com/kristianaryanto/Quack-Cluster
3•kristian1232•6mo ago
Hi HN,

I'm excited to share a project I've been working on: Quack-Cluster.

I love the speed and simplicity of DuckDB for analytics, but I often work with datasets spread across hundreds of files in object storage (like S3). I wanted a way to run distributed queries across all that data without the complexity of setting up and managing a full-blown Spark or Presto cluster. I'm also a big fan of Ray for its simplicity in distributed Python, so I decided to combine them.

How it works: You send a standard SQL query to a central coordinator. It uses SQLGlot to parse the query and identify the target files (e.g., s3://bucket/data/*.parquet). It then generates a distributed plan and sends tasks to a cluster of Ray actors. Each Ray actor runs an embedded DuckDB instance to process a subset of the files in parallel. The partial results (as Arrow tables) are then aggregated and returned to the user.
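The scatter-gather flow described above can be illustrated with a rough, standard-library-only sketch. It is not the project's code: a thread pool stands in for the Ray actors, an in-memory `sqlite3` database stands in for each embedded DuckDB instance, a plain dict (`FAKE_FILES`) stands in for Parquet files in object storage, and the SQLGlot parsing step is skipped entirely. All function names here are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for files in object storage: file name -> column values.
FAKE_FILES = {
    "part-0.parquet": [1.0, 2.0],
    "part-1.parquet": [3.0],
    "part-2.parquet": [4.0, 5.0],
}


def partition(files, n_workers):
    """Round-robin the file list into at most n_workers non-empty chunks."""
    chunks = [files[i::n_workers] for i in range(n_workers)]
    return [chunk for chunk in chunks if chunk]


def run_partial(file_names, storage):
    """One worker: load its files into an embedded engine, return (sum, count)."""
    rows = [(value,) for name in file_names for value in storage[name]]
    con = sqlite3.connect(":memory:")  # stand-in for an embedded DuckDB instance
    try:
        con.execute("CREATE TABLE t (amount REAL)")
        con.executemany("INSERT INTO t VALUES (?)", rows)
        query = "SELECT COALESCE(SUM(amount), 0.0), COUNT(*) FROM t"
        return con.execute(query).fetchone()
    finally:
        con.close()


def coordinate(storage, n_workers=2):
    """Coordinator: split the files, fan out the work, merge partial aggregates."""
    chunks = partition(sorted(storage), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda chunk: run_partial(chunk, storage), chunks))
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    return total, count


if __name__ == "__main__":
    print(coordinate(FAKE_FILES))
```

Merging `(sum, count)` pairs works because both aggregates are decomposable; something like a median would need a different merge step.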

The goal is to provide a lightweight, high-performance, and serverless alternative for interactive SQL analytics directly on a data lake.

The core tech stack is:

Backend: Python, FastAPI

Distributed Computing: Ray

Query Engine: DuckDB

SQL Parsing: SQLGlot

The project is open-source and I've tried to make it easy to get started locally with Docker and make. I'm here to answer any questions and would be grateful for any feedback on the architecture, use case, or the code itself.

Thanks for checking it out!

Comments

kristian1232•6mo ago
First Comment
hodgesrm•6mo ago
Sounds interesting! What kind of query latency do you see with this approach?

Also, have you thought about caching? My team is working on a similar problem and we have caches for everything from contents of S3 list_objects_v2 calls to Parquet metadata to blocks read from object storage.

kristian1232•6mo ago
Thanks for the great questions!

Query Latency

Query latency is highly variable and depends on several factors:

Query Type: A simple SELECT with a WHERE clause on a single table will be much faster than a complex multi-table JOIN that requires shuffling data between workers.

Data Size: The total volume of data being scanned from disk or object storage is a primary driver of latency.

Execution Plan: The system chooses between different plans. A LocalExecutionPlan that runs on a single node is fastest. A DistributedBroadcastJoinPlan is used when one table is small and is generally faster than a DistributedShuffleJoinPlan, which is the fallback for large tables and tends to have the highest latency.

Fault Tolerance: If a worker node fails, the system will automatically retry the task up to a configured maximum, which can add to the total execution time.
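As a hedged illustration of the last two factors, here is one way the plan choice and the bounded retry could look. Only the three plan names come from the description above. The size threshold, the rule that a query without a join runs locally, the `WorkerFailure` exception, and both function names are assumptions made for this sketch, not the project's actual logic.

```python
class WorkerFailure(Exception):
    """Hypothetical error raised when a worker node dies mid-task."""


# Assumed cutoff for "small enough to broadcast"; not taken from the project.
BROADCAST_THRESHOLD_BYTES = 64 * 1024 * 1024


def choose_plan(table_sizes, threshold=BROADCAST_THRESHOLD_BYTES):
    """Pick a plan name from a mapping of table name -> size in bytes."""
    if len(table_sizes) < 2:
        # Assumption: no join means the query can run on a single node.
        return "LocalExecutionPlan"
    if min(table_sizes.values()) <= threshold:
        # One side is small, so ship it to every worker.
        return "DistributedBroadcastJoinPlan"
    # Fallback for large tables: shuffle both sides between workers.
    return "DistributedShuffleJoinPlan"


def run_with_retries(task, max_retries=3):
    """Call task(); on WorkerFailure, retry up to max_retries more times."""
    failures = 0
    while True:
        try:
            return task()
        except WorkerFailure:
            failures += 1
            if failures > max_retries:
                raise
```

Each retry re-runs the whole task, which is why a failed worker adds directly to total query time.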

Caching

Yes, caching is a key feature! Your team's approach sounds very thorough. Our current implementation focuses on caching the final results of queries to avoid re-computation.

Here’s how it works:

In-Memory TTL Cache: We use a simple, time-to-live (TTL) in-memory cache for the /query endpoint. When a query is executed, a SHA256 hash of the SQL string and the requested format (e.g., "json" or "arrow") is used as the cache key.

Cache Check: For every incoming query, we first check the cache. If a valid, non-expired result is found, we return it immediately, which is significantly faster.

Cache Population: If it's a cache miss, the query is fully executed, and the final result is stored in the cache before being sent to the client. The TTL is configurable, defaulting to 300 seconds.
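The three steps above map onto a small amount of code. The sketch below is a minimal illustration rather than the project's implementation: the class and function names are hypothetical, the way the SQL string and format are combined before hashing is an assumption, and the clock is injectable only so that expiry can be exercised without sleeping.

```python
import hashlib
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    @staticmethod
    def make_key(sql, fmt):
        # SHA256 over the format and SQL text; the "fmt:sql" layout is assumed.
        return hashlib.sha256(f"{fmt}:{sql}".encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the expired entry
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def cached_query(cache, sql, fmt, execute):
    """Return a cached result if present, else run execute() and store it."""
    key = cache.make_key(sql, fmt)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = execute(sql, fmt)
    cache.put(key, result)
    return result
```

Because the key hashes the raw SQL string, two queries that differ only in whitespace or casing would be cached separately in this sketch.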

This approach caches the final output rather than lower-level data like file metadata or individual data blocks, but your point about caching Parquet metadata and S3 listings is excellent—that would be a great way to further optimize the planning phase.