verdverm•1h ago
Any docs on how to run this on multiple machines? (ideally k8s)
nihalwashere•1h ago
There's a Docker deployment guide here: https://docs.reader.dev/documentation/guides/deployment
For k8s, you can run multiple Reader instances behind a load balancer, each managing its own browser pool. Main things to watch:
- Memory limits (~500MB-1GB per concurrent browser)
- Headless Chrome needs --no-sandbox or a seccomp profile
- Sticky sessions for crawl jobs, or run the full crawl on a single pod (see the sketch below)
A dedicated k8s guide is on the roadmap...
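In the meantime, here's a minimal sketch of the "run the full crawl on a single pod" idea, assuming a Redis instance every pod can reach; reader.crawl() is a hypothetical stand-in for whatever crawl entry point you actually use:

```python
import os
import redis

# Any Redis reachable by all pods works; host/port are illustrative.
r = redis.Redis(host="redis", port=6379)

def run_crawl_once(reader, start_url: str):
    """Take a shared lock so only one pod runs a given crawl job.
    reader.crawl() is hypothetical -- swap in the real crawl call."""
    lock_key = f"crawl-lock:{start_url}"
    pod = os.environ.get("HOSTNAME", "unknown-pod")  # pod name under k8s
    # NX: the SET only succeeds if no other pod holds the lock.
    # EX: expire the lock so a crashed pod can't block the job forever.
    if not r.set(lock_key, pod, nx=True, ex=3600):
        return None  # another pod already owns this crawl
    try:
        return reader.crawl(start_url)
    finally:
        r.delete(lock_key)
```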
verdverm•57m ago
The main challenge is distributed rate-limiting, something I'd hope the framework handles for me. Also, having k8s settings that work well in your experience w.r.t. scaling would help.
nihalwashere•37m ago
Distributed rate-limiting is intentionally not in the core library; Reader focuses on the scraping primitives and stays unopinionated about orchestration.
For multi-node rate limiting, you'd layer that on top: Redis + a simple limiter that gates calls to reader.scrape().
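Roughly, that layering could look like the sketch below, assuming redis-py and treating reader.scrape(url) as the call being gated; the limits and key scheme are placeholders:

```python
import time
import redis

# Shared Redis so every Reader pod sees the same counters; host is illustrative.
r = redis.Redis(host="redis", port=6379)

def allow(domain: str, limit: int = 5, window_s: int = 1) -> bool:
    """Fixed-window counter: at most `limit` requests per `window_s` seconds
    for a domain, enforced across all Reader instances."""
    key = f"rl:{domain}:{int(time.time()) // window_s}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_s * 2)  # let old windows clean themselves up
    return count <= limit

def scrape_with_limit(reader, url: str, domain: str):
    # Spin until the shared limiter grants a slot, then call into Reader.
    while not allow(domain):
        time.sleep(0.05)
    return reader.scrape(url)
```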
For k8s resource settings, the Docker guide is a good starting point: https://docs.reader.dev/documentation/guides/deployment
But I will add some reference examples on how to build a rate-limiting and k8s orchestration layer on top of Reader...
Thanks for sharing this :)