
Show HN: AnyCrawl v0.0.1-alpha.5 – custom user-agent and richer scraping API

https://github.com/any4ai/AnyCrawl

2•ntbperst•13h ago

## [0.0.1-alpha.5] - 2025-06-14

### Added

- Integrated AWS S3 storage support with new `S3` class and environment variables for seamless file uploads and retrievals.
- Introduced `FileController` for serving files from S3 or local storage with robust path validation and error handling.
- Added multiple content transformers (Screenshot, `HTMLTransformer`) improving HTML/Markdown extraction and screenshot generation.
- Extended scraping capabilities with new options: output `formats`, `timeout`, tag filtering, `wait_for`, retry strategy, viewport configuration, and custom user-agent support.
- Added Safe Search parameter to `SearchSchema` for filtered search results.
- Refactored engine architecture with a factory pattern and new core modules for configuration validation, data extraction, and job management.
- Implemented graceful shutdown handling for the API server and improved logging for uncaught exceptions / unhandled rejections.
- Added Jest configuration for API and library packages with ESM support and updated test scripts.
- Updated CI workflows to publish Docker images on version tags.
- Expanded README with detailed environment variable descriptions and API usage examples.
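
As a rough illustration of the path validation mentioned for `FileController`, the sketch below rejects requests that would escape a storage root. The helper name `resolveSafePath` and the example root are hypothetical and not taken from the AnyCrawl codebase; only Node's built-in `path` module is used.

```typescript
import * as path from "node:path";

// Hypothetical helper (not AnyCrawl's actual code): resolve a requested file
// path against a storage root and return null if it is not strictly inside it.
function resolveSafePath(root: string, requested: string): string | null {
  const base = path.resolve(root);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);

  // rel is "" when the request points at the root itself, starts with a ".."
  // segment when it climbs out of the root, and is absolute when the target
  // lives on a different drive (Windows).
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (rel === "" || escapes) {
    return null;
  }
  return target;
}
```

A controller could call this before touching the filesystem and answer with a 400 or 404 when it returns `null`. For S3 keys the same idea applies, although the normalization rules would differ.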

### Changed

- Refined error handling in `ScrapeController` and `JobManager`; failure responses now include structured error objects and HTTP status codes.
- Enhanced `BaseEngine` with explicit HTTP error checks and resilience improvements.
- Updated OpenAPI documentation to reflect new scraping parameters and error formats.
- Migrated key-value store name to environment configuration for greater flexibility.
- Enhanced per-request credit tracking in `ScrapeController` and enhanced logging middleware to include credit usage.
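
To make "structured error objects and HTTP status codes" concrete, here is one possible shape for a failure response. The field names (`success`, `code`, `message`, `status`) and the helper `toErrorBody` are assumptions for illustration only; the project's OpenAPI documentation is the authority on the real format.

```typescript
// Hypothetical response shape; the field names are illustrative and are not
// taken from AnyCrawl's actual schema.
interface ErrorBody {
  success: false;
  error: {
    code: string;
    message: string;
    status: number;
  };
}

// Normalize any thrown value into a structured body plus an HTTP status code.
function toErrorBody(err: unknown, status: number = 500): ErrorBody {
  const message = err instanceof Error ? err.message : String(err);
  const code = status >= 500 ? "internal_error" : "request_error";
  return { success: false, error: { code, message, status } };
}
```

A handler would then send `toErrorBody(err, 504)` as the JSON body with the matching status code, so clients can branch on `error.code` instead of parsing message strings.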

### Fixed

- Improved job failure messages to include detailed error data, ensuring clearer debugging information.
- Minor documentation corrections and clarifications.

