Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers

https://github.com/apify/crawlee-python
2•jancurn•1h ago

Comments

jancurn•1h ago
Hey HN,

This is Jan, the founder of Apify (https://apify.com/) — a full-stack web scraping platform.

With the help of the Python community and feedback from early adopters, after a year of building Crawlee for Python in beta, we are launching Crawlee for Python v1.0.0.

The main features are:

- Unified storage client system: less duplication, better extensibility, and a cleaner developer experience. It also opens the door for the community to build and share their own storage client implementations.

- Adaptive Playwright crawler: makes your crawls faster and cheaper, while still allowing you to reliably handle complex, dynamic websites. In practice, you get the best of both worlds: speed on simple pages and robustness on modern, JavaScript-heavy sites.

- New default HTTP client: `ImpitHttpClient` (https://crawlee.dev/python/api/class/ImpitHttpClient), powered by the Impit library (https://github.com/apify/impit), gives you fewer false positives, more resilient crawls, and less need for complicated workarounds. Impit is also developed as an open-source project by Apify, so you can dive into the internals or contribute improvements yourself. You can also create your own instance, configure it to your needs (e.g., enable HTTP/3 or choose a specific browser profile), and pass it into your crawler.

- Sitemap request loader: easier to start large-scale crawls where sitemaps already provide full coverage of the site

- Robots exclusion standard: not only helps you build ethical crawlers, but can also save time and bandwidth by skipping disallowed or irrelevant pages

- Fingerprinting: each crawler run looks like a real browser on a real device. Using fingerprinting in Crawlee is straightforward: create a fingerprint generator with your desired options and pass it to the crawler.

- OpenTelemetry: monitor real-time dashboards or analyze traces to understand crawler performance. This also makes it easier to integrate Crawlee into existing monitoring pipelines.
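
To make the robots exclusion point above concrete, here is a minimal sketch of the underlying idea: filter candidate URLs against robots.txt rules before fetching them. It deliberately uses only Python's standard library (`urllib.robotparser`), not Crawlee's own API. The robots.txt content, the `example-bot` user agent, the `example.com` URLs, and the `filter_allowed` helper are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def filter_allowed(urls, robots_txt, user_agent="example-bot"):
    """Return only the URLs that the given robots.txt rules permit."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical crawl frontier.
candidates = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/private/admin",
]

allowed = filter_allowed(candidates, ROBOTS_TXT)
print(allowed)
```

With these rules, the URL under `/private/` is dropped before any request is made for it, which is where the time and bandwidth savings come from.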

For details, you can read the announcement blog post: https://crawlee.dev/blog/crawlee-for-python-v1

Our team and I will be happy to answer any questions you might have here.

As Floods Worsen, Pakistan Is the Epicenter of Climate Change

https://e360.yale.edu/features/pakistan-climate-floods
1•Brajeshwar•1m ago•0 comments

Llms.py – Local ChatGPT-Like UI and OpenAI Chat Server

https://servicestack.net/posts/llms-py-ui
1•mythz•5m ago•0 comments

AI CheatSheet – AI Tools Directory

https://www.aicheatsheet.org
1•Yugoleliatrope•6m ago•0 comments

Ask HN: Pure HTML micro-front end

1•sudo_bangbang•7m ago•0 comments

Imagine with Claude: build working software and UI on the fly [video]

https://www.youtube.com/watch?v=dGiqrsv530Y
1•mustaphah•8m ago•0 comments

Sock it to the shoes: why more offices are going footwear-free

https://www.theguardian.com/money/2025/sep/30/sock-it-to-the-shoes-why-more-offices-are-going-foo...
1•n1b0m•11m ago•0 comments

Higgs Audio

https://higgs-audio.com/
1•yuyu74189w•11m ago•0 comments

F-Droid says Google's new sideloading restrictions will kill the project

https://arstechnica.com/gadgets/2025/09/f-droid-calls-for-regulators-to-stop-googles-crackdown-on...
1•robtherobber•13m ago•0 comments

Show HN: My heart is open source

https://www.myheartisopensource.com
3•transitivebs•13m ago•0 comments

NoPorn – Stop Pornhub

https://chromewebstore.google.com/detail/noporn/jielhlhakhalkkefcgnhcopfhglehdna
6•jacktheprogram•13m ago•2 comments

UAlbany Chemists Create New High-Energy Compound to Fuel Space Flight

https://www.albany.edu/news-center/news/2025-ualbany-chemists-create-new-high-energy-compound-fue...
1•ceolin•14m ago•0 comments

Show HN: jsonpipe - stream JSON tweaks toolkit in GO

https://github.com/Veinar/jsonpipe
1•veinar_gh•15m ago•0 comments

Global Sumud Flotilla Tracker

https://globalsumudflotilla.org/tracker/
3•joejohnson•19m ago•1 comments

Got hit by 1k Trump bots within an hour after launching a platform

2•vinserello•24m ago•0 comments

Periodic Labs aims to build a scientific super-intelligence

https://www.nytimes.com/2025/09/30/technology/ai-meta-google-openai-periodic.html
1•car•26m ago•0 comments

Notes on Unreal Engine 5, Nanite

https://www.4rknova.com//blog/2021/06/09/unreal-5-nanite
1•ibobev•34m ago•0 comments

Ratios of iterated logarithms to different bases

https://www.johndcook.com/blog/2025/09/29/log-log-x/
1•ibobev•34m ago•0 comments

The Majority of Your Users

https://jacobtomlinson.dev/posts/2025/the-majority-of-your-users/
3•jacobtomlinson•35m ago•0 comments

The Game Engine that would not have been made without Rust

https://blog.vermeilsoft.com/2025-09-rust-game-engine/
3•ksec•36m ago•1 comments

Using the TPDE Codegen Back End in LLVM Orc

https://weliveindetail.github.io/blog/post/2025/09/30/tpde-in-llvm-orc.html
1•weliveindetail•37m ago•0 comments

Show HN: InpaintKit – A plugin let you use latest AI models in Photoshop

https://inpaintkit.com/
1•tuyenhx•40m ago•0 comments

Affinity (owned by Canva) is closing their public forums and using Discord

https://forum.affinity.serif.com/index.php?/topic/235712-important-update-affinity-forum-transition/
3•latexr•43m ago•1 comments

Io_uring is not an event system [2021]

https://despairlabs.com/blog/posts/2021-06-16-io-uring-is-not-an-event-system/
2•signa11•46m ago•0 comments

Show HN: EasyDesign-one-click reproduction of any popular poster

https://jiandan.link/
1•ovelv•46m ago•0 comments

DeepSeek-v3.2-Exp

https://twitter.com/deepseek_ai/status/1972604768309871061
2•kristianp•49m ago•0 comments

I built a 4000 Weeks/Slow Productivity-inspired tool (yes I see the irony)

https://findspace.app/
1•bastiaant•50m ago•1 comments

Show HN: Are you in AI coded startup race? Check out Dodo Payments SDK in Rust

https://github.com/PiyushXCoder/dodo-payments-rs
1•PiyushXCoder•51m ago•0 comments

The Problem with AI Is the Problem with Capitalism

https://jacobin.com/2023/03/ai-artificial-intelligence-art-chatgpt-jobs-capitalism
2•saubeidl•51m ago•0 comments

Comprehension Debt: The Ticking Time Bomb of LLM-Generated Code

https://codemanship.wordpress.com/2025/09/30/comprehension-debt-the-ticking-time-bomb-of-llm-gene...
3•todsacerdoti•51m ago•0 comments

Microsoft has lost its way

https://www.zdnet.com/article/microsoft-has-lost-its-way/
6•thegoodduck•55m ago•1 comments