Tadpole the Language for Scraping 0.2.0 – Complex Control Flow, Stealth and More

4•zachperkitny•1h ago
Hello,

I posted a few weeks ago about my custom scraping language. It definitely got some traction, which was very exciting to see.

GitHub Repo: https://github.com/tadpolehq/tadpole
Docs: https://tadpolehq.com/

Over the past 2 weeks, I've focused on introducing specific stealth actions, more complex control flow actions, and a variety of evaluators for cleaning data.

Here is an example that scrapes `books.toscrape.com`:

  main {
    new_page {
      goto "https://books.toscrape.com/"
      loop {
        do {
          $$ article.product_pod {
            extract "books[]" {
              title { $ "h3 a"; attr title }
              rating {
                $ ".star-rating";
                attr "class";
                extract "star-rating (One|Two|Three|Four|Five)" caseInsensitive=#true;
                func "(v) => ({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}[v.toLowerCase()] || null)"
              }
              price { $ "p.price_color"; text; as_float }
              in_stock { $ "p.availability"; text; matches "In stock" caseInsensitive=#true }
            }
          }
        }
        while { $ "li.next" }
        next {
          $ "li.next a" { click }
          wait_until
        }
      }
    }
  }
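
For readers who don't know the DSL, here is a rough Python sketch of how I read the example above. It is not how Tadpole is implemented. It assumes `extract` yields the first capture group of the pattern, that `matches` is a case-insensitive regex search, and that the loop runs `do`, then checks `while`, then runs `next`. The in-memory `pages` list and the names `parse_rating`, `is_in_stock` and `scrape_all` are made up for illustration and stand in for the live DOM.

```python
import re

RATING_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_rating(class_attr):
    """Mimic the `rating` chain: regex extract, then map the word to a number."""
    # Assumption: `extract` yields the first capture group, or nothing on no match.
    match = re.search(
        r"star-rating (One|Two|Three|Four|Five)", class_attr, re.IGNORECASE
    )
    if match is None:
        return None
    # Same lookup as the `func` step: unknown words become None (null).
    return RATING_WORDS.get(match.group(1).lower())


def is_in_stock(text):
    """Mimic `matches "In stock" caseInsensitive=#true` on the element text."""
    return re.search(r"In stock", text, re.IGNORECASE) is not None


def scrape_all(pages):
    """Mimic `loop { do ... while ... next ... }` over in-memory pages.

    Each page is a dict with "books" (raw rows) and "has_next" (bool),
    a hypothetical stand-in for the DOM and the `li.next` check.
    """
    books = []
    index = 0
    while True:
        # do: extract every product on the current page
        for raw in pages[index]["books"]:
            books.append(
                {
                    "title": raw["title"],
                    "rating": parse_rating(raw["class"]),
                    "in_stock": is_in_stock(raw["availability"]),
                }
            )
        # while: stop once there is no next link
        if not pages[index]["has_next"]:
            break
        # next: move on to the following page
        index += 1
    return books
```

With this reading, `parse_rating("star-rating Three")` returns `3`, and a class string without a rating word returns `None`, which matches the `|| null` fallback in the `func` step.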

I've introduced actions like `apply_identity` to override User-Agent headers and User-Agent metadata. Here is an example module that defines several identities you can selectively apply:

  module stealth {
    // Apple M2 Pro
    action apply_apple_m2 {
      apply_identity mac
      set_webgl_vendor "Apple Inc." "Apple M2"
      set_device_memory 16
      set_hardware_concurrency 8
      set_viewport 1440 900 deviceScaleFactor=2
    }

    // Windows Desktop
    action apply_windows_16_8 {
      apply_identity windows
      set_webgl_vendor "Google Inc. (Intel)" "ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0)"
      set_device_memory 16
      set_hardware_concurrency 8
      set_viewport 1920 1080
    }

    // Windows Budget Laptop
    action apply_windows_8_4 {
      apply_identity windows
      set_webgl_vendor "Google Inc. (Intel)" "ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0)"
      set_device_memory 8
      set_hardware_concurrency 4
      set_viewport 1366 768
    }
  }
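
To make the idea behind the module concrete, here is a small Python sketch that models the same three identities as plain data and picks one per session. This only illustrates the concept: the post does not show how Tadpole chooses between actions, so `Identity`, `PROFILES` and `pick_identity` are hypothetical names, and the default scale factor of 1 is my assumption for the profiles that don't set `deviceScaleFactor`.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """One coherent fingerprint, mirroring the fields set by each action."""

    action: str
    platform: str
    webgl_vendor: str
    webgl_renderer: str
    device_memory: int
    hardware_concurrency: int
    viewport: tuple
    device_scale_factor: int = 1  # assumed default when not set explicitly


INTEL_RENDERER = (
    "ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0)"
)

PROFILES = [
    Identity("apply_apple_m2", "mac", "Apple Inc.", "Apple M2",
             16, 8, (1440, 900), device_scale_factor=2),
    Identity("apply_windows_16_8", "windows", "Google Inc. (Intel)",
             INTEL_RENDERER, 16, 8, (1920, 1080)),
    Identity("apply_windows_8_4", "windows", "Google Inc. (Intel)",
             INTEL_RENDERER, 8, 4, (1366, 768)),
]


def pick_identity(seed=None):
    """Pick one whole profile; a fixed seed makes the choice repeatable."""
    rng = random.Random(seed)
    return rng.choice(PROFILES)
```

The point of grouping values this way is that memory, core count, GPU strings and viewport always change together, so a session never reports, say, an Apple GPU alongside a Windows user agent.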

The full release changelog is available here: https://github.com/tadpolehq/tadpole/releases/

My goal for the 0.3.0 release is to focus heavily on plugins, distributed execution through message queues, Redis support for crawling, and static parsing as an alternative to running exclusively over CDP/Chrome.

I'll try to keep my release cadence at every 2 weeks!
