Tadpole the Language for Scraping 0.2.0 – Complex Control Flow, Stealth and More

4•zachperkitny•2h ago
Hello,

I posted a few weeks ago about my custom scraping language. It definitely got some traction, which was very exciting to see.

GitHub repo: https://github.com/tadpolehq/tadpole
Docs: https://tadpolehq.com/

Over the past two weeks, I've focused on introducing dedicated stealth actions, more complex control flow actions, and a range of evaluators for cleaning data.

Here is an example that scrapes `books.toscrape.com`:

  main {
    new_page {
      goto "https://books.toscrape.com/"
      loop {
        do {
          $$ article.product_pod {
            extract "books[]" {
              title { $ "h3 a"; attr title }
              rating {
                $ ".star-rating";
                attr "class";
                extract "star-rating (One|Two|Three|Four|Five)" caseInsensitive=#true;
                func "(v) => ({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}[v.toLowerCase()] || null)"
              }
              price { $ "p.price_color"; text; as_float }
              in_stock { $ "p.availability"; text; matches "In stock" caseInsensitive=#true }
            }
          }
        }
        while { $ "li.next" }
        next {
          $ "li.next a" { click }
          wait_until
        }
      }
    }
  }
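
For anyone unfamiliar with the evaluator chain, here is a rough plain-JavaScript sketch of what the `rating` field above does to the `class` attribute. It is only an illustration, not Tadpole's implementation: `parseRating` and `RATING_WORDS` are made-up names, and it assumes `extract` hands the first capture group to the next step and yields null when nothing matches.

```javascript
// Sketch of the `rating` evaluator chain in plain JavaScript.
// Assumption: `extract` yields the first capture group, or null on no match.
const RATING_WORDS = { one: 1, two: 2, three: 3, four: 4, five: 5 };

function parseRating(classAttr) {
  // Mirrors: extract "star-rating (One|Two|Three|Four|Five)" caseInsensitive=#true
  const match = /star-rating (One|Two|Three|Four|Five)/i.exec(classAttr ?? "");
  if (match === null) return null;
  // Mirrors the `func` step: lower-case the captured word and look it up.
  return RATING_WORDS[match[1].toLowerCase()] ?? null;
}

console.log(parseRating("star-rating Three")); // 3
console.log(parseRating("star-rating FIVE"));  // 5
console.log(parseRating("price_color"));       // null
```

Keeping the word-to-number mapping as a final step means the regex only has to isolate the word, and anything unexpected falls through to null instead of a wrong number.
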

I've introduced actions like `apply_identity` to override user agent headers and user agent metadata. Here is an example module that defines several identities to choose from:

  module stealth {
    // Apple M2 Pro
    action apply_apple_m2 {
      apply_identity mac
      set_webgl_vendor "Apple Inc." "Apple M2"
      set_device_memory 16
      set_hardware_concurrency 8
      set_viewport 1440 900 deviceScaleFactor=2
    }

    // Windows Desktop
    action apply_windows_16_8 {
      apply_identity windows
      set_webgl_vendor "Google Inc. (Intel)" "ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0)"
      set_device_memory 16
      set_hardware_concurrency 8
      set_viewport 1920 1080
    }

    // Windows Budget Laptop
    action apply_windows_8_4 {
      apply_identity windows
      set_webgl_vendor "Google Inc. (Intel)" "ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0)"
      set_device_memory 8
      set_hardware_concurrency 4
      set_viewport 1366 768
    }
  }
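
One way to choose between identities like these is to pick one per session, so retries within a session reuse the same fingerprint. The JavaScript below is a sketch of that idea under my own assumptions: `PROFILES` and `pickProfile` are hypothetical names that are not part of Tadpole, and only the values are copied from the module above.

```javascript
// Hypothetical profile table; values are copied from the three actions above.
const PROFILES = [
  { action: "apply_apple_m2", identity: "mac", deviceMemory: 16, hardwareConcurrency: 8, viewport: [1440, 900] },
  { action: "apply_windows_16_8", identity: "windows", deviceMemory: 16, hardwareConcurrency: 8, viewport: [1920, 1080] },
  { action: "apply_windows_8_4", identity: "windows", deviceMemory: 8, hardwareConcurrency: 4, viewport: [1366, 768] },
];

// Deterministically map a session ID to a profile so the same session
// always presents the same identity.
function pickProfile(sessionId) {
  let hash = 0;
  for (const ch of sessionId) {
    // Simple 32-bit rolling hash; not cryptographic, just stable.
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return PROFILES[hash % PROFILES.length];
}

const profile = pickProfile("session-1");
console.log(profile.action, profile.viewport.join("x"));
```

Each profile keeps memory, core count, and viewport together, which matters because a mismatched combination (for example a budget laptop viewport with 16 GB of memory) is itself a signal.
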

The full release changelog is available here: https://github.com/tadpolehq/tadpole/releases/

My goal for the next release, 0.3.0, is to focus heavily on plugins, distributed execution through message queues, Redis support for crawling, and static parsing, so that scraping no longer runs exclusively over CDP/Chrome.

I'll keep trying to hold a release cadence of one every two weeks!