Show HN: wxpath – Declarative web crawling in XPath

https://github.com/rodricios/wxpath
64•rodricios•3w ago
wxpath is a declarative web crawler where web crawling and scraping are expressed directly in XPath.

Instead of writing imperative crawl loops, you describe what to follow and what to extract in a single expression:

    import wxpath

    # Crawl, extract fields, build a Wikipedia knowledge graph
    path_expr = """
    url('https://en.wikipedia.org/wiki/Expression_language')
         ///url(//main//a/@href[starts-with(., '/wiki/') and not(contains(., ':'))])
             /map{
                'title': (//span[contains(@class, "mw-page-title-main")]/text())[1] ! string(.),
                'url': string(base-uri(.)),
                'short_description': //div[contains(@class, 'shortdescription')]/text() ! string(.),
                'forward_links': //div[@id="mw-content-text"]//a/@href ! string(.)
             }
    """

    for item in wxpath.wxpath_async_blocking_iter(path_expr, max_depth=1):
        print(item)

The key addition is a `url(...)` operator that fetches and returns HTML for further XPath processing, and `///url(...)` for deep (or paginated) traversal. Everything else is standard XPath 3.1 (maps/arrays/functions).
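
For contrast, here is a rough sketch of the kind of imperative crawl loop that a single `///url(...)` expression is meant to replace. It is not wxpath's implementation: `crawl`, `LinkParser`, and the in-memory `FAKE_SITE` are hypothetical names made up for illustration, the `fetch` callable stands in for real HTTP requests, and the depth semantics here are an assumption rather than a description of wxpath's `max_depth`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, max_depth=1):
    """Breadth-first crawl; yields (url, html) for each page reached."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        yield url, html
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))


# Hypothetical in-memory "site" standing in for real HTTP fetches.
FAKE_SITE = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "https://example.test/b": "<p>leaf</p>",
    "https://example.test/c": "<p>too deep</p>",
}

pages = [url for url, _ in crawl("https://example.test/", FAKE_SITE.__getitem__)]
print(pages)
```

The queue, the visited set, and the depth bookkeeping are the parts that the declarative form hides behind the traversal operator.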

Features:

- Async/concurrent crawling with streaming results

- Scrapy-inspired auto-throttle and polite crawling

- Hook system for custom processing

- CLI for quick experiments
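
To give a feel for what "Scrapy-inspired auto-throttle" can mean, here is a simplified sketch of the delay-adjustment rule described in Scrapy's AutoThrottle documentation. Whether wxpath uses the same rule is an assumption; `next_delay` and its parameter names are hypothetical and chosen only for this example.

```python
def next_delay(current_delay, latency, status=200,
               target_concurrency=1.0, min_delay=0.0, max_delay=60.0):
    """Return the delay (seconds) to wait before the next request."""
    # Aim for roughly `target_concurrency` requests in flight at once.
    target = latency / target_concurrency
    # Move halfway from the current delay toward the target delay.
    new_delay = (current_delay + target) / 2.0
    # Error responses often come back fast; never let them shrink the delay.
    if status != 200 and new_delay < current_delay:
        new_delay = current_delay
    # Keep the result inside the configured bounds.
    return max(min_delay, min(max_delay, new_delay))


print(next_delay(1.0, 3.0))              # slow server: delay grows
print(next_delay(2.0, 0.5, status=503))  # fast error: delay is kept
```

The effect is that a slow server automatically gets fewer requests per second, without the user tuning a fixed delay.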

Another example, paginating through HN comment pages (via the `follow=` argument) and extracting data:

    url('https://news.ycombinator.com',
        follow=//a[text()='comments']/@href | //a[@class='morelink']/@href)
        //tr[@class='athing']
          /map {
            'text': .//div[@class='comment']//text(),
            'user': .//a[@class='hnuser']/@href,
            'parent_post': .//span[@class='onstory']/a/@href
          }

Limitations: HTTP-only (no JS rendering yet), no crawl persistence. Both are on the roadmap if there's interest.

GitHub: https://github.com/rodricios/wxpath

PyPI: pip install wxpath

I'd love feedback on the expression syntax and any use cases this might unlock.

Thanks!

Comments

css_apologist•2w ago
xpath is so fucking cool

i can understand why it failed for general use, but shit like this revives my excitement

q: i'm not an expert, this looks like it extends xpath syntax? haven't seen stuff like the /map is this referring to the html map element? or a fp-style map?

rodricios•2w ago
I think xpath is cool too!

If wxpath can help revive some of that excitement, then I consider my project a success.

As for your question, while wxpath does extend the xpath syntax, `/map` is not one of its additions, nor is it an HTML map element.

XPath 3.1 introduced first-class maps (and arrays) (https://www.w3.org/TR/xpath-31/#id-maps), and `/map` is the syntax to create said structure. It's an awesome feature that's especially useful for quickly delivering JSON-like objects.
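
As a rough Python analogy (not wxpath code, and using ElementTree's limited XPath subset rather than XPath 3.1), a map constructor evaluated once per context node behaves like building one dict per selected element. The sample XML and the `records` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document standing in for a fetched page.
doc = ET.fromstring(
    "<rows>"
    "<row><user>alice</user><text>first</text></row>"
    "<row><user>bob</user><text>second</text></row>"
    "</rows>"
)

# One dict per <row>, similar in spirit to `//row/map{ 'user': ..., 'text': ... }`.
records = [
    {"user": row.findtext("user"), "text": row.findtext("text")}
    for row in doc.findall("./row")
]
print(records)
```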

css_apologist•2w ago
sick, ty

rhdunn•2w ago
Maps were added in XPath 3.1 -- https://www.w3.org/TR/xpath-31/#id-maps.

There's currently work on XPath 4.0 -- https://qt4cg.org/specifications/xquery-40/xpath-40.html.

jerf•2w ago
XPath may have "failed" for general use but it's generally well-enough supported that I can find a library in the common languages I've used when I went looking for it. In some ways the hard part is just knowing it exists so you can use it if you need it.

rodricios•2w ago
Couldn't agree more.

I should also add that most (Python-based) web crawling and scraping frameworks support XPath engines OOTB: Scrapy, Crawlee, etc. In that sense, XPath is very much alive.

rodricios•2w ago
Hey, wxpath author here. It's pretty cool seeing this project reach the front page a week after posting it.

Just wanted to mention a few things.

wxpath is the result of a decade of working on and thinking about web crawling and scraping. I created two somewhat popular Python web-extraction projects a decade ago (eatiht and libextract), and even helped publish a meta-analysis on scrapers, all heavily relying on lxml/XPath.

After finding some time on my hands and after a hiatus on actually writing web scrapers, I decided to return to this little problem domain.

Obviously, LLMs have proven to be quite formidable at web content extraction, but they encounter the now-familiar issues of token limits and cost.

Besides LLMs, there have been some great projects making real progress on the problem of web data extraction, like the Scrapy and Crawlee frameworks, and projects like Ferret (https://www.montferret.dev/docs/introduction/), another declarative web crawling framework, and others (Xidel, https://github.com/benibela/xidel).

The shared, common abstraction of most web-scraping frameworks and tools is "node selectors" - the syntax and engine for extracting nodes and their data.

XPath has proven resilient and continues to be a popular node-selection and processing language. However, what it lacks, which other frameworks provide, is crawling.

wxpath is an attempt to fill that gap.

Hope people find it useful!

https://github.com/rodricios/eatiht https://github.com/datalib/libextract

neilv•2w ago
It's impressive that wxpath does the DSL as an extension of XPath syntax. I hadn't quite thought of it that way.

I routinely used a mix of XPath and arbitrary code heavily for Web scraping (as implied in the intro for "https://docs.racket-lang.org/html-parsing/").

Then I made some DSLs for doing some of the common scraping coding patterns more concisely and declaratively, but the DSLs ended up in a Lisp-y syntax, not looking like XPath.

rodricios•2w ago
Making wxpath as an extension to the XPath DSL was a key goal of mine.

The hard part was ensuring the syntax looked and felt as XPath-y as possible.

Open to any feedback wrt the syntax and semantics!