Show HN: An open source access logs analytics script to block bot attacks

https://github.com/tempesta-tech/webshield
22•krizhanovsky•6h ago•2 comments

Show HN: Metorial (YC F25) – Vercel for MCP

https://github.com/metorial/metorial
43•tobihrbr•11h ago•15 comments

Show HN: Wispbit - Linter for AI coding agents

https://wispbit.com
23•dearilos•6h ago•11 comments

Show HN: CSS Extras

https://github.com/sindresorhus/css-extras
97•mofle•6d ago•60 comments

Show HN: PlayMyMood – Generate YouTube Music playlists based on your mood

https://playmymood.com/
2•speeq•4h ago•0 comments

Show HN: Relaya – Agent calls businesses for you

https://relaya.ai/
5•rishavmukherji•5h ago•0 comments

Show HN: Free API to extract PDF data

6•leftnode•9h ago•0 comments

Show HN: SQLite Online – 11 years of solo development, 11K daily users

https://sqliteonline.com/
448•sqliteonline•1d ago•138 comments

Show HN: Pathwave.io – MCP and mobile app to manually approve AI actions

https://web.pathwave.io/docs
2•felipe-pathwave•5h ago•0 comments

Show HN: Nofan Framework 16 Fan Controller

https://github.com/laktak/nofan
2•laktak•5h ago•0 comments

Show HN: AI toy I worked on is in stores

https://www.walmart.com/ip/SANTA-SMAGICAL-PHONE/16364964771
146•Sean-Der•2d ago•164 comments

Show HN: I built a simple ambient sound app with no ads or subscriptions

https://ambisounds.app/
295•alpaca121•2d ago•117 comments

Show HN: I built a free AI tool that scans and sorts financial news for traders

https://www.fxradar.live/
4•LuckyAleh•8h ago•1 comments

Show HN: Get a PMF score for your website, based on simulated user data

https://semilattice.ai/demos/pmf-report
2•jtewright•9h ago•0 comments

Show HN: I made an esoteric programming language that's read like a spellbook

https://github.com/sirbread/spellscript
171•sirbread•2d ago•55 comments

Show HN: GoHPTS-TCP/UDP Transparent Proxy with ARP Spoofing and Traffic Sniffing

https://github.com/shadowy-pycoder/go-http-proxy-to-socks
2•shadowy-pycoder•11h ago•0 comments

Show HN: Aidlab – Health Data for Devs

55•guzik•3d ago•17 comments

Show HN: Daily install trends of AI coding extensions in VS Code

https://bloomberry.com/coding-tools.html
23•AznHisoka•12h ago•9 comments

Show HN: Baby's first international landline

https://wip.tf/posts/telefonefix-building-babys-first-international-landline/
221•nbr23•6d ago•54 comments

Show HN: A Digital Twin of my coffee roaster that runs in the browser

https://autoroaster.com/
155•jvkoch•1w ago•37 comments

Show HN: docker/model-runner – an open-source tool for local LLMs

https://github.com/docker/model-runner
17•ericcurtin•13h ago•9 comments

Show HN: Wordle-Style Daily Wikipedia Game

https://hyperlinked.wiki
4•Mistri•13h ago•1 comments

Show HN: A Lisp Interpreter for Shell Scripting

https://github.com/gue-ni/redstart
113•quintussss•6d ago•25 comments

Show HN: I extracted BASIC listings for Tim Hartnell's 1986 book

https://github.com/nzduck/hartnell-exploring-ai-book
60•nzduck•4d ago•6 comments

Show HN: I invented a new generative model and got accepted to ICLR

https://discrete-distribution-networks.github.io/
649•diyer22•4d ago•90 comments

Show HN: Lights Out: my 2D Rubik's Cube-like Game

https://raymondtana.github.io/projects/pages/Lights_Out.html
80•raymondtana•4d ago•25 comments

Show HN: AI visuals that feel the music

https://www.trackart.io/
2•feskk•18h ago•0 comments

Show HN: Rift – A tiling window manager for macOS

https://github.com/acsandmann/rift
212•atticus_•3d ago•120 comments

Show HN: Open source, logical multi-master PostgreSQL replication

https://github.com/pgEdge/spock
150•pgedge_postgres•5d ago•60 comments

Show HN: FFTN, faster than FFTW in 700 lines of C

https://gitlab.sac-home.org/sac-group/fftn
7•thomaskoopman•1d ago•0 comments

Show HN: An open source access logs analytics script to block bot attacks

https://github.com/tempesta-tech/webshield
22•krizhanovsky•6h ago
This is a small PoC Python project that analyzes web server access logs to classify and dynamically block bad bots, such as L7 (application-level) DDoS bots, web scrapers and so on.

We'll be happy to gather initial feedback on usability and features, especially from people who have had good or bad experiences with bots.

*Requirements*

The analyzer relies on three Tempesta FW-specific features, which you can still get with other HTTP servers or accelerators:

1. JA5 client fingerprinting (https://tempesta-tech.com/knowledge-base/Traffic-Filtering-b...). This is fingerprinting at the HTTP and TLS layers, similar to JA4 (https://blog.foxio.io/ja4%2B-network-fingerprinting) and JA3 fingerprints. The latter is also available in Envoy (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extension...) or as an Nginx module (https://github.com/fooinha/nginx-ssl-ja3), so check the documentation for your web server.

2. Access logs are written directly to the ClickHouse analytics database, which can consume large data batches and quickly run analytic queries. For web proxies other than Tempesta FW, you typically need to build a custom pipeline to load access logs into ClickHouse. Such pipelines aren't that rare, though.

3. Ability to block web clients by IP or JA5 hashes. IP blocking is probably available in any HTTP proxy.

*How does it work*

This is a daemon, which

1. Learns normal traffic profiles: means and standard deviations for client requests per second, error responses, bytes per second and so on. It also remembers client IPs and fingerprints.

2. Watches for a spike in the z-score (https://en.wikipedia.org/wiki/Standard_score) of the traffic characteristics; it can also be triggered manually. It then enters data model search mode.

3. For example, the first model could be the top 100 JA5 HTTP hashes that produce the most error responses per second (typical for password crackers). Or it could be the top 1000 IP addresses generating the most requests per second (L7 DDoS). Next, this model is verified.

4. The daemon repeats the query over a sufficiently long period of past history to see whether a huge fraction of the clients appears in both query results. If so, the model is bad and we go back to the previous step to try another one. If not, then we have (likely) found the representative query.

5. Transfers the IP addresses or JA5 hashes from the query results into the web proxy blocking configuration and reloads the proxy configuration (on the fly).
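
To make the steps above more concrete, here is a minimal Python sketch of the two core checks: the z-score spike test from step 2 and the history overlap test from step 4. It is not the project's actual code. The function names, the threshold of 3 standard deviations and the 50% overlap limit are all assumptions chosen for illustration, and the ClickHouse queries are replaced by plain Python lists.

```python
from statistics import mean, stdev


def z_score(value, history):
    """Standard score of `value` relative to the samples in `history`.

    `history` needs at least two samples, because `stdev` is the
    sample standard deviation.
    """
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline gives no scale to compare against.
        return 0.0
    return (value - mean(history)) / sigma


def is_spike(value, history, threshold=3.0):
    """True if `value` is more than `threshold` deviations above the mean.

    The default threshold is an assumed value, not one from the project.
    """
    return z_score(value, history) > threshold


def model_is_representative(current_top, historical_top, max_overlap=0.5):
    """Step 4: compare the same top-N query now and over past history.

    If a large fraction of the current clients also shows up in the
    historical result, they look like regular traffic, so the model
    is rejected. `max_overlap` is an assumed cut-off.
    """
    current, historical = set(current_top), set(historical_top)
    if not current:
        return False
    overlap = len(current & historical) / len(current)
    return overlap < max_overlap


def pick_blocklist(candidates, max_overlap=0.5):
    """Try candidate models in order and return the first that passes.

    `candidates` is an iterable of (name, current_top, historical_top)
    tuples, where the two lists stand in for query results.
    """
    for name, current_top, historical_top in candidates:
        if model_is_representative(current_top, historical_top, max_overlap):
            return name, sorted(set(current_top))
    return None, []
```

In this sketch, a call such as `is_spike(450, [100, 104, 98, 101, 97, 103, 99, 102])` would switch the daemon into model search mode, and the identifiers returned by `pick_blocklist` would then be written into the proxy's blocking configuration as in step 5.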

Comments

imiric•3h ago
Thanks for sharing!

The heuristics you use are interesting, but this will likely only be a hindrance to lazy bot creators. TLS fingerprints can be spoofed relatively easily, and most bots rotate their IPs and signals to avoid detection. With ML tools becoming more accessible, it's only a matter of time until bots are able to mimic human traffic well enough, both on the protocol and application level. They probably exist already, even if the cost is prohibitively high for most attackers, but that will go down.

Theoretically, deploying ML-based defenses is the only viable path forward, but even that will become infeasible. As the amount of internet traffic generated by bots surpasses the current ~50%, you can't realistically block half the internet.

So, ultimately, I think allow lists are the only option if we want to have a usable internet for humans. We need a secure and user-friendly way to identify trusted clients, which, unfortunately, is ripe to be exploited by companies and governments. All proposed device attestation and identity services I've seen make me uneasy. This needs to be a standard built into the internet, based on modern open cryptography, and not controlled by a single company or government.

I suppose it already exists with TLS client authentication, but that is highly impractical to deploy. Is there an ACME protocol for clients? ... Huh, Let's Encrypt did support issuing client certs, but they dropped it[1].

[1]: https://news.ycombinator.com/item?id=44018400

monster_truck•47m ago
This is not realistically useful