Fast(er) regular expression engines in Ruby

https://serpapi.com/blog/faster-regular-expression-engines-in-ruby/
60•davidsojevic•1y ago

Comments

yxhuvud•1y ago
Eww, pretending to support utf8 matchers while not supporting them at all was not pretty to see.
gitroom•1y ago
Honestly that part bugs me, fake support is worse than no support imo
kayodelycaon•1y ago
> Another nuance was found in ruby, which cannot scan the haystack with invalid UTF-8 byte sequences.

This is extremely basic Ruby: strings tagged as UTF-8 must contain valid UTF-8 before you can run string operations such as regex matching on them. This is not unique to Ruby. If I recall correctly, Python 3 does the same thing.

    2.7.1 :001 > haystack = "\xfc\xa1\xa1\xa1\xa1\xa1abc"
    2.7.1 :003 > haystack.force_encoding "ASCII-8BIT"
    => "\xFC\xA1\xA1\xA1\xA1\xA1abc" 
    2.7.1 :004 > haystack.scan(/.+/)
    => ["\xFC\xA1\xA1\xA1\xA1\xA1abc"]

This person is listed as a senior engineer on their company's Team page. All they had to do was google "ArgumentError: invalid byte sequence in UTF-8". Or ask a coworker... the company has Ruby on Rails applications. headdesk
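
For anyone who wants to reproduce the behavior described above, here is a minimal sketch. It assumes a UTF-8 source file and reuses the haystack bytes from the transcript; the variable names are only illustrative. It shows the failing scan, then two possible workarounds: retagging a copy as binary with `String#b`, or replacing the invalid bytes with `String#scrub`.

```ruby
# Invalid UTF-8 bytes followed by plain ASCII, as in the transcript above.
haystack = "\xfc\xa1\xa1\xa1\xa1\xa1abc"

# The literal is tagged UTF-8 but its bytes are not valid UTF-8.
valid = haystack.valid_encoding?

# Regex matching on a broken UTF-8 string raises ArgumentError.
error = nil
begin
  haystack.scan(/.+/)
rescue ArgumentError => e
  error = e
end

# Workaround 1: scan a binary (ASCII-8BIT) copy; the original is not mutated.
binary_matches = haystack.b.scan(/.+/)

# Workaround 2: replace invalid bytes, then scan the now-valid UTF-8 string.
scrubbed = haystack.scrub("?")
scrubbed_matches = scrubbed.scan(/abc/)
```

Which workaround fits depends on whether the bytes or the characters matter to the search: the binary copy keeps every byte but matches bytewise, while scrubbing keeps UTF-8 semantics but loses the original bytes.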
burntsushi•1y ago
The nuance is specifically relevant here because neither of the other two regex engines benchmarked has this requirement. It's doubly relevant because that means running a regex search doesn't require a UTF-8 validation step, which is likely beneficial from a perf perspective, depending on the workload.
kayodelycaon•1y ago
That’s a good point. I hadn’t considered it because I’ve hit the validation error long before getting to search. It is possible to avoid string operations with careful coding prior to the search.

Edit: After a little testing, the strings can be read from and written to files without triggering validation. Presumably this applies to sockets as well.
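
A rough sketch of that file round trip, assuming a Unix-like system and a temp file created with `Tempfile.create` (the file name prefix and variable names are made up for illustration). The invalid bytes are written and read back unchanged, and nothing raises until a string operation such as a regex match touches them.

```ruby
require "tempfile"

# Same invalid UTF-8 haystack as in the comment above.
invalid = "\xfc\xa1\xa1\xa1\xa1\xa1abc"

read_back = nil
Tempfile.create("haystack") do |f|
  # Writing does not validate the string's encoding.
  f.write(invalid)
  f.flush

  # Reading tags the bytes as UTF-8 without checking them.
  read_back = File.read(f.path, encoding: "UTF-8")
end

# The bytes survive the round trip; the string is still invalid UTF-8.
same_bytes = read_back.bytes == invalid.bytes
still_invalid = !read_back.valid_encoding?
```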

DmitryOlshansky•1y ago
I wonder how D's std.regex would fare in such a test. Sadly, because it uses a tiny bit of D's GC, it's hard to provide as a library for other languages. If there is interest I might take it through the tests.