Fast(er) regular expression engines in Ruby

https://serpapi.com/blog/faster-regular-expression-engines-in-ruby/
60•davidsojevic•1y ago

Comments

yxhuvud•1y ago
Eww, pretending to support utf8 matchers while not supporting them at all was not pretty to see.
gitroom•1y ago
Honestly, that part bugs me; fake support is worse than no support imo.
kayodelycaon•1y ago
> Another nuance was found in ruby, which cannot scan the haystack with invalid UTF-8 byte sequences.

This is extremely basic Ruby: UTF-8 encoded strings must be valid UTF-8. This is not unique to Ruby. If I recall correctly, Python 3 does the same thing.

    2.7.1 :001 > haystack = "\xfc\xa1\xa1\xa1\xa1\xa1abc"
    2.7.1 :003 > haystack.force_encoding "ASCII-8BIT"
    => "\xFC\xA1\xA1\xA1\xA1\xA1abc" 
    2.7.1 :004 > haystack.scan(/.+/)
    => ["\xFC\xA1\xA1\xA1\xA1\xA1abc"]
This person is a senior engineer on their Team page. All they had to do was google "ArgumentError: invalid byte sequence in UTF-8". Or ask a coworker... the company has Ruby on Rails applications. headdesk
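For readers who have not hit this error, here is a minimal sketch of the behaviour being discussed, assuming Ruby 2.1 or later (for `String#scrub`). It shows the `ArgumentError` raised when scanning a UTF-8 string that holds invalid bytes, plus two ways around it: reinterpreting the bytes as binary (what `force_encoding "ASCII-8BIT"` does in the irb session above; `String#b` returns such a copy) or scrubbing the invalid sequences first. The variable names are illustrative only.

```ruby
# Sketch: a UTF-8-tagged string holding invalid bytes cannot be regex-scanned,
# but a binary copy or a scrubbed copy can.
haystack = "\xFC\xA1\xA1\xA1\xA1\xA1abc".dup.force_encoding(Encoding::UTF_8)

is_valid = haystack.valid_encoding? # => false

error_message = nil
begin
  haystack.scan(/.+/)
rescue ArgumentError => e
  error_message = e.message # "invalid byte sequence in UTF-8"
end

# Option 1: reinterpret the same bytes as binary (ASCII-8BIT).
# Every byte is then a valid "character", so nothing needs validating.
binary_matches = haystack.b.scan(/.+/)

# Option 2: replace invalid byte sequences with U+FFFD, then scan as UTF-8.
scrubbed_matches = haystack.scrub.scan(/.+/)
```

The binary route keeps the original bytes but gives up character semantics (`.` matches a single byte); scrubbing keeps UTF-8 semantics but changes the data.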
burntsushi•1y ago
The nuance is specifically relevant here because neither of the other two regex engines benchmarked has this requirement. It's doubly relevant because it means running a regex search doesn't require a UTF-8 validation step, which is likely beneficial from a perf perspective, depending on the workload.
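Within Ruby itself, one way to search arbitrary bytes without a UTF-8 validity requirement is to run the regex over a binary (ASCII-8BIT) view of the data. The sketch below is only an illustration of that idea, not how any of the benchmarked engines work internally; `byte_offsets` is a hypothetical helper name, and it assumes an ASCII-only pattern.

```ruby
# Hypothetical helper: return the byte offsets of every match of an
# ASCII-only pattern in arbitrary bytes, with no UTF-8 validity requirement.
def byte_offsets(data, pattern)
  bytes = data.b # binary copy: character offsets are byte offsets
  offsets = []
  # scan sets $~ (Regexp.last_match) before each yield.
  bytes.scan(pattern) { offsets << Regexp.last_match.begin(0) }
  offsets
end

# "\xFF\xFE" and "\xC3(" are not valid UTF-8, but the search still works.
offsets = byte_offsets("\xFF\xFEabc\xC3(abc", /abc/)
```

Patterns containing non-ASCII literals carry a fixed encoding and are not guaranteed to be compatible with binary strings, hence the ASCII-only assumption.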
kayodelycaon•1y ago
That’s a good point. I hadn’t considered it because I’ve hit the validation error long before getting to search. It is possible to avoid string operations with careful coding prior to the search.

Edit: After a little testing, the strings can be read from and written to files without triggering validation. Presumably this applies to sockets as well.
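The file observation can be reproduced with a short sketch like the one below. It writes invalid UTF-8 bytes to a temporary file and reads them back both as UTF-8 and as binary: neither the write nor the reads raise, and the error only appears once a regex scan runs on the UTF-8 copy. The file name is arbitrary, and the sockets case is not covered here.

```ruby
require "tmpdir"

payload = "\xFC\xA1\xA1abc".dup.force_encoding(Encoding::UTF_8) # invalid UTF-8

as_utf8, as_binary = Dir.mktmpdir do |dir|
  path = File.join(dir, "payload.txt")
  File.binwrite(path, payload)                     # raw bytes, no validation
  [File.read(path, encoding: "UTF-8"),             # tagged UTF-8, still no error
   File.binread(path)]                             # same bytes, tagged ASCII-8BIT
end

# Validation only bites once a character-aware operation runs.
scan_fails =
  begin
    as_utf8.scan(/abc/)
    false
  rescue ArgumentError
    true
  end
```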

DmitryOlshansky•1y ago
I wonder how D's std.regex would fare in such a test. Sadly, because it relies on a tiny bit of D’s GC, it's hard to provide as a library for other languages. If there is interest, I might run it through the tests.
