Fast(er) regular expression engines in Ruby

https://serpapi.com/blog/faster-regular-expression-engines-in-ruby/
60•davidsojevic•1y ago

Comments

yxhuvud•1y ago
Eww, pretending to support utf8 matchers while not supporting them at all was not pretty to see.
gitroom•1y ago
Honestly that part bugs me, fake support is worse than no support imo
kayodelycaon•1y ago
> Another nuance was found in ruby, which cannot scan the haystack with invalid UTF-8 byte sequences.

This is extremely basic Ruby: UTF-8 encoded strings must be valid UTF-8. This is not unique to Ruby. If I recall correctly, Python 3 does the same thing.

    2.7.1 :001 > haystack = "\xfc\xa1\xa1\xa1\xa1\xa1abc"
    2.7.1 :003 > haystack.force_encoding "ASCII-8BIT"
    => "\xFC\xA1\xA1\xA1\xA1\xA1abc" 
    2.7.1 :004 > haystack.scan(/.+/)
    => ["\xFC\xA1\xA1\xA1\xA1\xA1abc"]

This person is a senior engineer on their Team page. All they had to do was google "ArgumentError: invalid byte sequence in UTF-8". Or ask a coworker... the company has Ruby on Rails applications. headdesk
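
For anyone following along, here is a minimal sketch of the failure and two common ways around it, using only core `String` methods. The variable names are illustrative, and the choice of `"?"` as the replacement character for `scrub` is an assumption, not something taken from the article.

```ruby
# The literal is tagged UTF-8 by default, but these bytes are not valid UTF-8.
haystack = "\xfc\xa1\xa1\xa1\xa1\xa1abc"

# Matching a regex against a broken UTF-8 string raises ArgumentError.
error_message =
  begin
    haystack.scan(/.+/)
    nil
  rescue ArgumentError => e
    e.message
  end

# Option 1: search the raw bytes. String#b returns an ASCII-8BIT copy,
# so the original string is left untouched.
binary_matches = haystack.b.scan(/.+/)

# Option 2: replace the invalid bytes first, then search valid UTF-8.
scrubbed = haystack.scrub("?")
scrubbed_matches = scrubbed.scan(/abc/)
```

Option 1 keeps every byte but gives up character-aware matching; option 2 keeps UTF-8 semantics but alters the data, so which one fits depends on what the search is for.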
burntsushi•1y ago
The nuance is specifically relevant here because neither of the other two regex engines benchmarked has this requirement. It's doubly relevant because that means running a regex search doesn't require a UTF-8 validation step, and is therefore likely beneficial from a perf perspective, depending on the workload.
kayodelycaon•1y ago
That’s a good point. I hadn’t considered it because I’ve hit the validation error long before getting to search. It is possible to avoid string operations with careful coding prior to the search.

Edit: After a little testing, the strings can be read from and written to files without triggering validation. Presumably this applies to sockets as well.
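
A rough sketch of that file round trip, assuming a temporary file and an explicit `encoding: "UTF-8"` on read so the result does not depend on the locale. The byte values and variable names are made up for illustration.

```ruby
require "tempfile"

raw = "\xFC\xA1\xA1abc".b # arbitrary bytes that are not valid UTF-8

round_trip = Tempfile.create("haystack") do |file|
  file.binmode
  file.write(raw)
  file.flush

  # Reading tags the data as UTF-8 but does not validate it.
  text = File.read(file.path, encoding: "UTF-8")

  # Writing the still-invalid string back out also succeeds.
  File.write(file.path, text)
  File.read(file.path, encoding: "UTF-8")
end

still_invalid = !round_trip.valid_encoding?
same_bytes = round_trip.bytes == raw.bytes
```

The string only gets checked when an operation needs to interpret it as characters, such as a regex match.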

DmitryOlshansky•1y ago
I wonder how std.regex of dlang would fare in such a test. Sadly, due to a tiny bit of D’s GC use, it’s hard to provide as a library for other languages. If there is interest I might take it through the tests.
