When I use regex, I expect to be able to use lookbehind, so I am routinely hit by RE2's limitations in places where it's used. Sometimes the software uses the entire matched string, so you can't use groups to work around it.
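A minimal sketch of the usual workaround and why it breaks when only the entire match is used. Python's `re` appears here only because it supports lookbehind (Go's `regexp` rejects `(?<=` syntax). The sample text and pattern are made up for illustration.

```python
import re

text = "Total: $42, tax: $7, items: 3"

# With lookbehind, the "$" is only asserted, so the whole match (group 0)
# is just the digits.
with_lookbehind = re.findall(r"(?<=\$)\d+", text)

# Without lookbehind (as in RE2-style engines), consume the "$" and pull
# the digits out of a capture group instead.
with_capture = [m.group(1) for m in re.finditer(r"\$(\d+)", text)]

# The catch: group 0 now includes the "$", so a tool that only uses the
# entire matched string sees "$42" rather than "42".
whole_matches = [m.group(0) for m in re.finditer(r"\$(\d+)", text)]

print(with_lookbehind)  # ['42', '7']
print(with_capture)     # ['42', '7']
print(whole_matches)    # ['$42', '$7']
```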
I understand Go's reasons (ReDoS, etc.), but the "purism" of RE2 does fly in the face of practicality to an irksome degree. This is not uncommon for Go.
For instances where you need something more sophisticated than what’s in the standard library, you reach for third-party modules. And there are regex libraries for Go that support backtracking and the like.
There are definitely some irksome defaults in Go, but the choice of regex engine in the regexp library isn’t one of them.
> the "purism" of RE2 does fly in the face of practicality to an irksome degree
It’s not purism, though. There are very practical reasons to want an FA-based engine, and if you compromise that to get additional features, then the engine is pointless: you could have just used a backtracking engine in the first place.
If you need that from Go, you can probably use that to create a fork of this: https://github.com/wasilibs/go-re2
I wonder in what situation someone would even be tempted to put a capture group into a lookbehind expression, except unintentionally by using () instead of (?:) for grouping. Maybe in an attempt to obtain capture groups from overlapping matches? But even in that case, lookaheads would be clearer, when available.
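To illustrate the overlapping-matches case, a small sketch in Python (chosen only because its `re` supports both lookaround directions; the input string is arbitrary):

```python
import re

s = "abcd"

# An ordinary match consumes its characters, so pairs cannot overlap.
plain = re.findall(r"\w\w", s)                  # ['ab', 'cd']

# A zero-width lookahead consumes nothing, so a capture group inside it
# yields every overlapping pair.
via_lookahead = re.findall(r"(?=(\w\w))", s)    # ['ab', 'bc', 'cd']

# The same trick with a capture group inside a lookbehind also works in
# Python's re, but it reads less naturally.
via_lookbehind = re.findall(r"(?<=(\w\w))", s)  # ['ab', 'bc', 'cd']
```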
Searching for a simple explanation of how it works, I found this, which also explains negative lookbehind and lookahead. TIL:
say "Cool" ~~ /<:Letter>* <:Block("Emoticons")>/; # 「Cool」
say "Cześć" ~~ m:ignoremark/ Czesc /; # 「Cześć」
say "WEIẞE" ~~ m:ignorecase/ weisse /; # 「WEIẞE」
say "หนูแฮมสเตอร์" ~~ /<:Letter>+/; # 「หนูแฮมสเตอร์」
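For comparison, the four lookaround forms in PCRE-style syntax, sketched with Python's `re` (the sample text is invented for illustration):

```python
import re

text = "5 apples, $5, 50%, 7"

# Positive lookbehind: digits preceded by "$" (the "$" is not part of the match).
after_dollar = re.findall(r"(?<=\$)\d+", text)      # ['5']

# Positive lookahead: digits followed by "%".
before_percent = re.findall(r"\d+(?=%)", text)      # ['50']

# Negative lookbehind + negative lookahead: "bare" numbers that are neither
# prices nor percentages. The \d in each assertion stops a match from
# starting or ending inside a longer number such as "50".
bare = re.findall(r"(?<![$\d])\d+(?![\d%])", text)  # ['5', '7']
```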
My main focus for the `regex` crate has been on performance: https://github.com/BurntSushi/rebar
How does Raku's regex performance compare to Perl?
Making sure this line isn't glossed over: the point of the regex crate is that it provides linear-time guarantees for arbitrary regexes, making it safe (within reason) to expose the regex engine to untrusted input without running the risk of trivial DoS. From what I can tell, supporting lookbehinds in such a context is something that researchers have only recently described.
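To make the linear-time point concrete, a toy comparison for a tiny hypothetical pattern language (literal characters, `.`, and a postfix `*`, matched against the whole string). This is a sketch of the idea behind FA-based engines (a Thompson-style state-set simulation), not how RE2 or the `regex` crate are actually implemented. Both matchers count their steps: the backtracking one explores every way of dividing the input among the `a*` tokens, while the state-set simulation examines at most (number of tokens + 1) states per input character.

```python
def parse(pattern):
    """Turn e.g. "a*b." into [("a", True), ("b", False), (".", False)]."""
    tokens = []
    for ch in pattern:
        if ch == "*":
            c, _ = tokens.pop()  # '*' applies to the previous character
            tokens.append((c, True))
        else:
            tokens.append((ch, False))
    return tokens


def bt_match(tokens, text, steps):
    """Backtracking full match; steps[0] counts recursive calls."""
    def go(ti, si):
        steps[0] += 1
        if ti == len(tokens):
            return si == len(text)
        c, star = tokens[ti]
        if star:
            # Greedy: take as many characters as possible, then give them back one by one.
            j = si
            while j < len(text) and c in (".", text[j]):
                j += 1
            while j >= si:
                if go(ti + 1, j):
                    return True
                j -= 1
            return False
        return si < len(text) and c in (".", text[si]) and go(ti + 1, si + 1)
    return go(0, 0)


def closure(tokens, states):
    """Add states reachable by skipping starred tokens (zero repetitions)."""
    result, stack = set(states), list(states)
    while stack:
        i = stack.pop()
        if i < len(tokens) and tokens[i][1] and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return result


def nfa_match(tokens, text, steps):
    """State-set simulation; steps[0] counts (state, character) pairs examined."""
    states = closure(tokens, {0})
    for ch in text:
        nxt = set()
        for i in states:
            steps[0] += 1
            if i < len(tokens) and tokens[i][0] in (".", ch):
                nxt.add(i if tokens[i][1] else i + 1)
        states = closure(tokens, nxt)
    return len(tokens) in states  # state len(tokens) means "whole pattern matched"


# A failing match that makes the backtracker try every split of the a's.
toks = parse("a*a*a*a*b")
for n in (10, 20, 40):
    bt, fa = [0], [0]
    assert not bt_match(toks, "a" * n, bt) and not nfa_match(toks, "a" * n, fa)
    print(n, bt[0], fa[0])
# 10 1365 50
# 20 12650 100
# 40 148995 200
```

Doubling the input multiplies the backtracking work by roughly 10x here (it grows polynomially with the number of stars, and nested quantifiers can make it exponential), while the state-set simulation's work only doubles.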
Or even trusted input! https://blog.cloudflare.com/details-of-the-cloudflare-outage...
It's good to have a focus, and I agree that Rust is all about performance and stability as a systems language.
I haven't seen Raku regex performance benchmarked, but I would be surprised if it beats Perl or Rust.
I wouldn't say that Raku is a good choice where speed is the most important consideration, since it is a scripting language that runs on a VM with GC. Nevertheless, the language syntax includes many features (hyper operators and lazy evaluation, to name two) that make it amenable to performance optimisation.
What 1: both regex and fancy-regex are crates. Regex is under the rust-lang umbrella but it’s not part of the stdlib.
What 2: having different options is the point of third-party libraries. Why would you have a third-party library that is exactly the same as the standard library?
Not having different options is the point of (batteries-included) standard libraries ;-)
Over the last two weeks I wrote a dialog-aware English sentence splitter in Rust, using Claude Code to write the code. The compile error when it stuck lookarounds into one of the regexes was super useful to me.
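For anyone curious what such a rewrite can look like, a toy Python sketch (sample text assumed; nothing dialog-aware) of splitting sentences with and without a lookbehind. Without lookbehind, the terminator is captured so that it survives the split and can be glued back on.

```python
import re

text = "Hi there. How are you? Fine!"

# With lookbehind: split on whitespace that follows a terminator.
with_lookbehind = re.split(r"(?<=[.!?])\s+", text)

# Without lookbehind: capture the terminator so re.split keeps it as a
# separate item, then glue each terminator back onto its sentence.
parts = re.split(r"([.!?])\s+", text)
without_lookbehind = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
without_lookbehind.append(parts[-1])

print(with_lookbehind)     # ['Hi there.', 'How are you?', 'Fine!']
print(without_lookbehind)  # ['Hi there.', 'How are you?', 'Fine!']
```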
My main point is that PCRE was based on Perl regexes, which were designed by Larry Wall, so he had some experience of the strengths and weaknesses of Perl REs when it came to designing the Raku RE syntax (i.e. the language formerly known as Perl 6).
Splitting the regex features between some core ones that meet a DoS standard and some non-core modules that do other "convenience" features makes sense as a trade-off for Rust. It would not make sense in a scripting language like Raku, where the weight is on coder expressiveness and making it easier / faster to write working code.
I seem to have hit a seam of intense implementation guys, and they are holding their own since they know their stuff.
I think there is room for improvement BOTH with new system language / core performance innovation AND with advancing the PCRE regex syntax (largely unchanged since the 1990s) and merging it seamlessly with standard language support for Grammars.
CJefferson•6h ago
If anyone knows (to let me be lazy), is this the same regex engine used by ripgrep? Or is that an independent implementation?
cbarrick•6h ago
flaghacker•6h ago
shilangyu•2h ago
burntsushi•2h ago