As for ugrep, flipping the question around would be more appropriate: ugrep has caught up with ripgrep in some common cases, but not all.
First of all, the ugrep performance comparisons are online (and haven't been updated to compare against this version that was only released 3 days ago). So your question is answerable:
https://github.com/Genivia/ugrep-benchmarks
The two are very close and both are head and shoulders faster than most other options.
And backwards compatibility is a mixed bag, not a mandatory goal. It's admirable that ugrep is trying to be a better drop-in replacement. It's also cool that ripgrep is trying to rethink the interface to improve usability.
(I like ripgrep in part because it has different defaults than grep that work very well for my use cases, which is primarily searching through codebases. The lack of backwards compatibility goes both ways. Will we see a posix ripgrep? Probably not. Is ripgrep a super useful and user-friendly tool? Definitely.)
Does rg work in the places grep does, or is it about the type of task being done? In my examples I expect more default recursion from rg than from regular grep, and I'm searching an unknown codebase with it, whereas I often know my way around more or less when using regular grep.
What the GP is suggesting is that their most common use case for grep is recursive search. That's what ripgrep does by default. With `grep`, you need the non-POSIX `-r` flag.
The other bit that the GP didn't mention but is critical to ripgrep's default behavior is that ripgrep will ignore files by default. Specifically, it respects gitignore files, ignores hidden files and ignores binary files. IMO, this is what most people mean by "ripgrep does the right thing by default." Because ripgrep will ignore most of the stuff you probably don't care about by default. Of course, you can disable this filtering easily: `rg -uuu`. This is also why ripgrep has never been intended to be POSIX compatible, despite people whinging about "backwards compatibility." That's a goal they are ascribing to the project that I have never professed. Indeed, I've been clear since the beginning that if you want a POSIX compatible grep, then you should just use a POSIX compatible grep. The existence of ripgrep does not prevent that.
Indeed, before I wrote ripgrep, I had a bunch of shell scripts in my ~/bin that wrapped grep for various use cases. I had one shell script for Python projects. Another for Go projects. And so on. These wrappers specifically excluded certain directories, because otherwise `grep -r` would search them. For big git repositories, this would in particular cause it to waste not only a bunch of time searching `.git`, but it would also often return irrelevant results from inside that directory.
Once I wrote ripgrep (I had never been turned on to `ack` or `ag`), all of those shell scripts disappeared. I didn't need them any more.
My understanding is that many other users have this same experience. I personally found it very freeing to get rid of all my little shell wrappers and just use the same tool everywhere. (`git grep` doesn't work nearly as well outside of git repositories for example. And it has, last I checked, some very steep performance cliffs.)
Some users don't like the default filtering. Or it surprises them so much that they are horrified by it. They can use `rg -uuu` or use one of the many other POSIX greps out there.
For example, in my checkout of the Chromium repository, notice how much faster ripgrep is at this specific use case (with the right flags given to `ugrep` to make it ignore the same files):
$ hyperfine --output pipe 'rg Openbox' 'ugrep-7.5.0 -rI --ignore-files Openbox ./'
Benchmark 1: rg Openbox
Time (mean ± σ): 281.0 ms ± 3.6 ms [User: 1294.8 ms, System: 1977.6 ms]
Range (min … max): 275.9 ms … 286.8 ms 10 runs
Benchmark 2: ugrep-7.5.0 -rI --ignore-files Openbox ./
Time (mean ± σ): 4.250 s ± 0.008 s [User: 4.683 s, System: 2.154 s]
Range (min … max): 4.242 s … 4.267 s 10 runs
Summary
rg Openbox ran
15.12 ± 0.19 times faster than ugrep-7.5.0 -rI --ignore-files Openbox ./
`ugrep` actually does a lot better if you don't ask it to respect gitignore files:

$ hyperfine --output pipe 'rg -u Openbox' 'ugrep-7.5.0 -rI Openbox ./'
Benchmark 1: rg -u Openbox
Time (mean ± σ): 233.9 ms ± 3.3 ms [User: 650.4 ms, System: 2081.6 ms]
Range (min … max): 228.8 ms … 239.8 ms 12 runs
Benchmark 2: ugrep-7.5.0 -rI Openbox ./
Time (mean ± σ): 605.4 ms ± 6.4 ms [User: 1104.1 ms, System: 2710.8 ms]
Range (min … max): 596.1 ms … 613.9 ms 10 runs
Summary
rg -u Openbox ran
2.59 ± 0.05 times faster than ugrep-7.5.0 -rI Openbox ./
Even ripgrep runs a little faster here, because matching gitignores takes extra time; more so, it seems, in ugrep's case.

Now, ugrep is perhaps intended to be more like a POSIX grep than ripgrep is. So you could question whether this is a fair comparison. But if you're going to bring up "ripgrep catching up to ugrep," then it's fair game, IMO, to compare ripgrep's default mode of operation with ugrep using the necessary flags to match that mode.
Repository info:
$ git remote -v
origin git@github.com:nwjs/chromium.src (fetch)
origin git@github.com:nwjs/chromium.src (push)
$ git rev-parse HEAD
1e57811fe4583ac92d2f277837718486fbb98252
See this from ~3 months ago: https://news.ycombinator.com/item?id=44358216
rg <string>
fd <string>
Probably the singular reason why I finally use regex as the first search option, rather than turning to it after brute-forcing through a search with standard wildcards.
Anyway I'm trying to retrain the fingers these days, rg is super cool.
I didn't bother switching to `ag` when it came around because of having to retrain.
But eventually I did switch to `rg` because it just has so many conveniences.
I even switched to `fd` recently instead of `find` because it's easier and less typing for common use-cases.
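For the curious, a few find-to-fd translations (a sketch; assumes `fd` is installed, and note that on Debian-based distros the binary may be named `fdfind`):

```shell
# Classic find invocations vs. their shorter fd equivalents.
# fd also skips gitignore'd and hidden files by default, much like ripgrep.
find . -name '*.rs'          # all Rust files under the current directory
fd -e rs                     # same idea, with extension matching built in

find . -type d -name target  # directories named exactly "target"
fd -t d '^target$'           # fd patterns are regexes, so anchor for an exact name
```

The default filtering is the "less typing" part: no `-not -path './.git/*'` boilerplate needed.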
I've been using the terminal since 1997, so I'm happy I can still learn new things and use improved commands.
I had a similar thing with bash vs zsh before I learned about oh-my-zsh. Nushell also seems attractive these days... the good stuff from PowerShell in a POSIX-like shell.
As for perf, it's not hard to witness a 10x improvement that you'll actually feel. On my checkout of the Linux kernel:
$ (time rg -wi '\w+(PM_RESUME|LINK_REQ)') | wc -l
real 0.114
user 0.547
sys 0.543
maxmem 29 MB
faults 0
444
$ (time ag -wi '\w+(PM_RESUME|LINK_REQ)') | wc -l
real 0.949
user 6.618
sys 0.805
maxmem 65 MB
faults 0
444
Or even basic queries can have a pretty big difference. In my checkout of the Chromium repository:

$ (time rg Openbox) | wc -l
real 0.296
user 1.349
sys 1.950
maxmem 71 MB
faults 0
11
$ (time ag Openbox) | wc -l
real 1.528
user 1.849
sys 8.285
maxmem 29 MB
faults 0
11
Or even more basic. You might search a file that is "too big" for ag:

$ time ag '^\w{42}$' full.txt
ERR: Skipping full.txt: pcre_exec() can't handle files larger than 2147483647 bytes.
I also find that combining `-o/--only-matching` and `-r/--replace` has replaced many of my uses of `sed` and `awk`.
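For instance, extracting just a capture group (a small sketch; the input and pattern are made up):

```shell
# Print only the port number from each line: -o emits just the match,
# and -r rewrites it using the capture group.
printf 'host=alpha port=8080\nhost=beta port=9090\n' \
  | rg -o -r '$1' 'port=(\d+)'
# prints:
# 8080
# 9090
```

The comparable `sed -n 's/.*port=\([0-9]*\).*/\1/p'` works too, but the rg version reuses the same regex syntax you search with.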
rgg() {
    # Search only the files tracked by git, passing them explicitly to rg.
    readarray -d '' -t FILES < <(git ls-files -z)
    rg "${@}" "${FILES[@]}"
}
It speeds up a lot on directories with many binary files and committed dot files. To search the dot files, -uu is needed, but that also tells ripgrep to search the binary files.

On repositories with hundreds of files, the `git ls-files` overhead is a bit large.
Searching in hidden files tracked by git would be great but the overhead of querying git to list all tracked files is probably significant even in Rust.
$ printf '!.woodpecker\n!.forgejo\n!.gitlab-ci-yml\n' > .rgignore
Or whatever you need to whitelist specific hidden directories/files.

For example, ripgrep has `!/.github/` in its `.ignore` file at the root of the repository[1].
By adding the `!`, these files get whitelisted even though they are hidden. Then `rg` with no extra arguments will search them automatically while still ignoring other hidden files/directories.
[1]: https://github.com/BurntSushi/ripgrep/blob/38d630261aded3a8e...
Also, `-uu` tells ripgrep to not respect gitignore and to search hidden files. But ripgrep will still skip binary files. You need `-uuu` to also search binary files.
I tried playing with your `rgg` function. First problem occurred when I tried it on a checkout the Linux kernel:
$ rgg APM_RESUME
bash: /home/andrew/rust/ripgrep/target/release/rg: Argument list too long
OK, so let's just use `xargs`:

$ git ls-files -z | time xargs -0 rg APM_RESUME
arch/x86/kernel/apm_32.c
473: { APM_RESUME_DISABLED, "Resume timer disabled" },
include/uapi/linux/apm_bios.h
89:#define APM_RESUME_DISABLED 0x0d
real 0.638
user 0.741
sys 1.441
maxmem 29 MB
faults 0
And compared to just `rg APM_RESUME`:

$ time rg APM_RESUME
arch/x86/kernel/apm_32.c
473: { APM_RESUME_DISABLED, "Resume timer disabled" },
include/uapi/linux/apm_bios.h
89:#define APM_RESUME_DISABLED 0x0d
real 0.097
user 0.399
sys 0.588
maxmem 29 MB
faults 0
So do you have an example where `git ls-files -z | xargs -0 rg ...` is faster than just `rg ...`?

The repository contains CI files in .woodpecker. These are scripts that I'd normally expect to be searching in. Until a week ago I used -uu to do so, but that made rg take over 4 seconds for a search. Using -. brings the search time down to 24ms.
git ls-files -z | time xargs -0 rg -w e23
40ms
rg -w. e23
24ms
rgg -w e23
16ms
rg -wuu e23
2754ms
To reproduce this with the given repository, fill it with 20GB of binary files.

The -. flag makes this point moot though.
Yes, now it makes sense. And yes, `-./--hidden` makes it moot. Thanks for following up!
All are less than 100ms, so fast enough.
My only complaint is there are a couple of characters that the -F (treat as literal) option still seems to treat as special, needing some kind of escape; though I don't remember which ones now.
Always glad to see it keep updating!
If you have an example, I can try to explain for that specific case. But `-F/--fixed-strings` will 100% turn off any regex features in the pattern and instead will be treated as a simple literal. Where you might still need escaping is if your shell requires it.
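To illustrate the distinction (a sketch; the patterns are made up), `-F` turns off regex metacharacters entirely, but single quotes are still needed to keep the shell itself from expanding characters like `$`:

```shell
# With -F, regex metacharacters like $ . + ( ) are plain text.
printf 'cost is $5.00\n' | rg -F '$5.00'   # matches: the pattern is a literal
printf 'a+b\n' | rg -F 'a+b'               # matches the literal string "a+b"

# Without -F the same pattern is a regex: a+b means one or more a's then b.
printf 'a+b\n' | rg 'a+b' || echo 'no regex match'
```

So if a character "still needs escaping" under `-F`, the escape is almost certainly for the shell's benefit, not ripgrep's.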
Totally love your work! We've been sponsoring for a while, even though it isn't much. Thank you for all you do!
I'd be happy to answer more specific questions.
Short opinionated summary is: nicer API, fewer footguns, more features, better support for calendar durations, integrated tzdb support and lots more to be honest.
Note that `std::time` just gives you platform-independent but bare-bones access to monotonic and system clocks. Some kind of datetime library is needed if you want to do anything with Unix timestamps beyond treating them as an integer.
It's also something that's unleashed the ability of agents to explore and reason about code, faster than waiting for some sort of "LSP-like" standard we probably would've had to build instead over time.
I would prefer a solution that works from outside git repos, so no piping `git ls-files` into rg.
That is, you can whitelist specific hidden files/directories.
There is no way to tell ripgrep to "search precisely the set of tracked files in git." ripgrep doesn't read git repository state. It just looks at gitignore files and automatically ignores all hidden and binary files. So to make it work more like git, you might consider whitelisting the hidden files you want to search. To make it work exactly like git, you need to do the `git ls-files -z | xargs -0 rg ...` dance.
Maybe talk about your use case at a higher level.
And perf depends on your haystack size. If you have lots of data to search, it's not hard to witness a 10x difference: https://news.ycombinator.com/item?id=45629904
As for features that ripgrep has that ag doesn't:
* Much better Unicode support. (ag's is virtually non-existent.)
* Pluggable preprocessors with --pre.
* Jujutsu support.
* ripgrep can automatically search UTF-16 data.
* ripgrep has PCRE2 support. ag only has PCRE1 (which was EOL'd years ago).
* ripgrep has a `-r/--replace` flag that lets you manipulate the output. I use it a lot instead of `sed` or `awk` (for basic cases) these days.
* ripgrep is maintained.
* ripgrep has multiline search that seemingly works much better.
* ripgrep can search files bigger than 2GB. ag seemingly can't.
* ag has lots of whacky bugs.
e.g.,
$ ag -c '\w{8,} Sherlock Holmes' sixteenth.txt
9
$ rg -c '\w{8,} Sherlock Holmes' sixteenth.txt
9
$ cat sixteenth.txt | rg -c '\w{8,} Sherlock Holmes'
9
$ cat sixteenth.txt | ag -c '\w{8,} Sherlock Holmes'
1
1
1
1
1
1
1
1
1
Or:

$ printf 'foo\nbar\n' | ag 'foo\s+bar'
$ printf 'foo\nbar\n' | rg -U 'foo\s+bar'
foo
bar
Or:

$ ag '\w+ Sherlock Holmes' full.txt
ERR: Skipping full.txt: pcre_exec() can't handle files larger than 2147483647 bytes.
There's probably more. But that's what comes to mind.