Searching through a bulk of pdf files

Darkassassin07@lemmy.ca · 5 days ago

Interesting; that would be much simpler. I’ll give that a shot in the morning, thanks!

hoppolito@mander.xyz · 5 days ago

In case you are already using ripgrep (rg) instead of grep, there is also ripgrep-all (rga), which lets you search through a whole bunch of files like PDFs quickly. It caches the extracted text, so while the first run takes a moment, any further search is lightning fast.

It supports a whole truckload of file types (pdf, odt, xlsx, tar.gz, mp4, and so on), but I mostly used it to quickly search through thousands of research papers. The first run takes around 5 minutes to get through my 4000 PDFs, then it’s smooth sailing for any further searches from there.
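
If you want to call it from a script rather than typing it by hand, here is a rough sketch in Python. It assumes rga is installed and on your PATH, and it relies on rga passing ripgrep's -i (ignore case) flag through. The function names and the "papers" folder are made up for illustration, so adjust to taste. It amounts to running rga -i PATTERN FOLDER in a shell.

```python
import shutil
import subprocess


def build_rga_command(pattern, folder, ignore_case=True):
    """Build the argument list for an rga search (hypothetical helper)."""
    cmd = ["rga"]
    if ignore_case:
        cmd.append("-i")  # ripgrep's case-insensitive flag, passed through by rga
    cmd += [pattern, folder]
    return cmd


def search_pdfs(pattern, folder):
    """Run rga and return its output lines, or None if rga is not installed."""
    if shutil.which("rga") is None:
        return None
    result = subprocess.run(
        build_rga_command(pattern, folder),
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # "papers" is a placeholder folder name.
    lines = search_pdfs("gradient descent", "papers")
    if lines is None:
        print("rga not found on PATH")
    else:
        print("\n".join(lines))
```

The first call on a big folder will be slow while the cache fills; repeat searches over the same files should come back much faster.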