How to program a text search and replace in PDF files

How can I programmatically search and replace text in a large number of PDF files? I would like to remove a URL that has been added to a set of files. I have been able to remove the link using JavaScript under Batch Processing in Adobe Acrobat Pro, but the link text remains. I have seen recommendations to use the TouchUp Text tool, which works manually, but I don’t want to modify 1300 files by hand.

Answer

Finding text in a PDF is inherently hard because the document format is graphical: the letters you are searching for may not be contiguous in the file. That said, CAM::PDF has some search-and-replace capabilities and heuristics. Give its changepagestring.pl script a try and see if it works on your PDFs.

To install and run:

 $ cpan CAM::PDF
 # start a new terminal if this is your first cpan module
 $ changepagestring.pl input.pdf oldtext newtext output.pdf
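
Since the question is about 1300 files, the single-file command above could be wrapped in a loop. The following is only a sketch, not something tested against real PDFs: the function name `batch_replace`, the `REPLACER` variable, and the directory layout are assumptions made for illustration. It writes results to a separate output directory so the originals stay untouched, and it reports any file for which the command fails.

```shell
#!/bin/sh
# Hypothetical batch wrapper around the single-file command shown above.
# Usage: batch_replace IN_DIR OUT_DIR OLDTEXT NEWTEXT
# REPLACER may be set to override the command (assumed to take the same
# arguments as: changepagestring.pl input.pdf oldtext newtext output.pdf).
batch_replace() {
    in_dir=$1
    out_dir=$2
    old=$3
    new=$4
    replacer=${REPLACER:-changepagestring.pl}

    mkdir -p "$out_dir" || return 1

    failed=0
    for src in "$in_dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDFs.
        [ -e "$src" ] || continue
        dest="$out_dir/$(basename "$src")"
        if ! "$replacer" "$src" "$old" "$new" "$dest"; then
            echo "failed: $src" >&2
            failed=$((failed + 1))
        fi
    done

    # Succeed only if every file was processed without error.
    [ "$failed" -eq 0 ]
}
```

For example, `batch_replace ./pdfs ./cleaned 'oldtext' 'newtext'` would process every PDF in a hypothetical `./pdfs` directory. A zero exit status only means the command ran; because of the contiguity problem described above, it is worth opening a few output files to confirm the text was actually replaced.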
User contributions licensed under: CC BY-SA