Open
Description
Is your feature request related to a problem? Please describe.
I'm often frustrated when I need to manually search through large PDF documents to find specific keywords or phrases, especially in legal documents, academic papers, or reports. While PDF readers support basic search, there's no easy way to highlight all occurrences automatically and save the result for later reference or sharing.
Describe the solution you'd like
A Python script that takes a PDF file and one or more keywords as input, scans through the text of the PDF, and creates a new PDF with all keyword occurrences highlighted.
The script should ideally:
- Support case-insensitive matching (optionally toggleable)
- Allow multiple keywords
- Save a copy of the PDF with highlights
- Be easy to run via the command line
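A rough sketch of what the core of the script could look like, assuming PyMuPDF is installed. The function names `highlight_keywords` and `default_output_path` and the `_highlighted` suffix are placeholders for illustration, not an agreed design. As far as I know, `Page.search_for` matches case-insensitively, so the case-sensitive mode below re-checks the text under each hit with `Page.get_textbox`; treat that check as a heuristic.

```python
from pathlib import Path


def default_output_path(src):
    """Return a sibling path such as 'report_highlighted.pdf' (hypothetical naming)."""
    p = Path(src)
    return p.with_name(f"{p.stem}_highlighted{p.suffix}")


def highlight_keywords(src, dst, keywords, case_sensitive=False):
    """Highlight every occurrence of each keyword in src and save a copy to dst.

    Returns the number of highlight annotations added.
    """
    # Imported here so the helper above stays usable without PyMuPDF installed.
    import fitz  # PyMuPDF

    total = 0
    with fitz.open(str(src)) as doc:
        for page in doc:
            for keyword in keywords:
                # search_for() returns one rectangle per hit, ignoring case.
                for rect in page.search_for(keyword):
                    # Heuristic: compare against the text under the rectangle
                    # when an exact-case match is requested.
                    if case_sensitive and keyword not in page.get_textbox(rect):
                        continue
                    page.add_highlight_annot(rect)
                    total += 1
        # Write to a new file so the original PDF is left untouched.
        doc.save(str(dst), garbage=3, deflate=True)
    return total
```

For example, `highlight_keywords("report.pdf", default_output_path("report.pdf"), ["liability", "breach"])` would write `report_highlighted.pdf` next to the input.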
Describe alternatives you've considered
- Manually using Ctrl+F in PDF viewers — not persistent and not ideal for sharing.
- Using Adobe Acrobat’s advanced search & highlight — requires paid software and manual setup.
- Writing custom code with other libraries such as pdfminer, which is more complex than PyMuPDF for highlighting.
Additional context
Libraries that can be used:
- PyMuPDF (fitz) – for reading and modifying PDFs and applying highlights
- argparse – for a CLI interface
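One possible shape for the command-line interface, using only `argparse` from the standard library. The program name `pdf_highlighter.py` and the option names (`-k/--keywords`, `-o/--output`, `--case-sensitive`) are my suggestions and open to change.

```python
import argparse


def build_parser():
    """Build the CLI parser; all option names here are proposals, not final."""
    parser = argparse.ArgumentParser(
        prog="pdf_highlighter.py",
        description="Highlight keywords in a PDF and save a copy.",
    )
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument(
        "-k", "--keywords", nargs="+", required=True,
        help="one or more keywords or phrases to highlight",
    )
    parser.add_argument(
        "-o", "--output", default=None,
        help="path of the highlighted copy (a default name is used if omitted)",
    )
    parser.add_argument(
        "--case-sensitive", action="store_true",
        help="match keywords exactly; matching ignores case by default",
    )
    return parser
```

A call could then look like `python pdf_highlighter.py report.pdf -k liability "force majeure" -o out.pdf`. Because `--keywords` accepts several values, the input PDF has to come before `-k` (or `-k` has to be followed by another option), otherwise the path would be read as a keyword.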
Future enhancements can include:
- Exporting keyword count/frequency
- GUI for selecting PDFs and entering keywords
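For the keyword count idea, a minimal sketch that works on plain page text. It assumes the text of each page has already been extracted (for example with PyMuPDF's `page.get_text()`), and `count_keywords` is a hypothetical helper name. A plain substring count like this may not always equal the number of highlights drawn, since the PDF search can treat line breaks and hyphenation differently.

```python
from collections import Counter


def count_keywords(page_texts, keywords, case_sensitive=False):
    """Count non-overlapping occurrences of each keyword across all page texts."""
    counts = Counter({kw: 0 for kw in keywords})
    for text in page_texts:
        haystack = text if case_sensitive else text.lower()
        for kw in keywords:
            needle = kw if case_sensitive else kw.lower()
            counts[kw] += haystack.count(needle)
    return counts
```

The resulting `Counter` could be printed to the terminal or written to a CSV file alongside the highlighted PDF.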
Would love to contribute this script as a new addition to the repository. Please assign it to me.