PDF Keyword Scanner & Highlighter Script #478

@AksaRose

Is your feature request related to a problem? Please describe.
I'm often frustrated when I need to manually search through large PDF documents to find specific keywords or phrases, especially in legal documents, academic papers, or reports. While PDF readers support basic search, there's no easy way to highlight all occurrences automatically and save the result for later reference or sharing.

Describe the solution you'd like
A Python script that takes a PDF file and one or more keywords as input, scans through the text of the PDF, and creates a new PDF with all keyword occurrences highlighted.

The script should ideally:

  • Support case-insensitive matching (optionally toggleable)
  • Allow multiple keywords
  • Save a copy of the PDF with highlights
  • Be easy to run via the command line
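
To make the request concrete, here is a minimal sketch of how such a script might look. It is not a finished implementation. The function names (`build_parser`, `keep_hit`, `highlight_keywords`, `main`), the flag names, and the default output path are all illustrative assumptions. It relies on PyMuPDF's `Page.search_for`, which matches case-insensitively, so the optional case-sensitive mode is approximated by re-reading the text under each hit with `Page.get_textbox`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface; flag names are illustrative assumptions."""
    parser = argparse.ArgumentParser(
        description="Highlight keyword occurrences in a PDF."
    )
    parser.add_argument("input_pdf", help="path to the source PDF")
    parser.add_argument("keywords", nargs="+",
                        help="one or more keywords or phrases")
    parser.add_argument("-o", "--output", default="highlighted.pdf",
                        help="path for the highlighted copy")
    parser.add_argument("--case-sensitive", action="store_true",
                        help="only highlight matches with identical casing")
    return parser


def keep_hit(found_text: str, keyword: str, case_sensitive: bool) -> bool:
    """Decide whether a hit from the case-insensitive search is kept."""
    if not case_sensitive:
        return True
    return keyword in found_text


def highlight_keywords(input_pdf, output_pdf, keywords, case_sensitive=False):
    """Highlight keyword hits and return a per-keyword count of highlights."""
    import fitz  # PyMuPDF; imported here so the CLI help works without it

    counts = {kw: 0 for kw in keywords}
    with fitz.open(input_pdf) as doc:
        for page in doc:
            for kw in keywords:
                # search_for ignores case and returns one rectangle per hit
                # (a phrase wrapped across lines may yield several).
                for rect in page.search_for(kw):
                    # Approximation: read the text under the rectangle
                    # and compare its casing against the keyword.
                    found = page.get_textbox(rect)
                    if keep_hit(found, kw, case_sensitive):
                        page.add_highlight_annot(rect)
                        counts[kw] += 1
        # Write to a new file so the original PDF is left untouched.
        doc.save(output_pdf)
    return counts


def main(argv=None):
    args = build_parser().parse_args(argv)
    counts = highlight_keywords(
        args.input_pdf, args.output, args.keywords, args.case_sensitive
    )
    for kw, n in counts.items():
        print(f"{kw}: {n}")
```

With `main()` called under the usual `if __name__ == "__main__":` guard and the file saved as, say, `pdf_highlighter.py` (a hypothetical name), an invocation could look like `python pdf_highlighter.py contract.pdf liability "force majeure" -o contract_marked.pdf`. The counts are per highlighted rectangle, so a phrase that wraps across lines may be counted more than once. In case-sensitive mode such a phrase may be skipped entirely, because each rectangle covers only part of it.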

Describe alternatives you've considered

  • Manually using Ctrl+F in PDF viewers — not persistent and not ideal for sharing.
  • Using Adobe Acrobat’s advanced search & highlight — requires paid software and manual setup.
  • Writing custom code with other libraries such as pdfminer, which is more complex than PyMuPDF for highlighting.

Additional context
Libraries that can be used:

  • PyMuPDF (fitz) – for reading and modifying PDFs and applying highlights
  • argparse – for the command-line interface

Future enhancements can include:

  • Exporting keyword count/frequency
  • GUI for selecting PDFs and entering keywords

I would love to contribute this script as a new addition to the repository. Please assign it to me.
