pdf_sanatizer/README.md
2025-11-16 22:30:36 +01:00


SANS Courseware Watermark Remover

Removes personalized SANS “Licensed To …” watermarks (names, emails, hashes, dates) by deleting the underlying BT…ET text objects in the PDF, so no white boxes or selection artefacts are left behind.

Quickstart

  1. (If encrypted) Decrypt first with qpdf:

    qpdf --password='<YOUR_PASSWORD>' --decrypt INPUT.pdf INPUT_unlocked.pdf
    
  2. Install dependencies (Python 3.9+):

    python -m venv .venv
    source .venv/bin/activate            # Windows: .venv\Scripts\activate
    pip install -r requirements.txt
    
  3. Run the sanitizer (auto-creates <input>_clean.pdf):

    # Recommended latest script:
    python enhanced_sanitize_pdf.py INPUT_unlocked.pdf
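
The `<input>_clean.pdf` naming in step 3 can be illustrated with a minimal sketch. It assumes the output is written next to the input with `_clean` appended to the file stem. The helper name `clean_output_path` is hypothetical and is not taken from the script, whose exact rule may differ.

```python
from pathlib import Path


def clean_output_path(input_pdf: str) -> Path:
    """Return '<stem>_clean.pdf' in the same directory as the input.

    Assumption: this mirrors the '<input>_clean.pdf' convention described
    in the Quickstart; the real script may derive the name differently.
    """
    src = Path(input_pdf)
    # with_name() swaps only the final path component, keeping the directory.
    return src.with_name(f"{src.stem}_clean.pdf")


if __name__ == "__main__":
    print(clean_output_path("INPUT_unlocked.pdf"))
```

Under that assumption, running the sanitizer on `INPUT_unlocked.pdf` would leave the result in `INPUT_unlocked_clean.pdf` alongside the original.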
    

Notes

  • The tool targets common SANS watermark patterns:
    • Invisible 36-pt text with rendering modes 1/3,
    • Rotated diagonal overlay using 25-pt fonts,
    • Footer lines at 10/18-pt.
  • If your course PDFs use different fonts/sizes, adjust the regex patterns inside the script.