# SANS Courseware Watermark Remover
Removes personalized SANS “Licensed To …” watermarks (names, emails, hashes, dates) by deleting the actual `BT…ET` text objects from the PDF content streams. Nothing is painted over, so the output has no white boxes and no selection artefacts.
## Quickstart
1. **(If encrypted) Decrypt first with `qpdf`:**
```bash
qpdf --password='<YOUR_PASSWORD>' --decrypt INPUT.pdf INPUT_unlocked.pdf
```
2. **Install dependencies (Python 3.9+):**
```bash
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
3. **Run the sanitizer (auto-creates `<input>_clean.pdf`):**
```bash
# Recommended latest script:
python enhanced_sanitize_pdf.py INPUT_unlocked.pdf
```
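
Step 1 is only needed for encrypted files. As a rough pre-check, an encrypted PDF carries an `/Encrypt` entry in its trailer dictionary, so scanning the raw bytes for that key gives a quick hint. The sketch below is a heuristic and not part of this repository: `looks_encrypted` is a hypothetical helper name, and a false positive is possible if the literal key happens to appear in uncompressed page content.

```python
import re
import sys
from pathlib import Path

# Heuristic: encrypted PDFs reference an encryption dictionary via an
# /Encrypt key in the trailer. \b keeps /EncryptMetadata from matching.
ENCRYPT_KEY = re.compile(rb"/Encrypt\b")


def looks_encrypted(data: bytes) -> bool:
    """Return True if the raw PDF bytes contain an /Encrypt key."""
    return ENCRYPT_KEY.search(data) is not None


if __name__ == "__main__" and len(sys.argv) > 1:
    pdf_bytes = Path(sys.argv[1]).read_bytes()
    if looks_encrypted(pdf_bytes):
        print("Looks encrypted: decrypt with qpdf first.")
    else:
        print("No /Encrypt key found: decryption is probably unnecessary.")
```

If the check reports an encrypted file, run the `qpdf --decrypt` command from step 1 before sanitizing.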
## Notes
* The tool targets common SANS watermark patterns:
  * Invisible 36-pt text with rendering modes 1/3,
  * Rotated diagonal overlay using 25-pt fonts,
  * Footer lines at 10/18-pt.
* If your course PDFs use different fonts/sizes, adjust the regex patterns inside the script.
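
To make the pattern adjustment more concrete, here is a minimal sketch of regex-based `BT…ET` removal on an already decompressed content stream. It does not reproduce the patterns in `enhanced_sanitize_pdf.py`: the names `BT_ET`, `FONT_SIZE`, `RENDER_MODE`, `is_watermark` and `strip_watermark_blocks` are hypothetical, and only the invisible 36-pt case from the list above is covered. It also assumes the `Tf` and `Tr` operators sit inside the text object itself, which need not hold for every PDF.

```python
import re

# One text object: everything from a BT operator to the next ET operator.
# Simplification: a literal "BT"/"ET" inside a string operand would confuse this.
BT_ET = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)

# Font selection such as "/F2 36 Tf"; change 36 to match other watermark sizes.
FONT_SIZE = re.compile(rb"/\S+\s+36(?:\.0+)?\s+Tf\b")

# Text rendering mode 1 or 3, e.g. "3 Tr".
RENDER_MODE = re.compile(rb"(?<![\d.])[13]\s+Tr\b")


def is_watermark(block: bytes) -> bool:
    """Treat a text object as a watermark if it uses 36-pt text in mode 1 or 3."""
    return bool(FONT_SIZE.search(block) and RENDER_MODE.search(block))


def strip_watermark_blocks(stream: bytes) -> bytes:
    """Delete matching BT...ET objects and leave all other content untouched."""
    return BT_ET.sub(
        lambda m: b"" if is_watermark(m.group(0)) else m.group(0),
        stream,
    )


sample = (
    b"BT /F1 12 Tf (Course text) Tj ET\n"
    b"BT /F2 36 Tf 3 Tr (Licensed To: someone) Tj ET\n"
)
cleaned = strip_watermark_blocks(sample)
```

In this sample the 12-pt text object survives and the 36-pt object in rendering mode 3 is removed. For courseware with different sizes, the number in `FONT_SIZE` is the part to change.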