first version

tobias
2025-11-16 22:30:36 +01:00
parent 3d3625c93a
commit bd42e2f7e3
3 changed files with 203 additions and 0 deletions

@@ -0,0 +1,35 @@
# SANS Courseware Watermark Remover
Removes personalized SANS “Licensed To …” watermarks (names, emails, hashes, dates) by deleting the actual `BT…ET` text objects in the PDF, so there are no white boxes and no selection artefacts.
## Quickstart
1. **(If encrypted) Decrypt first with `qpdf`:**
```bash
qpdf --password='<YOUR_PASSWORD>' --decrypt INPUT.pdf INPUT_unlocked.pdf
```
2. **Install dependencies (Python 3.9+):**
```bash
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
3. **Run the sanitizer (auto-creates `<input>_clean.pdf`):**
```bash
# Recommended latest script:
python enhanced_sanitize_pdf.py INPUT_unlocked.pdf
```
## Notes
* The tool targets common SANS watermark patterns:
  * Invisible 36-pt text with rendering modes 1/3
  * Rotated diagonal overlay using 25-pt fonts
  * Footer lines at 10/18-pt
* If your course PDFs use different fonts/sizes, adjust the regex patterns inside the script.
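
Before adjusting the patterns, it can help to see which font sizes a page actually uses. The following is a minimal sketch, not code from this repository: `font_sizes_in_stream`, `TF_PATTERN`, and `BT_ET_PATTERN` are hypothetical names. It assumes you already have a decoded (uncompressed) content stream as bytes, for example taken from a file written by `qpdf --qdf`. The regexes are deliberately naive and can be fooled by `BT` or `ET` appearing inside string literals.

```python
import re
from collections import Counter

# Hypothetical helper patterns, not taken from enhanced_sanitize_pdf.py.
# Font selection operator, e.g. "/F1 36 Tf" (font name, size, operator).
TF_PATTERN = re.compile(rb"/([^\s/\[\]<>()]+)\s+(\d+(?:\.\d+)?)\s+Tf\b")
# A text object: everything between the BT and ET operators.
BT_ET_PATTERN = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)


def font_sizes_in_stream(stream: bytes) -> Counter:
    """Count the font sizes set inside BT...ET text objects of a decoded stream."""
    sizes = Counter()
    for text_object in BT_ET_PATTERN.finditer(stream):
        for _font, size in TF_PATTERN.findall(text_object.group(1)):
            sizes[float(size)] += 1
    return sizes


# Made-up sample stream that mirrors the three sizes listed above.
sample = (
    b"BT /F1 36 Tf 3 Tr (hidden) Tj ET\n"
    b"BT /F2 25 Tf (overlay) Tj ET\n"
    b"BT /F1 10 Tf (footer) Tj ET"
)
print(font_sizes_in_stream(sample))
```

If the sizes reported for your PDFs differ from 36, 25, 10, or 18, the script's patterns probably need the adjustment described above.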