# Malicious Document Analysis

> Analyze suspicious documents (PDF, Office, RTF, OneNote) for embedded malware, macros, and exploits. Follows Zeltser's 6-step methodology.

**FOR610 Labs:** 3.1, 3.3, 3.4, 3.5

## Steps

### Step 1: Format Identification

**Tools:** [[tools/file|file]], [[tools/trid|trid]]

Identify true format: OLE2 (legacy Office), OOXML (modern Office), RTF, PDF, OneNote. Don't trust the file extension; use magic bytes.

```bash
file specimen.exe
trid document.doc
```

### Step 2: Structure Analysis

**Tools:** [[tools/oledump-py|oledump-py]], [[tools/rtfdump-py|rtfdump-py]], [[tools/pdfid-py|pdfid-py]], [[tools/pdf-parser-py|pdf-parser-py]], [[tools/onedump-py|onedump-py]]

Parse document internals. For Office: oledump.py to list streams (M = macro). For PDF: pdfid.py for risky keywords (/JavaScript, /OpenAction). For RTF: rtfdump.py for hex-heavy groups.

```bash
oledump.py document.docm
rtfdump.py document.rtf
pdfid.py document.pdf
```

### Step 3: Password Handling (if encrypted)

**Tools:** [[tools/msoffcrypto-tool|msoffcrypto-tool]]

If document is password-protected: msoffcrypto-tool -p. Common passwords: infected, malware, password, 123456.

```bash
msoffcrypto-tool -p infected
```

### Step 4: Macro/Script Extraction

**Tools:** [[tools/oledump-py|oledump-py]], [[tools/olevba|olevba]], [[tools/pcode2code|pcode2code]], [[tools/xlmmacrodeobfuscator|XLMMacroDeobfuscator]]

Extract VBA: oledump.py -s -v. For p-code: pcode2code. For Excel 4.0 macros: XLMMacroDeobfuscator. Check olevba for auto-execute triggers (AutoOpen, Document_Open).

```bash
oledump.py document.docm
olevba document.docm
pcode2code
```

### Step 5: Payload Decoding

**Tools:** [[tools/base64dump-py|base64dump-py]], [[tools/translate-py|translate-py]], [[tools/gunzip|gunzip]], [[tools/numbers-to-string-py|numbers-to-string-py]], [[tools/cyberchef|cyberchef]]

Decode embedded payloads. Common chains: Base64 → gunzip → XOR. Use CyberChef for visual multi-step decoding. translate.py for byte-level transforms (byte ^ key).

```bash
base64dump.py file.txt
translate.py "byte ^ 35" < input.bin > output.bin
gunzip -c compressed.gz > output.bin
```

### Step 6: Embedded Object Analysis

**Tools:** [[tools/scdbgc|scdbgc]], [[tools/xorsearch|xorsearch]], [[tools/yara|yara]], [[tools/1768-py|1768-py]]

If shellcode found: emulate with scdbgc. Scan for known patterns (YARA). Check for Cobalt Strike beacons (1768.py). Route PE payloads to Static Analysis Workflow.

```bash
scdbgc /f shellcode.bin /s -1
XORSearch -W -d 3 file.bin
yara-rules specimen.bin
```

### Step 7: Document IOCs

Record: embedded URLs, downloaded payload hashes, C2 addresses, macro behavior (what APIs called), exploit type (CVE if applicable).

#documents #office #pdf #rtf #macro #onenote #workflow
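
As a companion to Step 1, the "use magic bytes" advice can be sketched as a small shell function. This is a minimal illustration, not a replacement for file or trid: the function name `identify_format` is hypothetical, it only inspects the first 8 bytes, and a ZIP signature indicates only a ZIP container (OOXML is one possibility among many).

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a file by its leading magic bytes,
# ignoring the file extension entirely.
identify_format() {
    local magic
    # Read the first 8 bytes as a contiguous lowercase hex string.
    magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        d0cf11e0a1b11ae1) echo "OLE2" ;;                            # legacy Office compound file
        504b0304*)        echo "ZIP container (possibly OOXML)" ;;  # "PK\x03\x04"
        7b5c7274*)        echo "RTF" ;;                             # "{\rt"
        25504446*)        echo "PDF" ;;                             # "%PDF"
        e4525c7b8cd8a74d) echo "OneNote" ;;                         # start of the .one header GUID
        *)                echo "unknown" ;;
    esac
}
```

Calling `identify_format specimen.doc` prints one label; a result that disagrees with the extension is itself worth recording, and anything reported as "unknown" still needs file or trid.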