f3ccc09c3d
Build comprehensive malware analysis knowledge base from 3 sources:
- SANS FOR610 course: 120 tools, 47 labs, 15 workflows, 27 recipes
- REMnux salt-states: 340 packages parsed from GitHub
- REMnux docs: 280+ tools scraped from docs.remnux.org

Master inventory merges all sources into 447 tools with help tiers
(rich/standard/basic). Pipeline generates: tools.db (397 entries),
397 cheatsheets with multi-tool recipes, 15 workflow guides, 224 TLDR
pages, and coverage reports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
85 lines
3.1 KiB
Plaintext
============================================================
Malicious Document Analysis
============================================================

Analyze suspicious documents (PDF, Office, RTF, OneNote) for embedded malware, macros, and exploits. Based on Zeltser's 6-step methodology, with a final step for documenting IOCs.

Related FOR610 Labs: 3.1, 3.3, 3.4, 3.5

────────────────────────────────────────────────────────────

Step 1: Format Identification
Tools: file, trid
Identify the true format: OLE2 (legacy Office), OOXML
(modern Office), RTF, PDF, OneNote. Don't trust the
file extension; check the magic bytes.

$ file document.doc
$ trid document.doc

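The magic-byte check in Step 1 can be sketched in plain shell for cases where file or trid are not at hand. This is a minimal sketch, not a replacement for either tool: magic_type is a hypothetical helper name, the signatures are the commonly documented leading bytes for each format, any ZIP archive is reported as OOXML, and a PDF whose header does not start at offset 0 is reported as unknown.

```shell
# magic_type FILE: hypothetical helper that names a document format from
# its first 8 bytes. An illustration only, not a substitute for file or trid.
magic_type() {
    local hex
    # First 8 bytes as one lowercase hex string, e.g. "255044462d312e37".
    hex=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    case "$hex" in
        d0cf11e0a1b11ae1) echo "OLE2" ;;     # legacy Office compound file
        504b0304*)        echo "OOXML" ;;    # ZIP container; any ZIP matches
        7b5c7274*)        echo "RTF" ;;      # "{\rt"
        25504446*)        echo "PDF" ;;      # "%PDF"
        e4525c7b8cd8a74d) echo "OneNote" ;;  # start of the .one file GUID
        *)                echo "unknown" ;;
    esac
}
```

Usage: magic_type document.doc prints OLE2 for a legacy Word file, whatever its extension says.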
Step 2: Structure Analysis
Tools: oledump-py, rtfdump-py, pdfid-py, pdf-parser-py, onedump-py
Parse document internals. For Office: oledump.py to
list streams (M = stream contains VBA macros). For
PDF: pdfid.py to count risky keywords (/JavaScript,
/OpenAction). For RTF: rtfdump.py to spot hex-heavy
groups. For OneNote: onedump.py to list embedded files.

$ oledump.py document.docm
$ rtfdump.py document.rtf
$ pdfid.py document.pdf
$ pdf-parser.py document.pdf -a

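To act on the M indicator from Step 2 in a script, the oledump.py listing can be filtered for flagged streams. A minimal sketch, assuming the usual listing layout of index, optional indicator, size, and stream name; macro_streams is a hypothetical helper name.

```shell
# macro_streams: hypothetical filter. Reads an oledump.py stream listing on
# stdin and prints the index of every stream flagged "M" (contains VBA code).
# Assumes the layout "<index>: [indicator] <size> '<name>'".
macro_streams() {
    awk '$2 == "M" { sub(/:$/, "", $1); print $1 }'
}
```

Usage: oledump.py document.docm | macro_streams. Each printed index can then be passed to oledump.py -s <stream> -v.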
Step 3: Password Handling (if encrypted)
Tools: msoffcrypto-tool
If the document is password-protected, decrypt it with
msoffcrypto-tool -p <password> <input> <output>.
Common passwords to try: infected, malware, password,
123456.

$ msoffcrypto-tool -p infected <encrypted.docx> <decrypted.docx>

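Trying the common passwords from Step 3 one by one can be automated. A minimal sketch, assuming msoffcrypto-tool exits non-zero when the password is wrong; try_passwords and the DECRYPT_CMD override are hypothetical names introduced here so the loop can be exercised without the real tool.

```shell
# try_passwords SRC DST: hypothetical helper. Tries the common passwords from
# Step 3 and prints the first one that works. Assumes the decrypt command
# exits non-zero on a wrong password. DECRYPT_CMD is an illustrative override.
try_passwords() {
    local src=$1 dst=$2 pw
    for pw in infected malware password 123456; do
        if "${DECRYPT_CMD:-msoffcrypto-tool}" -p "$pw" "$src" "$dst" 2>/dev/null; then
            echo "$pw"
            return 0
        fi
        rm -f "$dst"   # discard any partial output from a failed attempt
    done
    return 1
}
```

Usage: try_passwords encrypted.docx decrypted.docx prints the working password, or returns 1 if none of the four matched.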
Step 4: Macro/Script Extraction
Tools: oledump-py, olevba, pcode2code, XLMMacroDeobfuscator
Extract VBA: oledump.py -s <stream> -v. For p-code:
pcode2code. For Excel 4.0 macros:
XLMMacroDeobfuscator. Run olevba to flag auto-execute
triggers (AutoOpen, Document_Open).

$ oledump.py -s <stream> -v document.docm
$ olevba document.docm
$ pcode2code <document.docm>
$ xlmdeobfuscator --file <spreadsheet.xlsm>

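olevba reports auto-execute triggers itself; when only the raw VBA from oledump.py -s <stream> -v is at hand, a keyword scan gives a first pass. A minimal sketch: autoexec_triggers is a hypothetical helper, and the names beyond AutoOpen and Document_Open are other commonly seen auto-run procedure names, added here as an assumption.

```shell
# autoexec_triggers: hypothetical filter. Reads VBA source on stdin and prints
# each auto-run procedure name it finds, once. The list is illustrative and
# not exhaustive; olevba remains the authoritative check.
autoexec_triggers() {
    grep -Eiow 'Auto_?Open|Auto_?Close|AutoExec|Document_(Open|Close)|Workbook_Open' \
        | LC_ALL=C sort -u
}
```

Usage: oledump.py -s <stream> -v document.docm | autoexec_triggers.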
Step 5: Payload Decoding
Tools: base64dump-py, translate-py, gunzip, numbers-to-string-py, cyberchef
Decode embedded payloads. Common chains: Base64 →
gunzip → XOR. Use CyberChef for visual multi-step
decoding. translate.py for byte-level transforms (byte
^ key).

$ base64dump.py file.txt
$ translate.py "byte ^ 35" < input.bin > output.bin
$ gunzip -c compressed.gz > output.bin
$ oledump.py doc.docm -s A3 -v | numbers-to-string.py -j
$ cyberchef

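The Base64 → gunzip → XOR chain from Step 5 can be written as a single pipeline. A minimal sketch using only coreutils and gzip: xor_bytes is a hypothetical pure-shell stand-in for translate.py "byte ^ key" (slow, suitable for small buffers only), decode_chain is likewise a hypothetical name, and the key is assumed to be a single byte.

```shell
# xor_bytes KEY: XOR every byte on stdin with a single-byte key (0-255).
# Pure-shell stand-in for translate.py "byte ^ KEY"; small inputs only.
xor_bytes() {
    local key=$1 b
    # od prints each byte as a decimal number; tr puts one number per line.
    od -An -v -tu1 | tr -s ' \n' '\n' | while read -r b; do
        if [ -n "$b" ]; then
            # Emit the XORed value as a raw byte via an octal escape.
            printf "\\$(printf '%03o' $(( b ^ key )))"
        fi
    done
}

# decode_chain KEY: Base64 -> gunzip -> XOR, reading stdin, writing stdout.
decode_chain() {
    base64 -d | gunzip -c | xor_bytes "$1"
}
```

Usage: decode_chain 35 < payload.b64 > payload.bin. For anything beyond a few kilobytes, translate.py or CyberChef is the better choice.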
Step 6: Embedded Object Analysis
Tools: scdbgc, xorsearch, yara, 1768-py
If shellcode is found: emulate it with scdbgc. Locate
shellcode patterns with XORSearch -W. Scan for known
patterns (YARA). Check for Cobalt Strike beacons
(1768.py). Route PE payloads to the Static Analysis
Workflow.

$ scdbgc /f shellcode.bin /s -1
$ XORSearch -W -d 3 file.bin
$ yara-rules specimen.bin
$ 1768.py shellcode.bin

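The routing decision in Step 6 (PE payload versus possible shellcode) can start from the first two bytes of the extracted object. A minimal sketch: route_payload is a hypothetical helper, and anything that does not begin with the MZ signature is only labeled a candidate, since this check says nothing about whether the bytes are really shellcode.

```shell
# route_payload FILE: hypothetical triage helper. Prints "pe" when the file
# starts with the MZ signature (4d 5a), otherwise "shellcode-candidate".
route_payload() {
    local sig
    sig=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$sig" = "4d5a" ]; then
        echo "pe"
    else
        echo "shellcode-candidate"
    fi
}
```

A "pe" result goes to the Static Analysis Workflow; anything else is a candidate for scdbgc, XORSearch, and 1768.py as listed above.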
Step 7: Document IOCs
Record: embedded URLs, downloaded payload hashes, C2
addresses, macro behavior (which APIs it calls), exploit
type (CVE if applicable).

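Two of the IOC types in Step 7, payload hashes and embedded URLs, can be collected with standard tools. A minimal sketch: extract_urls and ioc_record are hypothetical helper names, the key=value output format is an assumption, and the URL pattern is deliberately loose, so trailing punctuation may be captured and results need manual review.

```shell
# extract_urls: hypothetical helper. Prints unique http(s) URLs found on stdin.
# -a makes grep treat binary input as text.
extract_urls() {
    grep -aEo 'https?://[A-Za-z0-9._~:/?#@!$&*+,;=%-]+' | LC_ALL=C sort -u
}

# ioc_record FILE: print the SHA-256 of a payload and any URLs it contains,
# one "key=value" line each (format chosen for illustration).
ioc_record() {
    printf 'sha256=%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
    extract_urls < "$1" | sed 's/^/url=/'
}
```

Usage: ioc_record decoded_payload.bin >> iocs.txt.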
────────────────────────────────────────────────────────────
Tip: 'fhelp cheat <tool>' for full examples
     'Ctrl+G' for interactive cheatsheet browser