tobias f3ccc09c3d Add FOR610 tool/workflow knowledge base and data pipeline
Build comprehensive malware analysis knowledge base from 3 sources:
- SANS FOR610 course: 120 tools, 47 labs, 15 workflows, 27 recipes
- REMnux salt-states: 340 packages parsed from GitHub
- REMnux docs: 280+ tools scraped from docs.remnux.org

Master inventory merges all sources into 447 tools with help tiers
(rich/standard/basic). Pipeline generates: tools.db (397 entries),
397 cheatsheets with multi-tool recipes, 15 workflow guides, 224
TLDR pages, and coverage reports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:38:15 +01:00

============================================================
Malicious Document Analysis
============================================================
Analyze suspicious documents (PDF, Office, RTF, OneNote) for embedded malware, macros, and exploits. Follows Zeltser's 6-step methodology, presented here as 7 steps.
Related FOR610 Labs: 3.1, 3.3, 3.4, 3.5
────────────────────────────────────────────────────────────
Step 1: Format Identification
Tools: file, trid
Identify the true format: OLE2 (legacy Office), OOXML
(modern Office), RTF, PDF, or OneNote. Don't trust the
file extension; check the magic bytes.
$ file document.doc
$ trid document.doc
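A minimal Python sketch of the magic-byte check that Step 1 describes. The signature table and the function names are illustrative assumptions, not part of file or trid, and the table covers only the formats named above.

```python
# Leading-byte signatures for the formats named in Step 1 (illustrative subset).
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 (legacy Office)"),
    (b"PK\x03\x04", "ZIP container (possibly OOXML)"),
    (b"{\\rt", "RTF"),
    (b"%PDF-", "PDF"),
    (bytes.fromhex("E4525C7B8CD8A74DAEB15378D02996D3"), "OneNote"),
]


def identify_format(data: bytes) -> str:
    """Classify a buffer by its leading bytes, ignoring any file extension."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))
```

A ZIP signature alone does not prove OOXML; the archive contents still need checking.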
Step 2: Structure Analysis
Tools: oledump-py, rtfdump-py, pdfid-py, pdf-parser-py, onedump-py
Parse document internals. For Office: run oledump.py to
list streams (M marks a stream with macro code). For
PDF: run pdfid.py to count risky keywords (/JavaScript,
/OpenAction). For RTF: run rtfdump.py and look for
hex-heavy groups.
$ oledump.py document.docm
$ rtfdump.py document.rtf
$ pdfid.py document.pdf
$ pdf-parser.py document.pdf -a
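A rough Python approximation of the keyword triage that Step 2 describes for PDFs. This is a simplified sketch, not pdfid.py itself: the keyword list is an assumed subset, and the raw byte scan misses names obfuscated with #xx hex escapes or hidden inside compressed object streams.

```python
import re

# Assumed subset of PDF names worth flagging during triage.
RISKY_KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]


def count_risky_keywords(data: bytes) -> dict:
    """Count raw occurrences of each risky PDF name in the file bytes."""
    counts = {}
    for keyword in RISKY_KEYWORDS:
        # The lookahead stops a short name from matching inside a longer one.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts
```

Any non-zero count for /JavaScript, /JS, or /OpenAction is a reason to inspect the referenced objects with pdf-parser.py.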
Step 3: Password Handling (if encrypted)
Tools: msoffcrypto-tool
If the document is password-protected, decrypt it with
msoffcrypto-tool -p <password> <input> <output>. Common
passwords: infected, malware, password, 123456.
$ msoffcrypto-tool -p infected encrypted.docx decrypted.docx
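A heuristic Python sketch for deciding whether Step 3 applies to a modern Office file. It assumes that a password-protected OOXML document is stored as an OLE2 container with an EncryptedPackage stream, whose name appears as UTF-16LE in the directory. The function name is hypothetical, and legacy .doc/.xls encryption is not covered.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# OLE2 directory entries store stream names as UTF-16LE.
ENCRYPTED_STREAM = "EncryptedPackage".encode("utf-16-le")


def looks_like_encrypted_ooxml(data: bytes) -> bool:
    """Heuristic: OLE2 header plus an EncryptedPackage stream name."""
    return data.startswith(OLE2_MAGIC) and ENCRYPTED_STREAM in data
```

A True result only suggests trying msoffcrypto-tool; it does not confirm that any particular password will work.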
Step 4: Macro/Script Extraction
Tools: oledump-py, olevba, pcode2code, XLMMacroDeobfuscator
Extract VBA: oledump.py -s <stream> -v (A3 below is an
example stream index). For p-code: pcode2code. For
Excel 4.0 macros: XLMMacroDeobfuscator. Run olevba to
flag auto-execute triggers (AutoOpen, Document_Open).
$ oledump.py -s A3 -v document.docm
$ olevba document.docm
$ pcode2code document.docm
$ xlmdeobfuscator --file spreadsheet.xlsm
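A small Python sketch of the auto-execute check from Step 4, applied to VBA source that has already been extracted. The trigger list is an assumed subset and the function name is hypothetical; olevba checks considerably more than this.

```python
import re

# Assumed subset of procedure names that Office runs automatically.
AUTO_EXEC_NAMES = [
    "AutoOpen", "AutoExec", "AutoClose", "Auto_Open", "Auto_Close",
    "Document_Open", "Document_Close", "Workbook_Open",
]

# VBA is case-insensitive, so match "Sub <name>" regardless of case.
_TRIGGER_RE = re.compile(
    r"\bSub\s+(" + "|".join(AUTO_EXEC_NAMES) + r")\b", re.IGNORECASE
)


def find_auto_exec(vba_source: str) -> list:
    """Return the sorted, de-duplicated trigger names as written in the source."""
    return sorted({match.group(1) for match in _TRIGGER_RE.finditer(vba_source)})
```

An empty result does not clear the document; triggers can also live in p-code or Excel 4.0 macros.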
Step 5: Payload Decoding
Tools: base64dump-py, translate-py, gunzip, numbers-to-string-py, cyberchef
Decode embedded payloads. Common chains: Base64 →
gunzip → XOR. Use CyberChef for visual multi-step
decoding and translate.py for byte-level transforms
such as XOR (byte ^ key).
$ base64dump.py file.txt
$ translate.py "byte ^ 35" < input.bin > output.bin
$ gunzip -c compressed.gz > output.bin
$ oledump.py doc.docm -s A3 -v | numbers-to-string.py -j
$ cyberchef
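The Base64 → gunzip → XOR chain from Step 5 can be sketched in Python with the standard library. The function names are illustrative, and the single-byte key is an assumption that matches the "byte ^ 35" example above; real samples may use multi-byte keys or a different layer order.

```python
import base64
import gzip


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (same idea as 'byte ^ key')."""
    return bytes(b ^ key for b in data)


def decode_chain(encoded: bytes, key: int) -> bytes:
    """Undo the layers in order: Base64, then gzip, then single-byte XOR."""
    compressed = base64.b64decode(encoded)
    xored = gzip.decompress(compressed)
    return xor_bytes(xored, key)
```

For example, decode_chain(blob, 35) would recover a payload that was XORed with 35, gzip-compressed, and then Base64-encoded.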
Step 6: Embedded Object Analysis
Tools: scdbgc, xorsearch, yara, 1768-py
If shellcode is found, emulate it with scdbgc. Scan for
known patterns with YARA and XORSearch. Check for
Cobalt Strike beacons with 1768.py. Route PE payloads
to the Static Analysis Workflow.
$ scdbgc /f shellcode.bin /s -1
$ XORSearch -W -d 3 file.bin
$ yara-rules specimen.bin
$ 1768.py shellcode.bin
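A simplified Python illustration of the idea behind XOR pattern searching in Step 6: try every single-byte key against a known plaintext string. This is a sketch with a hypothetical function name, not how XORSearch is implemented, and it ignores the ROL, ROT, SHIFT, and ADD encodings.

```python
def xor_search(data: bytes, needle: bytes) -> list:
    """Return (key, offset) pairs where needle appears XORed with one byte.

    Key 0 corresponds to the needle appearing in plain form.
    """
    hits = []
    for key in range(256):
        encoded = bytes(b ^ key for b in needle)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits
```

Useful needles include "http://" and "This program", the start of the DOS stub message found in most PE files.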
Step 7: Document IOCs
Record: embedded URLs, downloaded payload hashes, C2
addresses, macro behavior (which APIs are called), and
exploit type (CVE if applicable).
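A minimal Python sketch for capturing two of the IOC types listed in Step 7, a payload hash and embedded URLs. The function name, the record layout, and the URL pattern are assumptions; the pattern is deliberately loose and will miss defanged or UTF-16 encoded URLs.

```python
import hashlib
import re

# Match http(s) URLs made of printable ASCII, stopping at quotes,
# angle brackets, whitespace, control characters, and non-ASCII bytes.
URL_RE = re.compile(rb"https?://[^\x00-\x20\x22\x27\x3c\x3e\x7f-\xff]+")


def collect_iocs(payload: bytes) -> dict:
    """Return the SHA-256 of the payload and a sorted list of unique URLs."""
    urls = {match.decode("ascii") for match in URL_RE.findall(payload)}
    return {
        "sha256": hashlib.sha256(payload).hexdigest(),
        "urls": sorted(urls),
    }
```

Run it on each decoded stage, not just the original document, since URLs often appear only after Step 5.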
────────────────────────────────────────────────────────────
Tip: 'fhelp cheat <tool>' for full examples
'Ctrl+G' for interactive cheatsheet browser