Add FOR610 tool/workflow knowledge base and data pipeline

Build comprehensive malware analysis knowledge base from 3 sources:
- SANS FOR610 course: 120 tools, 47 labs, 15 workflows, 27 recipes
- REMnux salt-states: 340 packages parsed from GitHub
- REMnux docs: 280+ tools scraped from docs.remnux.org

Master inventory merges all sources into 447 tools with help tiers
(rich/standard/basic). Pipeline generates: tools.db (397 entries),
397 cheatsheets with multi-tool recipes, 15 workflow guides,
224 TLDR pages, and coverage reports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:38:15 +01:00
parent 06ebb09ab0
commit f3ccc09c3d
663 changed files with 36339 additions and 1 deletions
@@ -0,0 +1,84 @@
+============================================================
+  Malicious Document Analysis
+============================================================
+
+  Analyze suspicious documents (PDF, Office, RTF, OneNote) for embedded malware, macros, and exploits. Follows Zeltser's 6-step methodology.
+
+  Related FOR610 Labs: 3.1, 3.3, 3.4, 3.5
+
+────────────────────────────────────────────────────────────
+
+  Step 1: Format Identification
+  Tools: file, trid
+  Identify true format: OLE2 (legacy Office), OOXML
+  (modern Office), RTF, PDF, OneNote. Don't trust the
+  file extension — use magic bytes.
+
+    $ file document.doc
+    $ trid document.doc
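+
+  A minimal Python sketch of the magic-byte check, for
+  cases where file and trid are not at hand. The
+  signature table and the identify_format name are
+  illustrative assumptions, not part of any tool listed
+  here. A ZIP header alone does not prove OOXML.
+
```python
# Leading-byte signatures for the formats named in Step 1.
# Illustrative table only; extend it as needed.
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2"),
    (b"PK\x03\x04", "ZIP (possibly OOXML)"),
    (b"{\\rt", "RTF"),
    (b"%PDF", "PDF"),
    (bytes.fromhex("E4525C7B8CD8A74DAEB15378D02996D3"), "OneNote"),
]


def identify_format(header: bytes) -> str:
    """Return a format label based on the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


# Demo on in-memory bytes; for a real file, pass the first 16 bytes read
# in binary mode (the longest signature above is 16 bytes).
print(identify_format(b"%PDF-1.7\n"))
```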
+
+  Step 2: Structure Analysis
+  Tools: oledump-py, rtfdump-py, pdfid-py, pdf-parser-py, onedump-py
+  Parse document internals. For Office: oledump.py to
+  list streams (M = macro). For PDF: pdfid.py for risky
+  keywords (/JavaScript, /OpenAction). For RTF:
+  rtfdump.py for hex-heavy groups.
+
+    $ oledump.py document.docm
+    $ rtfdump.py document.rtf
+    $ pdfid.py document.pdf
+    $ pdf-parser.py document.pdf -a
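+
+  To illustrate what the PDF keyword triage looks for,
+  here is a naive Python sketch that counts the two
+  keywords named above in raw bytes. The function name
+  is hypothetical, and the sketch does not decode
+  hex-escaped names such as /J#61vaScript, so it is no
+  substitute for pdfid.py.
+
```python
# Keywords taken from the Step 2 description.
RISKY_KEYWORDS = (b"/JavaScript", b"/OpenAction")


def count_risky_keywords(pdf_bytes: bytes) -> dict:
    """Count literal occurrences of each risky keyword in the raw PDF bytes."""
    return {kw.decode("ascii"): pdf_bytes.count(kw) for kw in RISKY_KEYWORDS}


# Demo on a hand-written fragment, not a complete PDF.
sample = b"1 0 obj << /OpenAction 2 0 R >> endobj 2 0 obj << /S /JavaScript >>"
print(count_risky_keywords(sample))
```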
+
+  Step 3: Password Handling (if encrypted)
+  Tools: msoffcrypto-tool
+  If the document is password-protected, decrypt it
+  with msoffcrypto-tool -p <password> <input> <output>.
+  Common passwords to try: infected, malware, password,
+  123456.
+
+    $ msoffcrypto-tool -p infected encrypted.docx decrypted.docx
+
+  Step 4: Macro/Script Extraction
+  Tools: oledump-py, olevba, pcode2code, XLMMacroDeobfuscator
+  Extract VBA: oledump.py -s <stream> -v. For p-code:
+  pcode2code. For Excel 4.0 macros:
+  XLMMacroDeobfuscator. Check olevba for auto-execute
+  triggers (AutoOpen, Document_Open).
+
+    $ oledump.py -s A3 -v document.docm
+    $ olevba document.docm
+    $ pcode2code document.docm
+    $ xlmdeobfuscator --file spreadsheet.xlsm
+
+  Step 5: Payload Decoding
+  Tools: base64dump-py, translate-py, gunzip, numbers-to-string-py, cyberchef
+  Decode embedded payloads. Common chains: Base64 →
+  gunzip → XOR. Use CyberChef for visual multi-step
+  decoding. translate.py for byte-level transforms (byte
+  ^ key).
+
+    $ base64dump.py file.txt
+    $ translate.py "byte ^ 35" < input.bin > output.bin
+    $ gunzip -c compressed.gz > output.bin
+    $ oledump.py doc.docm -s A3 -v | numbers-to-string.py -j
+    $ cyberchef
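+
+  The Base64 → gunzip → XOR chain can also be sketched
+  in a few lines of Python. This is an illustration
+  under assumptions: the function names are
+  hypothetical, the key is a single byte (35, as in the
+  translate.py example), and real samples may order or
+  repeat the layers differently.
+
```python
import base64
import gzip


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (same idea as 'byte ^ key')."""
    return bytes(b ^ key for b in data)


def decode_chain(blob: bytes, key: int) -> bytes:
    """Undo Base64, then gzip, then single-byte XOR, in that order."""
    raw = base64.b64decode(blob)
    unzipped = gzip.decompress(raw)
    return xor_bytes(unzipped, key)


# Round-trip demo: build a test blob by applying the layers in reverse.
plaintext = b"MZ\x90\x00 example payload"
blob = base64.b64encode(gzip.compress(xor_bytes(plaintext, 35)))
assert decode_chain(blob, 35) == plaintext
```
+
+  If the key is unknown, inspect the intermediate output
+  after each layer rather than running the whole chain
+  blind.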
+
+  Step 6: Embedded Object Analysis
+  Tools: scdbgc, xorsearch, yara, 1768-py
+  If shellcode found: emulate with scdbgc. Scan for
+  known patterns (YARA). Check for Cobalt Strike beacons
+  (1768.py). Route PE payloads to Static Analysis
+  Workflow.
+
+    $ scdbgc /f shellcode.bin /s -1
+    $ XORSearch -W -d 3 file.bin
+    $ yara-rules specimen.bin
+    $ 1768.py shellcode.bin
+
+  Step 7: Document IOCs
+  Record: embedded URLs, hashes of downloaded payloads,
+  C2 addresses, macro behavior (which APIs it calls),
+  and exploit type (CVE if applicable).
+
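+  Part of this record can be collected mechanically. A
+  minimal Python sketch, assuming the payload and the
+  decoded macro text are already in memory; the
+  collect_iocs name, the field names, and the URL
+  pattern are hypothetical and deliberately simple.
+
```python
import hashlib
import re

# Deliberately simple pattern: stops at whitespace, quotes, angle brackets
# and closing parentheses. It will miss obfuscated or split URLs.
URL_PATTERN = re.compile(rb"https?://[^\s\"'<>)]+")


def collect_iocs(payload: bytes, script_text: bytes) -> dict:
    """Return the payload's SHA-256 and the unique URLs found in script_text."""
    urls = {m.decode("ascii", "replace") for m in URL_PATTERN.findall(script_text)}
    return {
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "urls": sorted(urls),
    }


# Demo with placeholder data (example.test is a reserved test domain).
macro = b'URLDownloadToFile 0, "http://example.test/a.bin", path'
print(collect_iocs(b"placeholder payload", macro))
```
+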
+────────────────────────────────────────────────────────────
+  Tip: 'fhelp cheat <tool>' for full examples
+       'Ctrl+G' for interactive cheatsheet browser