Add FOR610 tool/workflow knowledge base and data pipeline

Build comprehensive malware analysis knowledge base from 3 sources:
- SANS FOR610 course: 120 tools, 47 labs, 15 workflows, 27 recipes
- REMnux salt-states: 340 packages parsed from GitHub
- REMnux docs: 280+ tools scraped from docs.remnux.org

Master inventory merges all sources into 447 tools with help tiers (rich/standard/basic).

Pipeline generates: tools.db (397 entries), 397 cheatsheets with multi-tool recipes, 15 workflow guides, 224 TLDR pages, and coverage reports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:38:15 +01:00
parent 06ebb09ab0
commit f3ccc09c3d
663 changed files with 36339 additions and 1 deletion
@@ -0,0 +1,62 @@
+# FOR610 Knowledge Base
+
+Structured data extracted from the SANS FOR610 (Reverse-Engineering Malware) course materials.
+
+## Files
+
+| File | Description |
+|------|-------------|
+| `categories.yaml` | Tool category taxonomy (18 categories) |
+| `tools.yaml` | Master tool catalog (~110 tools with metadata) |
+| `labs.yaml` | All 47 labs with ordered tool sequences |
+| `workflows.yaml` | 8 high-level analysis workflow patterns |
+
+## Schema
+
+### tools.yaml
+
+Each tool entry contains:
+
+- `id` — unique kebab-case identifier (used for cross-references)
+- `name` — display name as typed on CLI
+- `aliases` — alternative names
+- `description` — one-line description
+- `category` — FK to categories.yaml
+- `platform` — `linux` | `windows` | `both` | `online`
+- `in_remnux` — boolean, available in REMnux container
+- `labs` — list of lab IDs that use this tool
+- `typical_usage` — 1-3 command examples
+- `for610_sections` — which course sections cover this tool
+- `tags` — free-form search tags
+
+### labs.yaml
+
+Each lab entry contains:
+
+- `id` — lab number (e.g., "3.1")
+- `section` — course section (1-5)
+- `title` — full lab title
+- `sample` — malware specimen analyzed
+- `analysis_type` — controlled vocabulary
+- `tools_used` — **ordered** list with `tool_id`, `platform`, and `purpose`
+- `key_techniques` — techniques demonstrated
+- `prerequisite_labs` — dependencies (optional)
+- `tags` — free-form search tags
+
+### workflows.yaml
+
+Each workflow contains ordered steps with tool references and related labs.
+
+## Generating JSON
+
+```bash
+make generate-data
+```
+
+This converts all YAML files to JSON under `data/generated/` using `yq`.
+
+## Cross-Reference Integrity
+
+- Every `tools_used[].tool_id` in `labs.yaml` must match an `id` in `tools.yaml`.
+- Every lab ID in a tool's `labs[]` list in `tools.yaml` must match an `id` in `labs.yaml`.
+- Every `category` in `tools.yaml` must match a category ID in `categories.yaml`.
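
The three cross-reference rules in the README above could be checked mechanically. The following is a minimal sketch, not the pipeline's actual validator: it assumes each YAML file parses to a list of mappings, that category entries carry an `id` field (the README does not show the `categories.yaml` schema), and the function name `check_cross_references` and all sample IDs are hypothetical.

```python
"""Sketch of the three cross-reference rules, run on already-parsed YAML data."""
from typing import Any

Entry = dict[str, Any]


def check_cross_references(
    categories: list[Entry], tools: list[Entry], labs: list[Entry]
) -> list[str]:
    """Return one message per dangling reference; an empty list means all rules hold."""
    # Assumption: every category entry exposes its identifier under `id`.
    category_ids = {category["id"] for category in categories}
    tool_ids = {tool["id"] for tool in tools}
    lab_ids = {lab["id"] for lab in labs}
    errors: list[str] = []

    # Rule 1: labs.yaml tools_used[].tool_id must exist in tools.yaml.
    for lab in labs:
        for step in lab.get("tools_used", []):
            if step["tool_id"] not in tool_ids:
                errors.append(f"lab {lab['id']}: unknown tool_id {step['tool_id']!r}")

    for tool in tools:
        # Rule 2: tools.yaml labs[] must exist in labs.yaml.
        for lab_id in tool.get("labs", []):
            if lab_id not in lab_ids:
                errors.append(f"tool {tool['id']}: unknown lab {lab_id!r}")
        # Rule 3: tools.yaml category must exist in categories.yaml.
        if tool.get("category") not in category_ids:
            errors.append(
                f"tool {tool['id']}: unknown category {tool.get('category')!r}"
            )
    return errors


# Hypothetical sample data; none of these IDs come from the real catalog.
categories = [{"id": "static-analysis"}]
tools = [
    {"id": "example-tool", "category": "static-analysis", "labs": ["1.1"]},
    {"id": "orphan-tool", "category": "no-such-category", "labs": ["9.9"]},
]
labs = [
    {
        "id": "1.1",
        "tools_used": [{"tool_id": "example-tool"}, {"tool_id": "missing-tool"}],
    },
]

for message in check_cross_references(categories, tools, labs):
    print(message)
```

With the sample data, the sketch reports one violation of each rule. In a real run the three lists would presumably come from parsing the YAML files (for example with PyYAML's `yaml.safe_load`), and a non-empty result could fail the build before `make generate-data` runs.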