# WARP.md

This file provides guidance to WARP (warp.dev) when working with code in this repository.

## Project Overview

This repository contains a Docker-based file analysis toolkit, primarily focused on PDF and malware analysis. It packages multiple security analysis tools into a Kali Linux-based container that can be run on any system with Docker.

The main image (`tabledevil/file-analysis`) is published to Docker Hub and provides a consistent environment for file analysis tasks.

## Core Architecture

- **Base Image**: Kali Linux rolling release
- **Primary Use Case**: Analyzing potentially malicious files (PDFs, Office docs, executables)
- **Execution Model**: Container runs with a host directory mounted at `/data` for file access
- **User Security**: Runs as non-privileged `nonroot` user (UID 1001) for security isolation

## Development Commands

### Building the Container

```bash
docker build -t tabledevil/file-analysis .
```

### Running the Container

```bash
# Standard usage - mounts current directory
docker run -it --rm -v "$(pwd):/data" tabledevil/file-analysis

# Run specific command without interactive shell
docker run --rm -v "$(pwd):/data" tabledevil/file-analysis pdfid.py suspicious.pdf
```

### Testing Container Functionality

```bash
# Verify installed tools are accessible
docker run --rm tabledevil/file-analysis which pdfid.py
docker run --rm tabledevil/file-analysis which peepdf
docker run --rm tabledevil/file-analysis capa --version
```

## Key Tools and Usage Patterns

The container includes specialized analysis tools:

**PDF Analysis Suite:**

- `pdfid.py` - Quick PDF structure overview
- `pdf-parser.py` - Extract and analyze PDF elements
- `peepdf` - Interactive PDF analysis with JavaScript detection
- `pdftk` - PDF manipulation and flattening
- Origami suite (`pdfcop`, `pdfextract`, `pdfmetadata`)

**Malware Analysis:**

- `capa` - Malware capability detection
- `box-js` - JavaScript sandbox analysis
- `oledump.py`, `rtfdump.py`, `emldump.py` - Office document analysis
- `visidata` - Data exploration and analysis

**File Format Tools:**

- `exiftool` - Metadata extraction
- `catdoc`, `docx2txt` - Document conversion
- `unrtf` - RTF processing
- ImageMagick - Image processing (PDF policy modified for read/write)

## Environment Configuration

- **Timezone**: Europe/Berlin
- **Python**: Uses `--break-system-packages` for pip installations due to Kali base
- **PATH**: Extended to include `/opt/didierstevenssuite/` and pypy binaries
- **Working Directory**: `/data` (expected mount point)

## Development Guidelines

### Docker Best Practices Applied

- Multi-stage approach with dependency installation
- Non-root user execution
- Minimized layer count
- Proper cleanup of package caches

### Tool Integration

- Didier Stevens suite tools are cloned from GitHub and made executable
- Python tools installed via both system pip and pipx for isolation
- Ruby gems (Origami) installed system-wide
- npm packages installed globally for JavaScript analysis

### Security Considerations

- Container runs as unprivileged user
- ImageMagick PDF policy relaxed only for necessary operations
- File analysis happens in isolated container environment

## File Structure

- `Dockerfile` - Main container build configuration
- `files/README` - German-language tool documentation for container users
- `files/command_help` - Detailed usage examples for PDF analysis tools
- `pip.conf` - Python package installation optimization settings

## Common Workflow

1. Place suspicious files in a directory
2. Run the container with that directory mounted to `/data`
3. Use the appropriate analysis tools based on file type
4. Extract results and artifacts to the mounted directory
5. The container is removed automatically on exit (via the `--rm` flag)

The container is designed for security researchers and incident response teams who need a standardized, portable environment for file analysis without installing potentially dangerous tools on their host systems.
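
Steps 2 and 3 of the common workflow (mount a directory, then pick a tool by file type) could be wrapped in a small host-side helper. The following is only a sketch: the function names `pick_tool` and `triage`, the `DOCKER` override variable, and the extension-to-tool mapping are all hypothetical and not part of this repository. Only the tool names and the `docker run` form come from the sections above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with this repository.

# Choose a first-pass tool from the file extension.
# The mapping is an assumption; adjust it to your own triage habits.
pick_tool() {
  local name
  # Lowercase the name so that "SAMPLE.PDF" matches "*.pdf"
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.pdf)             echo "pdfid.py" ;;
    *.doc|*.xls|*.ppt) echo "oledump.py" ;;
    *.rtf)             echo "rtfdump.py" ;;
    *.eml)             echo "emldump.py" ;;
    *.js)              echo "box-js" ;;
    *.exe|*.dll)       echo "capa" ;;
    *)                 echo "exiftool" ;;  # fallback: metadata only
  esac
}

# Run the chosen tool on one file from the current directory.
# DOCKER is a hypothetical override, e.g. DOCKER=echo for a dry run.
triage() {
  local file="$1"
  "${DOCKER:-docker}" run --rm -v "$(pwd):/data" \
    tabledevil/file-analysis "$(pick_tool "$file")" "$file"
}
```

After sourcing the file, `triage suspicious.pdf` would run `pdfid.py` on that file inside the container, and `DOCKER=echo triage suspicious.pdf` prints the arguments instead of starting a container. The file must sit in the current directory, because that directory is what gets mounted at `/data`.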