Tobias Kessels 169ef5fb03 Migrate from Kali to REMnux base image
- Created new Dockerfile.remnux based on remnux/remnux-distro:latest
- Added comprehensive tool testing suite (test-tools.sh, test-containers.sh)
- Tool comparison analysis shows we get all original tools plus additional ones from REMnux:
  * Additional PDF tools: qpdf, pdfresurrect, pdftool, base64dump, tesseract
  * All original tools preserved: pdfid.py, pdf-parser.py, peepdf, origami, capa, box-js, visidata, unfurl
- Updated README.md with new usage instructions
- Updated WARP.md documentation
- All 21 tools tested and verified working
- Migration maintains full functionality while adding REMnux capabilities
2025-09-30 12:40:55 +02:00


# WARP.md
This file provides guidance to WARP (warp.dev) when working with code in this repository.
## Project Overview
This repository contains a Docker-based file analysis toolkit, primarily focused on PDF and malware analysis. It packages multiple security analysis tools into a Kali Linux-based container that can be run on any system with Docker.

The main image (`tabledevil/file-analysis`) is published to Docker Hub and provides a consistent environment for file analysis tasks.
## Core Architecture
- **Base Image**: Kali Linux rolling release
- **Primary Use Case**: Analyzing potentially malicious files (PDFs, Office docs, executables)
- **Execution Model**: The host directory holding the files to analyze is mounted into the container at `/data`
- **User Security**: Runs as non-privileged `nonroot` user (UID 1001) for security isolation
## Development Commands
### Building the Container
```bash
docker build -t tabledevil/file-analysis .
```
### Running the Container
```bash
# Standard usage - mounts current directory
docker run -it --rm -v "$(pwd):/data" tabledevil/file-analysis
# Run specific command without interactive shell
docker run --rm -v "$(pwd):/data" tabledevil/file-analysis pdfid.py suspicious.pdf
```
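
For repeated use, the standard `docker run` invocation can be wrapped in a small shell function. The sketch below is a hypothetical helper (`fa` and the `FA_IMAGE` override are illustrative names, not part of this repository). It also adds `--network none` as an extra hardening assumption so that samples cannot reach the network; the documented usage above does not do this.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper "fa": run the analysis image against the current directory.
# FA_IMAGE (optional) overrides the image name; --network none is an added
# hardening assumption, not part of the documented usage.
fa() {
  local -a cmd=(docker run --rm)
  # Request an interactive TTY only when stdin and stdout are both terminals,
  # so the wrapper also works inside pipes and scripts.
  if [ -t 0 ] && [ -t 1 ]; then
    cmd+=(-it)
  fi
  # Disable networking, mount the current directory at /data as in the
  # standard usage, then pass any tool name and arguments through unchanged.
  cmd+=(--network none -v "$PWD:/data" "${FA_IMAGE:-tabledevil/file-analysis}" "$@")
  "${cmd[@]}"
}
```

Called without arguments, `fa` behaves like the standard usage; `fa pdfid.py suspicious.pdf` runs a single tool. Drop `--network none` if a particular task genuinely needs network access.
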
### Testing Container Functionality
```bash
# Verify installed tools are accessible
docker run --rm tabledevil/file-analysis which pdfid.py
docker run --rm tabledevil/file-analysis which peepdf
docker run --rm tabledevil/file-analysis capa --version
```
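
The `which` checks above start one container per tool. A hypothetical helper such as `check_tools` (an illustrative name, not shipped with the image) can check a whole list in a single run and return non-zero if anything is missing, which makes it usable in CI. It relies only on the bash builtin `command -v`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 if every command was found, 1 if at least one is missing.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'ok: %s\n' "$tool"
    else
      printf 'missing: %s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Assuming the function is saved as `check-tools.sh` (a hypothetical file name) and the image provides `bash`, it can be run inside the container with `docker run --rm tabledevil/file-analysis bash -c "$(cat check-tools.sh); check_tools pdfid.py pdf-parser.py peepdf pdftk capa box-js oledump.py exiftool"`.
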
## Key Tools and Usage Patterns
The container includes specialized analysis tools:

**PDF Analysis Suite:**
- `pdfid.py` - Quick PDF structure overview
- `pdf-parser.py` - Extract and analyze PDF elements
- `peepdf` - Interactive PDF analysis with JavaScript detection
- `pdftk` - PDF manipulation and flattening
- Origami suite (`pdfcop`, `pdfextract`, `pdfmetadata`)

**Malware Analysis:**
- `capa` - Malware capability detection
- `box-js` - JavaScript sandbox analysis
- `oledump.py`, `rtfdump.py`, `emldump.py` - Office (OLE) document, RTF, and email (`.eml`) analysis
- `visidata` - Data exploration and analysis

**File Format Tools:**
- `exiftool` - Metadata extraction
- `catdoc`, `docx2txt` - Document conversion
- `unrtf` - RTF processing
- ImageMagick - Image processing (PDF policy modified for read/write)
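
Which tool to start with usually depends on the file type. The hypothetical `suggest_tool` function below (an illustrative helper, not shipped in the image) maps file extensions to first-pass tools from the list above and prints the command instead of running it. Matching on extensions is a simplifying assumption; a sample's real type should still be confirmed, for example from the file type that `exiftool` reports.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a first-pass analysis command for each file,
# chosen by file extension only (an assumption; extensions can lie).
suggest_tool() {
  local f lower tool
  for f in "$@"; do
    # Lower-case portably; bash 3.2 (macOS) lacks ${var,,}.
    lower=$(printf '%s' "$f" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      *.pdf) tool=pdfid.py ;;
      *.doc|*.xls|*.ppt|*.docm|*.xlsm|*.pptm) tool=oledump.py ;;
      *.rtf) tool=rtfdump.py ;;
      *.eml) tool=emldump.py ;;
      *.js) tool=box-js ;;
      *.exe|*.dll) tool=capa ;;
      *) tool=exiftool ;;  # anything else: start with metadata extraction
    esac
    # %q quotes the file name so the printed command is safe to copy and paste.
    printf '%s %q\n' "$tool" "$f"
  done
}
```

Because it only prints commands, it runs equally well on the host or inside the container; `suggest_tool *` lists one suggestion per file in the current directory.
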
## Environment Configuration
- **Timezone**: Europe/Berlin
- **Python**: pip installations use `--break-system-packages` because the Kali base marks its system Python as externally managed (PEP 668)
- **PATH**: Extended to include `/opt/didierstevenssuite/` and pypy binaries
- **Working Directory**: `/data` (expected mount point)
## Development Guidelines
### Docker Best Practices Applied
- Multi-stage approach with dependency installation
- Non-root user execution
- Layer count kept to a minimum
- Proper cleanup of package caches
### Tool Integration
- Didier Stevens suite tools are cloned from GitHub and made executable
- Python tools installed via system pip, and via pipx where a tool needs its own isolated environment
- Ruby gems (Origami) installed system-wide
- npm packages installed globally for JavaScript analysis
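
Cloning the Didier Stevens suite and making it executable comes down to two shell commands. The function below is a sketch of that step, not the project's actual Dockerfile: `install_didier_suite` is a hypothetical name, the default destination mirrors the `/opt/didierstevenssuite/` PATH entry listed under Environment Configuration, and the URL is the suite's public GitHub repository.

```bash
#!/usr/bin/env bash
# Sketch (not the project's Dockerfile): clone the Didier Stevens suite and
# make its Python scripts executable. install_didier_suite is a hypothetical name.
install_didier_suite() {
  local dest="${1:-/opt/didierstevenssuite}"
  # A shallow clone is enough: only the current scripts are needed in the image.
  git clone --depth 1 https://github.com/DidierStevens/DidierStevensSuite.git "$dest" || return 1
  # Mark the scripts executable so they run directly (e.g. `pdfid.py file.pdf`)
  # once $dest is on PATH.
  chmod +x "$dest"/*.py
}
```

In a Dockerfile the same two commands would typically sit in a single `RUN` instruction, with the directory added to `PATH` through `ENV`.
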
### Security Considerations
- Container runs as unprivileged user
- ImageMagick PDF policy relaxed only for necessary operations
- File analysis happens in an isolated container environment
## File Structure
- `Dockerfile` - Main container build configuration
- `files/README` - German-language tool documentation for container users
- `files/command_help` - Detailed usage examples for PDF analysis tools
- `pip.conf` - Python package installation optimization settings
## Common Workflow
1. Place suspicious files in a directory
2. Run container with that directory mounted to `/data`
3. Use appropriate analysis tools based on file type
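4. Save results and extracted artifacts to the mounted directory, since anything written elsewhere in the container is discarded along with it
5. The container is removed automatically on exit because it is started with `--rm`
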
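
The workflow can be scripted for a single PDF. The sketch below (`analyze_pdf` and the `case-<timestamp>` directory naming are hypothetical) copies the sample into a fresh case directory, mounts only that directory, runs `pdfid.py`, and keeps the report next to the sample. Redirecting the tool's output on the host side is a deliberate choice: the report file is then owned by the calling user rather than by the container's `nonroot` user (UID 1001).

```bash
#!/usr/bin/env bash
# Hypothetical helper covering steps 1-5 for one PDF sample.
analyze_pdf() {
  local sample="$1"
  local case_dir="${2:-case-$(date +%Y%m%d-%H%M%S)}"
  local name="${sample##*/}"
  # Steps 1-2: isolate the sample in its own case directory (the mount point).
  mkdir -p -- "$case_dir" && cp -- "$sample" "$case_dir/" || return 1
  case_dir=$(cd -- "$case_dir" && pwd) || return 1   # absolute path for -v
  # Steps 3-5: run pdfid.py, keep its report on the host, and let --rm remove
  # the container. The host shell writes the report, so the calling user owns it.
  docker run --rm -v "$case_dir:/data" tabledevil/file-analysis \
    pdfid.py "$name" > "$case_dir/$name.pdfid.txt"
}
```

For example, `analyze_pdf ~/Downloads/suspicious.pdf` leaves `suspicious.pdf` and `suspicious.pdf.pdfid.txt` in a new `case-<timestamp>` directory under the current directory.
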
The container is designed for security researchers and incident response teams who need a standardized, portable environment for file analysis without installing potentially dangerous tools on their host systems.