docker_file_analysis/WARP.md
Tobias Kessels 169ef5fb03 Migrate from Kali to REMnux base image
- Created new Dockerfile.remnux based on remnux/remnux-distro:latest
- Added comprehensive tool testing suite (test-tools.sh, test-containers.sh)
- Tool comparison analysis shows we get all original tools plus additional ones from REMnux:
  * Additional PDF tools: qpdf, pdfresurrect, pdftool, base64dump, tesseract
  * All original tools preserved: pdfid.py, pdf-parser.py, peepdf, origami, capa, box-js, visidata, unfurl
- Updated README.md with new usage instructions
- Updated WARP.md documentation
- All 21 tools tested and verified working
- Migration maintains full functionality while adding REMnux capabilities
2025-09-30 12:40:55 +02:00

WARP.md

This file provides guidance to WARP (warp.dev) when working with code in this repository.

Project Overview

This repository contains a Docker-based file analysis toolkit, primarily focused on PDF and malware analysis. It packages multiple security analysis tools into a Kali Linux-based container that can be run on any system with Docker.

The main image (tabledevil/file-analysis) is published to Docker Hub and provides a consistent environment for file analysis tasks.

Core Architecture

  • Base Image: Kali Linux rolling release
  • Primary Use Case: Analyzing potentially malicious files (PDFs, Office docs, executables)
  • Execution Model: Container runs with mounted host directory (/data) for file access
  • User Security: Runs as the unprivileged nonroot user (UID 1001) for security isolation
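
The execution model above can be sketched as a small host-side wrapper. This is only an illustration: the function name run_in_container and the DOCKER and IMAGE variables are hypothetical helpers, not part of the repository. It assumes the image accepts a command as trailing arguments, as the usage examples in this file do.

```shell
# Sketch only: DOCKER and IMAGE are overridable so the wrapper can be dry-run
# (for example DOCKER=echo) on a machine without Docker.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-tabledevil/file-analysis}"

# run_in_container DIR CMD [ARGS...]
# Mounts DIR at /data and runs CMD inside a throwaway container.
run_in_container() {
    host_dir="$1"
    shift
    "$DOCKER" run --rm -v "$host_dir:/data" "$IMAGE" "$@"
}

# Example (needs Docker): print the UID the container runs as.
# According to the list above this should be 1001.
#   run_in_container "$(pwd)" id -u
```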

Development Commands

Building the Container

docker build -t tabledevil/file-analysis .

Running the Container

# Standard usage - mounts current directory
docker run -it --rm -v "$(pwd):/data" tabledevil/file-analysis

# Run specific command without interactive shell
docker run --rm -v "$(pwd):/data" tabledevil/file-analysis pdfid.py suspicious.pdf

Testing Container Functionality

# Verify installed tools are accessible
docker run --rm tabledevil/file-analysis which pdfid.py
docker run --rm tabledevil/file-analysis which peepdf
docker run --rm tabledevil/file-analysis capa --version
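
The individual checks above could be wrapped in a loop so that a missing tool is reported by name. A minimal sketch, assuming `which` is available in the image (as the commands above imply); check_tools, DOCKER and IMAGE are hypothetical names introduced for illustration.

```shell
# Sketch only: DOCKER and IMAGE are overridable for testing without Docker.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-tabledevil/file-analysis}"

# check_tools TOOL...
# Prints "ok <tool>" or "MISSING <tool>" for each tool and
# returns non-zero if at least one tool was not found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if "$DOCKER" run --rm "$IMAGE" which "$tool" >/dev/null 2>&1; then
            echo "ok $tool"
        else
            echo "MISSING $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (needs Docker):
#   check_tools pdfid.py pdf-parser.py peepdf capa exiftool
```

Each tool starts its own container here, which is slow but keeps the sketch simple.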

Key Tools and Usage Patterns

The container includes specialized analysis tools:

PDF Analysis Suite:

  • pdfid.py - Quick PDF structure overview
  • pdf-parser.py - Extract and analyze PDF elements
  • peepdf - Interactive PDF analysis with JavaScript detection
  • pdftk - PDF manipulation and flattening
  • Origami suite (pdfcop, pdfextract, pdfmetadata)

Malware Analysis:

  • capa - Malware capability detection
  • box-js - JavaScript sandbox analysis
  • oledump.py, rtfdump.py, emldump.py - Office document analysis
  • visidata - Data exploration and analysis

File Format Tools:

  • exiftool - Metadata extraction
  • catdoc, docx2txt - Document conversion
  • unrtf - RTF processing
  • ImageMagick - Image processing (PDF policy modified for read/write)

Environment Configuration

  • Timezone: Europe/Berlin
  • Python: pip installations use --break-system-packages because the Kali base marks its system Python as externally managed (PEP 668)
  • PATH: Extended to include /opt/didierstevenssuite/ and pypy binaries
  • Working Directory: /data (expected mount point)
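
One way to confirm the PATH setting from the host is to capture the container's PATH and test it for the expected entry. The helper path_contains is a hypothetical name, and whether the entry carries a trailing slash inside the image is not known here, so the sketch accepts both forms.

```shell
# path_contains PATHSTRING DIR
# Succeeds if DIR (with or without one trailing slash) is one of the
# colon-separated entries of PATHSTRING.
path_contains() {
    dir="${2%/}"
    case ":$1:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (needs Docker):
#   container_path=$(docker run --rm tabledevil/file-analysis sh -c 'echo "$PATH"')
#   path_contains "$container_path" /opt/didierstevenssuite && echo "suite is on PATH"
#   docker run --rm tabledevil/file-analysis pwd   # expected to print /data
```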

Development Guidelines

Docker Best Practices Applied

  • Multi-stage approach with dependency installation
  • Non-root user execution
  • Minimal layer count optimization
  • Proper cleanup of package caches

Tool Integration

  • Didier Stevens suite tools are cloned from GitHub and made executable
  • Python tools installed via both system pip and pipx for isolation
  • Ruby gems (Origami) installed system-wide
  • npm packages installed globally for JavaScript analysis
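
As an illustration of the first point, cloning the suite and marking its scripts executable might look roughly like this. The actual Dockerfile steps are not shown in this file, so the clone command, the target directory and the helper name make_scripts_executable are assumptions.

```shell
# make_scripts_executable DIR
# Marks every *.py file under DIR as executable.
make_scripts_executable() {
    find "$1" -type f -name '*.py' -exec chmod +x {} +
}

# Example (needs network access; the target directory is assumed from the PATH entry):
#   git clone --depth 1 https://github.com/DidierStevens/DidierStevensSuite /opt/didierstevenssuite
#   make_scripts_executable /opt/didierstevenssuite
```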

Security Considerations

  • Container runs as unprivileged user
  • ImageMagick PDF policy relaxed only for necessary operations
  • File analysis happens in isolated container environment
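
For samples that only need to be inspected, the isolation could be tightened further at run time. The following is an optional sketch and not something the repository prescribes: it uses standard docker run flags to drop network access and capabilities and mounts the samples read-only, so tool output must go to stdout rather than to /data. The name run_isolated and the DOCKER and IMAGE variables are hypothetical.

```shell
# Sketch only: DOCKER and IMAGE are overridable for a dry run (DOCKER=echo).
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-tabledevil/file-analysis}"

# run_isolated DIR CMD [ARGS...]
# Runs CMD with no network, no capabilities, and DIR mounted read-only at /data.
run_isolated() {
    host_dir="$1"
    shift
    "$DOCKER" run --rm \
        --network none \
        --cap-drop ALL \
        --security-opt no-new-privileges \
        -v "$host_dir:/data:ro" \
        "$IMAGE" "$@"
}

# Example (needs Docker):
#   run_isolated "$(pwd)" pdfid.py suspicious.pdf > pdfid-report.txt
```

Tools that need to write extracted artifacts next to the sample would require a writable mount instead.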

File Structure

  • Dockerfile - Main container build configuration
  • files/README - German-language tool documentation for container users
  • files/command_help - Detailed usage examples for PDF analysis tools
  • pip.conf - Python package installation optimization settings

Common Workflow

  1. Place suspicious files in a directory
  2. Run container with that directory mounted to /data
  3. Use appropriate analysis tools based on file type
  4. Extract results and artifacts to the mounted directory
  5. The container is removed automatically on exit (because of the --rm flag)
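
Step 3 of the workflow can be sketched as a lookup from file extension to a first-pass tool. The mapping below is an assumption chosen from the tools listed in this file, not a rule defined by the repository, and pick_tool is a hypothetical helper name.

```shell
# pick_tool FILE
# Prints a first-pass tool for FILE based on its extension (case-insensitive).
# Unknown or missing extensions fall back to exiftool for basic metadata.
pick_tool() {
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        pdf)          echo "pdfid.py" ;;
        doc|xls|ppt)  echo "oledump.py" ;;
        rtf)          echo "rtfdump.py" ;;
        eml)          echo "emldump.py" ;;
        exe|dll)      echo "capa" ;;
        js)           echo "box-js" ;;
        *)            echo "exiftool" ;;
    esac
}

# Example (needs Docker): run the suggested tool and keep the report on the host.
#   f=suspicious.pdf
#   docker run --rm -v "$(pwd):/data" tabledevil/file-analysis "$(pick_tool "$f")" "$f" > "$f.report.txt"
```

An extension is only a hint; a renamed file will be routed to the wrong tool, so the result is best treated as a starting point.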

The container is designed for security researchers and incident response teams who need a standardized, portable environment for file analysis without installing potentially dangerous tools on their host systems.