tabledevil eb211f38f4 Add README (CIRCL hashlookup usage + caveats)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:14:23 +02:00


# docker_nsrl
Offline **known-file hash filter** for DFIR triage: "is this file part of a
known software distribution, or is it unusual and worth a look?"

Backed by **[CIRCL hashlookup](https://www.circl.lu/services/hashlookup/)**:
at build time the image downloads CIRCL's `hashlookup-full.bloom` (a SHA-1
Bloom filter covering NIST NSRL **plus** many other known-good sources) and
queries it entirely offline. This replaces the original image, which shipped a
self-built **MD5** bloom frozen at NSRL RDS 2.72 (March 2021).

Published as `tabledevil/nsrl`. Built and refreshed by
[cert_docker_bot](https://git.ktf.ninja/tabledevil/cert_docker_bot) on a
monthly cadence.

> **Hashes are SHA-1.** The old image took MD5; the CIRCL dataset is SHA-1.

## Usage
### Look up individual hashes
Prints `+:` for known (in the set) and `-:` for unknown:
```bash
docker run --rm tabledevil/nsrl da39a3ee5e6b4b0d3255bfef95601890afd80709
# +:da39a3ee5e6b4b0d3255bfef95601890afd80709
```
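
Because the image only understands SHA-1, a small wrapper can reject MD5-length or malformed input before the container starts. The sketch below is an illustration, not part of the image; `is_sha1` and `lookup_sha1` are hypothetical helper names.

```bash
# Hypothetical helper (not shipped in the image):
# succeeds only for a 40-character hex string, i.e. something SHA-1 shaped.
is_sha1() {
  [ "${#1}" -eq 40 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}

# Hypothetical wrapper: refuse MD5 (32 hex chars) or other malformed input,
# otherwise run the single-hash lookup shown above.
lookup_sha1() {
  if ! is_sha1 "$1"; then
    echo "not a SHA-1 digest: $1" >&2
    return 2
  fi
  docker run --rm tabledevil/nsrl "$1"
}
```

For example, `lookup_sha1 "$(sha1sum suspicious.bin | awk '{print $1}')"` hashes one file and looks it up in a single step (`suspicious.bin` is a placeholder name).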
### From stdin (pipe a hash list)
`-s` reads stdin; combine with `-0` (suppress known hits) to print only the
**unknown** hashes worth investigating, or `-1` to print only known ones:
```bash
sha1sum /evidence/* | awk '{print $1}' \
| docker run --rm -i tabledevil/nsrl -s -0
```
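
The pipeline above drops the filenames, so its output is a bare hash list. One possible way to map unknown hashes back to paths is sketched below. `unknown_files` is a hypothetical helper, and the sketch assumes that `-s -0` prints one unknown hash per line, optionally prefixed with `-:`.

```bash
# Hypothetical helper (not shipped in the image): print "hash  path" for every
# file under a directory whose SHA-1 the bloom does not know.
# Assumption: -s -0 emits one unknown hash per line, possibly prefixed "-:".
unknown_files() {
  listing=$(mktemp) || return 1
  # One "hash  path" line per regular file.
  find "$1" -type f -exec sha1sum {} + > "$listing"
  # Feed only the hashes to the container, then join its answer
  # back against the saved listing on the hash column.
  awk '{print $1}' "$listing" \
    | docker run --rm -i tabledevil/nsrl -s -0 \
    | awk 'NR == FNR { sub(/^[-+]:/, ""); if ($0 != "") unknown[$0] = 1; next }
           ($1 in unknown)' - "$listing"
  rm -f "$listing"
}
```

Calling `unknown_files /evidence` would then list only the files worth a closer look. Paths containing newlines or backslashes are escaped by `sha1sum` and are not handled specially here.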
`-v` switches to verbose `hash:True|False` output and prints the bloom's
source/date header on stderr.
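
For a quick summary, the verbose output can be tallied with `awk`. This is a sketch only: `tally_verbose` is a hypothetical helper, and it assumes each stdout line has the form `<sha1>:True` or `<sha1>:False`.

```bash
# Hypothetical helper: count known/unknown lines in -v output read from stdin.
# Assumption: each line looks like "<sha1>:True" or "<sha1>:False".
tally_verbose() {
  awk -F: '
    $2 == "True"  { known++ }
    $2 == "False" { unknown++ }
    END { printf "known=%d unknown=%d\n", known, unknown }
  '
}
```

Assuming `-s` and `-v` can be combined, a run might look like `... | docker run --rm -i tabledevil/nsrl -s -v 2>bloom-header.txt | tally_verbose`, which keeps the stderr header in a separate file (`bloom-header.txt` is a placeholder name).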
### Analyse a whole directory tree
Runs CIRCL's `hashlookup-forensic-analyser` over a mounted target, hashing
every file and emitting CSV (`hashlookup_result,filename,sha1,size`):
```bash
docker run --rm -v /evidence:/data:ro tabledevil/nsrl analyse -d /data
```
Pass any extra `hashlookup-analyser.py` flags after `analyse`.
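
To keep only the rows worth investigating, the CSV can be filtered on its first column. The sketch below assumes the CSV is written to stdout with the header as its first line, and that `hashlookup_result` contains the literal string `unknown` for files not found in the set; `unknown_rows` is a hypothetical helper name.

```bash
# Hypothetical filter: keep the CSV header plus rows marked unknown.
# Assumption: column 1 (hashlookup_result) is the literal "unknown" for such files.
# Only the first column is inspected, so commas inside filenames do not matter.
unknown_rows() {
  awk -F, 'NR == 1 || $1 == "unknown"'
}
```

Usage under those assumptions: `docker run --rm -v /evidence:/data:ro tabledevil/nsrl analyse -d /data | unknown_rows > unknown.csv`.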
## What's in the image
| path | purpose |
|------|---------|
| `/nsrl/hashlookup-full.bloom` | the SHA-1 Bloom filter (~1 GB), the data payload |
| `/nsrl/bloom.info` | source URL + upstream `Last-Modified` of the bloom |
| `/nsrl/search.py` | single-hash / stdin lookup (Flor bloom reader) |
| `/opt/hfa/` | hashlookup-forensic-analyser (directory mode) |
| `/entrypoint.sh` | dispatches `analyse …` vs hash lookup |

Image size is ~2.4 GB (the bloom dominates).
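
To see which bloom a given image actually carries, `/nsrl/bloom.info` can be read directly. This sketch overrides the entrypoint with Docker's `--entrypoint` flag and assumes the image contains a `cat` binary; `bloom_info` is a hypothetical helper name.

```bash
# Hypothetical helper: print the bloom's recorded source URL and Last-Modified.
# Assumption: the image ships `cat`; --entrypoint bypasses /entrypoint.sh.
bloom_info() {
  docker run --rm --entrypoint cat tabledevil/nsrl /nsrl/bloom.info
}
```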
## Caveats
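- **Bloom filters answer "probably yes" / "definitely no."** A `+` match has a
small false-positive probability by design; a `-` is authoritative, meaning the
hash is definitely not in the dataset (it says nothing about whether the file is
malicious). Treat a hit as "known-good with high confidence," not proof.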
- **Upstream freshness.** As of this writing CIRCL's `hashlookup-full.bloom`
has not changed since **Oct 2023** (the live API likewise reports
`nsrl-version: 2023.09.2`). The monthly rebuild re-fetches the same file
until CIRCL republishes — fine for "standard OS/app file?" triage, but it is
not a bleeding-edge dataset. If you need current data, query the online API
at `https://hashlookup.circl.lu/lookup/sha1/<hash>` instead.
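
When network access is acceptable, an individual `+` hit (or a hash that may be newer than the bloom) can be cross-checked against the online API mentioned above. The sketch below relies only on the HTTP status code and assumes the API answers 200 for a known hash and 404 for an unknown one; `online_lookup` is a hypothetical helper name.

```bash
# Hypothetical helper: ask the online CIRCL API about one SHA-1.
# Assumption: HTTP 200 means the hash is known, 404 means it is not.
online_lookup() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "https://hashlookup.circl.lu/lookup/sha1/$1")
  case "$code" in
    200) echo "known online: $1" ;;
    404) echo "unknown online: $1" ;;
    *)   echo "lookup failed (HTTP $code): $1" >&2; return 1 ;;
  esac
}
```

Note that this sends the hash to a third party, which may not be appropriate for every investigation.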
## Building
```bash
docker build -t tabledevil/nsrl . # downloads the ~1 GB bloom at build time
```