Add README (CIRCL hashlookup usage + caveats)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@@ -0,0 +1,78 @@
# docker_nsrl

Offline **known-file hash filter** for DFIR triage: "is this file part of a
known software distribution, or is it unusual and worth a look?"

Backed by **[CIRCL hashlookup](https://www.circl.lu/services/hashlookup/)**:
at build time the image downloads CIRCL's `hashlookup-full.bloom` (a SHA-1
Bloom filter covering NIST NSRL **plus** many other known-good sources) and
queries it entirely offline. This replaces the original image, which shipped a
self-built **MD5** bloom frozen at NSRL RDS 2.72 (March 2021).

Published as `tabledevil/nsrl`. Built and refreshed monthly by
[cert_docker_bot](https://git.ktf.ninja/tabledevil/cert_docker_bot).

> **Hashes are SHA-1.** The old image took MD5; the CIRCL dataset is SHA-1,
> so MD5 hash lists prepared for the old image will not match. Re-hash the
> files with `sha1sum` instead.
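
Because the filter only understands SHA-1, a hash list that may contain MD5s (for example, one prepared for the old image) is worth screening before it is piped in. A minimal POSIX shell sketch; `sha1_only` is a hypothetical helper, not something shipped in the image. It strips Windows line endings, passes through lines that are exactly 40 hex characters, and reports everything else on stderr, flagging 32-character (MD5-length) lines separately:

```bash
# Hypothetical helper, not part of the image: keep only SHA-1-shaped lines.
sha1_only() {
  tr -d '\r' | awk '
    # Exactly 40 hex characters: looks like a SHA-1, pass it through unchanged.
    /^[0-9A-Fa-f]+$/ && length($0) == 40 { print; next }
    # Exactly 32 hex characters: MD5-length, which this SHA-1 filter cannot use.
    /^[0-9A-Fa-f]+$/ && length($0) == 32 { print "skipped (MD5 length): " $0 > "/dev/stderr"; next }
    # Any other non-empty line.
    NF { print "skipped (not SHA-1): " $0 > "/dev/stderr" }
  '
}

# Example:
#   sha1_only < hashes.txt | docker run --rm -i tabledevil/nsrl -s -0
```

Length is checked with `length()` rather than a `{40}` regex interval so the sketch also works with older `awk` implementations that lack interval support.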

## Usage

### Look up individual hashes

Prints `+:` for known hashes (in the set) and `-:` for unknown ones. The
example hash is the SHA-1 of an empty file:

```bash
docker run --rm tabledevil/nsrl da39a3ee5e6b4b0d3255bfef95601890afd80709
# +:da39a3ee5e6b4b0d3255bfef95601890afd80709
```

### From stdin (pipe a hash list)

`-s` reads hashes from stdin. Combine it with `-0` (suppress known hits) to
print only the **unknown** hashes worth investigating, or with `-1` to print
only the known ones:

```bash
# Hashes only the top-level files in /evidence (the glob does not recurse).
sha1sum /evidence/* | awk '{print $1}' \
  | docker run --rm -i tabledevil/nsrl -s -0
```

`-v` switches to verbose `hash:True|False` output and writes the bloom's
source/date header to stderr.
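
The pipeline above discards filenames, so the filter reports *which* hash is unusual but not which file it came from. One way to map results back is to keep the `sha1sum` listing and join against it. A minimal sketch, assuming the filter prints unknown hashes as `-:<sha1>` (as shown earlier) or as bare hashes, and that the listing uses `sha1sum`'s `<sha1>  <path>` layout; `unknown_paths` and the file names in the example are hypothetical. Names that GNU `sha1sum` had to escape (containing backslashes or newlines) are printed in their escaped form.

```bash
# Hypothetical helper, not part of the image.
#   $1: sha1sum listing ("<sha1>  <path>" per line)
#   $2: filter output   ("-:<sha1>" per line, e.g. from `-s -0`)
# Prints the path of every file whose hash appears in the filter output.
unknown_paths() {
  awk -v unk="$2" '
    BEGIN {
      # Load the reported hashes into a set, dropping any "-:" prefix.
      while ((getline line < unk) > 0) {
        sub(/^-:/, "", line)
        if (line != "") unknown[tolower(line)] = 1
      }
      close(unk)
    }
    {
      h = tolower($1)
      sub(/^\\/, "", h)              # GNU sha1sum prefixes escaped names with "\"
      if (h in unknown) {
        path = $0
        sub(/^[^ ]+ [ *]/, "", path) # strip "<sha1>  " or "<sha1> *" (binary mode)
        print path
      }
    }
  ' "$1"
}

# Example:
#   find /evidence -type f -exec sha1sum {} + > evidence.sha1
#   awk '{print $1}' evidence.sha1 | docker run --rm -i tabledevil/nsrl -s -0 > unknown.txt
#   unknown_paths evidence.sha1 unknown.txt
```

Using `find … -exec sha1sum {} +` in the example also covers subdirectories, which the `/evidence/*` glob does not.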

### Analyse a whole directory tree

`analyse` runs CIRCL's `hashlookup-forensic-analyser` over a mounted target,
hashing every file and emitting CSV (`hashlookup_result,filename,sha1,size`):

```bash
docker run --rm -v /evidence:/data:ro tabledevil/nsrl analyse -d /data
```

Any extra `hashlookup-analyser.py` flags can be passed after `analyse`.
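
For a large tree, a tally of the first CSV column shows how much of the target is known before you start reading individual rows. A minimal sketch; `summarise_results` is a hypothetical helper, and it assumes the analyser writes one CSV row per file to stdout, optionally preceded by the `hashlookup_result,…` header, which it skips. It reads only the first field, so commas inside filenames cannot skew the counts, and it makes no assumption about the exact result values the analyser emits.

```bash
# Hypothetical helper, not part of the image: count CSV rows per hashlookup_result value.
summarise_results() {
  awk -F, '
    $1 == "hashlookup_result" { next }   # skip the header row, if present
    NF { n[$1]++ }                       # tally non-empty rows by their first field
    END { for (v in n) print v, n[v] }
  ' "$@" | sort
}

# Example:
#   docker run --rm -v /evidence:/data:ro tabledevil/nsrl analyse -d /data > triage.csv
#   summarise_results triage.csv
```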

## What's in the image

| path | purpose |
|------|---------|
| `/nsrl/hashlookup-full.bloom` | the SHA-1 Bloom filter (~1 GB), the data payload |
| `/nsrl/bloom.info` | source URL + upstream `Last-Modified` of the bloom |
| `/nsrl/search.py` | single-hash / stdin lookup (Flor bloom reader) |
| `/opt/hfa/` | hashlookup-forensic-analyser (directory mode) |
| `/entrypoint.sh` | dispatches `analyse …` vs hash lookup |

Image size is ~2.4 GB; the ~1 GB bloom is the largest single component.

## Caveats

- **Bloom filters answer "probably yes" / "definitely no."** A `+` match has a
  small false-positive probability by design; a `-` is definitive: the hash is
  not in the dataset the bloom was built from. Treat a hit as "known-good with
  high confidence," not proof.
- **Upstream freshness.** As of this writing, CIRCL's `hashlookup-full.bloom`
  has not changed since **Oct 2023** (the live API likewise reports
  `nsrl-version: 2023.09.2`). The monthly rebuild re-fetches the same file
  until CIRCL republishes it. That is fine for "standard OS/app file?" triage,
  but it is not a bleeding-edge dataset. If you need current data, query the
  online API at `https://hashlookup.circl.lu/lookup/sha1/<hash>` instead.
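
For a one-off check against the live service, a small wrapper can reject anything that is not a 40-character hex SHA-1 before spending a request on it. A minimal sketch, assuming `curl` is installed on the host; `hashlookup_online` is a hypothetical helper, and it prints whatever the API returns without interpreting it. Unlike the image, this sends the hash to CIRCL, which may matter for case confidentiality.

```bash
# Hypothetical helper, not part of the image: query CIRCL's online API for one SHA-1.
hashlookup_online() {
  case ${1-} in
    '' | *[!0-9A-Fa-f]*)
      echo "hashlookup_online: not a hex string: '${1-}'" >&2
      return 2 ;;
  esac
  if [ "${#1}" -ne 40 ]; then
    echo "hashlookup_online: expected 40 hex chars (SHA-1), got ${#1}" >&2
    return 2
  fi
  # -sS: hide the progress bar but still report transport errors.
  curl -sS "https://hashlookup.circl.lu/lookup/sha1/$1"
}

# Example (the SHA-1 of an empty file):
#   hashlookup_online da39a3ee5e6b4b0d3255bfef95601890afd80709
```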

## Building

```bash
docker build -t tabledevil/nsrl .   # downloads the ~1 GB bloom at build time
```