I've been working with MLOps pipelines lately, and it has always bothered me that torch.load() (and pickle in general) is basically an RCE vulnerability we've all just accepted: unpickling untrusted data can run arbitrary code. We download gigabytes of opaque weights from Hugging Face and run them in production, often with full privileges.
I looked for existing tools, but many relied on simple regex matching (easy to bypass) or didn't verify whether the file had been tampered with in transit.
So I built Veritensor. It’s a CLI tool to gatekeep models before they hit your runtime.
How it works under the hood:
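1. Pickle Emulation: Instead of grepping for os.system, it emulates the Pickle VM stack. This catches obfuscated payloads (for example, imports whose module and attribute names are pushed as separate strings and then resolved by the STACK_GLOBAL opcode) without actually executing the code.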
2. Identity Check: It hashes your local file and queries the Hugging Face Hub API to ensure it matches the upstream version bit-for-bit (detects MITM or corruption).
3. License Headers: It parses metadata from Safetensors/GGUF to detect restrictive licenses (like CC-BY-NC or AGPL) so you don't accidentally ship them in a commercial product.
4. Signing: Integrates with Sigstore Cosign to sign the container if the scan passes.
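To make item 1 concrete, here is a minimal sketch of static opcode inspection using the standard library's pickletools.genops. It is not Veritensor's actual code: the function name find_suspicious_imports and the SUSPICIOUS_MODULES denylist are hypothetical, and tracking only the two most recent string constants is a deliberate simplification of a full stack emulation.

```python
import pickletools

# Hypothetical denylist for illustration only; a real policy would be broader.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins"}

# Opcodes that push a string constant onto the pickle VM stack.
STRING_OPCODES = {
    "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
    "SHORT_BINSTRING", "BINSTRING", "STRING",
}


def find_suspicious_imports(data: bytes) -> list[str]:
    """Walk the opcode stream without unpickling and report risky imports."""
    findings = []
    recent_strings = []  # simplified stand-in for the VM stack

    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
            continue

        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the argument as "module name".
            module, _, name = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Module and attribute were pushed earlier as separate strings.
            module, name = recent_strings[-2], recent_strings[-1]
        else:
            continue

        if module.split(".")[0] in SUSPICIOUS_MODULES:
            findings.append(f"{module}.{name}")

    return findings


# Hand-written payloads; they are only parsed here, never loaded.
legacy_payload = b"cos\nsystem\n(S'echo hi'\ntR."
stack_global_payload = b"\x80\x04\x8c\x02os\x8c\x06system\x93\x8c\x07echo hi\x85R."

print(find_suspicious_imports(legacy_payload))
print(find_suspicious_imports(stack_global_payload))
```

A grep for "os.system" would miss the second payload because the two names never appear joined in the byte stream. This sketch would in turn miss strings pulled back from the memo or built at runtime, which is where real stack emulation earns its keep.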
It supports PyTorch, Keras (checks for Lambda layers), and GGUF. Written in Python, Apache 2.0.
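For the Keras check, a rough sketch of the idea, assuming the model architecture is available as a JSON string (for example, the output of model.to_json()). Lambda layers matter because they can carry serialized Python code that runs when the model is loaded. The function name find_lambda_layers is hypothetical, and this is not necessarily how Veritensor implements it.

```python
import json


def find_lambda_layers(config_json: str) -> list[str]:
    """Return the names of all Lambda layers found anywhere in a Keras config."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            if node.get("class_name") == "Lambda":
                cfg = node.get("config")
                name = cfg.get("name", "<unnamed>") if isinstance(cfg, dict) else "<unnamed>"
                found.append(name)
            # Keep descending: nested models can hide layers at any depth.
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(config_json))
    return found


# Made-up config for illustration; the "function" payload is a placeholder.
sample = (
    '{"class_name": "Sequential", "config": {"layers": ['
    '{"class_name": "Dense", "config": {"name": "dense_1"}},'
    '{"class_name": "Lambda", "config": {"name": "sneaky", "function": "..."}}'
    ']}}'
)
print(find_lambda_layers(sample))
```

Walking the whole tree rather than only the top-level layer list is a design choice: a Lambda nested inside a sub-model would otherwise slip through.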
I’d love to hear your feedback on the detection logic or edge cases I might have missed with the Pickle emulation.
arseniibr•1h ago
Repo: https://github.com/ArseniiBrazhnyk/Veritensor
PyPI: pip install veritensor